Related papers: Superalignment with Dynamic Human Values

The Superalignment of Superhuman Intelligence with Large Language Models

We have witnessed superhuman intelligence thanks to the fast development of large language models and multimodal language models. As the application of such superhuman models becomes more and more popular, a critical question arises here:…

Computation and Language · Computer Science 2024-12-24 Minlie Huang, Yingkang Wang, Shiyao Cui, Pei Ke, Jie Tang

Research Superalignment Should Advance Now with Alternating Competence and Conformity Optimization

The recent leap in AI capabilities, driven by big generative models, has sparked the possibility of achieving Artificial General Intelligence (AGI) and further triggered discussions on Artificial Superintelligence (ASI), a system surpassing…

Artificial Intelligence · Computer Science 2026-02-10 HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, JinYeong Bak, James Evans, Xing Xie

Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision

Current AI alignment methodologies rely on human-provided demonstrations or judgments, and the learned capabilities of AI systems would be upper-bounded by human capabilities as a result. This raises a challenging research question: How can…

Machine Learning · Computer Science 2024-12-11 Zhiqing Sun, Longhui Yu, Yikang Shen, Weiyang Liu, Yiming Yang, Sean Welleck, Chuang Gan

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

The emergence of large language models (LLMs) has sparked the possibility of Artificial Superintelligence (ASI), a hypothetical AI system surpassing human intelligence. However, existing alignment paradigms struggle to guide such…

Machine Learning · Computer Science 2024-12-30 HyunJin Kim, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science 2024-03-25 Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, Melkior Ornik

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task…

Machine Learning · Computer Science 2018-11-20 Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Superalignment, where humans act as weak supervisors for superhuman models, has become a crucial problem with the rapid development of Large Language Models (LLMs). Recent work has preliminarily studied this problem by using weak models to…

Computation and Language · Computer Science 2025-03-03 Wenkai Yang, Shiqi Shen, Guangyao Shen, Wei Yao, Yong Liu, Zhi Gong, Yankai Lin, Ji-Rong Wen

Super Co-alignment of Human and AI for Sustainable Symbiotic Society

As Artificial Intelligence (AI) advances toward Artificial General Intelligence (AGI) and eventually Artificial Superintelligence (ASI), it may potentially surpass human control, deviate from human values, and even lead to irreversible…

Artificial Intelligence · Computer Science 2025-07-01 Yi Zeng, Feifei Zhao, Yuwei Wang, Enmeng Lu, Yaodong Yang, Lei Wang, Chao Liu, Yitao Liang, Dongcheng Zhao, Bing Han, Haibo Tong, Yao Liang, Dongqi Liang, Kang Sun, Boyuan Chen, Jinyu Fan

Goal Alignment: A Human-Aware Account of Value Alignment Problem

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI.…

Artificial Intelligence · Computer Science 2023-02-10 Malek Mechergui, Sarath Sreedharan

Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities

The concepts of "human-centered AI" and "value-based decision" have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science 2025-08-26 Sz-Ting Tzeng, Frank Dignum

How to Mitigate Overfitting in Weak-to-strong Generalization?

Aligning powerful AI models on tasks that surpass human evaluation capabilities is the central problem of superalignment. To address this problem, weak-to-strong generalization aims to elicit the capabilities of strong models…

Machine Learning · Computer Science 2025-03-07 Junhao Shi, Qinyuan Cheng, Zhaoye Fei, Yining Zheng, Qipeng Guo, Xipeng Qiu

Tuning computer vision models with task rewards

Misalignment between model predictions and intended usage can be detrimental for the deployment of computer vision models. The issue is exacerbated when the task involves complex structured outputs, as it becomes harder to design procedures…

Computer Vision and Pattern Recognition · Computer Science 2023-02-17 André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai

What are you optimizing for? Aligning Recommender Systems with Human Values

We describe cases where real recommender systems were modified in the service of various human values such as diversity, fairness, well-being, time well spent, and factual accuracy. From this we identify the current practice of values…

Information Retrieval · Computer Science 2021-07-26 Jonathan Stray, Ivan Vendrov, Jeremy Nixon, Steven Adler, Dylan Hadfield-Menell

Self-Supervised Learning Across Domains

Human adaptability relies crucially on learning and merging knowledge from both supervised and unsupervised tasks: the parents point out few important concepts, but then the children fill in the gaps on their own. This is particularly…

Computer Vision and Pattern Recognition · Computer Science 2021-04-01 Silvia Bucci, Antonio D'Innocente, Yujun Liao, Fabio Maria Carlucci, Barbara Caputo, Tatiana Tommasi

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models

Big models have achieved revolutionary breakthroughs in the field of AI, but they might also pose potential concerns. Addressing such concerns, alignment technologies were introduced to make these models conform to human preferences and…

Artificial Intelligence · Computer Science 2024-03-08 Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie

Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Recent advancements in large language models have sparked interest in their extraordinary and near-superhuman capabilities, leading researchers to explore methods for evaluating and optimizing these abilities, which is called…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Jianyuan Guo, Hanting Chen, Chengcheng Wang, Kai Han, Chang Xu, Yunhe Wang

Dynamic value alignment through preference aggregation of multiple objectives

The development of ethical AI systems is currently geared toward setting objective functions that align with human objectives. However, finding such functions remains a research challenge, while in RL, setting rewards by hand is a fairly…

Artificial Intelligence · Computer Science 2023-10-10 Marcin Korecki, Damian Dailisan, Cesare Carissimo

Understanding the Process of Human-AI Value Alignment

Background: Value alignment in computer science research is often used to refer to the process of aligning artificial intelligence with humans, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a…

Computers and Society · Computer Science 2026-03-27 Jack McKinlay, Marina De Vos, Janina A. Hoffmann, Andreas Theodorou

A Generalizable Approach to Learning Optimizers

A core issue with learning to optimize neural networks has been the lack of generalization to real world problems. To address this, we describe a system designed from a generalization-first perspective, learning to update optimizer…

Machine Learning · Computer Science 2021-06-09 Diogo Almeida, Clemens Winter, Jie Tang, Wojciech Zaremba

Aligning Machine and Human Visual Representations across Abstraction Levels

Deep neural networks have achieved success across a wide range of applications, including as models of human behavior and neural representations in vision tasks. However, neural network training and human learning differ in fundamental…

Computer Vision and Pattern Recognition · Computer Science 2025-09-04 Lukas Muttenthaler, Klaus Greff, Frieda Born, Bernhard Spitzer, Simon Kornblith, Michael C. Mozer, Klaus-Robert Müller, Thomas Unterthiner, Andrew K. Lampinen