Related papers: The Alignment Problem in Context

Normative Conflicts and Shallow AI Alignment

The progress of AI systems such as large language models (LLMs) raises increasingly pressing concerns about their safe deployment. This paper examines the value alignment problem for LLMs, arguing that current alignment strategies are…

Computation and Language · Computer Science · 2025-06-06 · Raphaël Millière

Large Language Model Alignment: A Survey

Recent years have witnessed remarkable progress made in large language models (LLMs). Such advancements, while garnering significant attention, have concurrently elicited various concerns. The potential of these models is undeniably vast;…

Computation and Language · Computer Science · 2023-09-27 · Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong

Alignment and Safety in Large Language Models: Safety Mechanisms, Training Paradigms, and Emerging Challenges

Due to the remarkable capabilities and growing impact of large language models (LLMs), they have been deeply integrated into many aspects of society. Thus, ensuring their alignment with human values and intentions has emerged as a critical…

Artificial Intelligence · Computer Science · 2025-07-29 · Haoran Lu, Luyang Fang, Ruidong Zhang, Xinliang Li, Jiazhang Cai, Huimin Cheng, Lin Tang, Ziyu Liu, Zeliang Sun, Tao Wang, Yingchuan Zhang, Arif Hassan Zidan, Jinwen Xu, Jincheng Yu, Meizhi Yu, Hanqi Jiang, Xilin Gong, Weidi Luo, Bolun Sun, Yongkai Chen, Terry Ma, Shushan Wu, Yifan Zhou, Junhao Chen, Haotian Xiang, Jing Zhang, Afrar Jahin, Wei Ruan, Ke Deng, Yi Pan, Peilong Wang, Jiahui Li, Zhengliang Liu, Lu Zhang, Lin Zhao, Wei Liu, Dajiang Zhu, Xin Xing, Fei Dou, Wei Zhang, Chao Huang, Rongjie Liu, Mengrui Zhang, Yiwen Liu, Xiaoxiao Sun, Qin Lu, Zhen Xiang, Wenxuan Zhong, Tianming Liu, Ping Ma

Conversational Alignment with Artificial Intelligence in Context

The development of sophisticated artificial intelligence (AI) conversational agents based on large language models raises important questions about the relationship between human norms, values, and practices and AI design and performance.…

Computers and Society · Computer Science · 2025-05-30 · Rachel Katharine Sterken, James Ravi Kirkpatrick

The AI Alignment Paradox

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science · 2024-11-26 · Robert West, Roland Aydin

Questionnaire Responses Do not Capture the Safety of AI Agents

As AI systems advance in capabilities, measuring their safety and alignment to human values is becoming paramount. A fast-growing field of AI research is devoted to developing such assessments. However, most current advances therein may be…

Computers and Society · Computer Science · 2026-03-17 · Max Hellrigel-Holderbaum, Edward James Young

On the Ethics of Building AI in a Responsible Manner

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants. We argue that a…

Machine Learning · Computer Science · 2020-04-10 · Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

Are Dilemmas and Conflicts in LLM Alignment Solvable? A View from Priority Graph

As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first summarize and taxonomize these diverse conflicts. Then, we model the LLM's preferences to make…

Artificial Intelligence · Computer Science · 2026-03-17 · Zhenheng Tang, Xiang Liu, Qian Wang, Eunsol Choi, Bo Li, Xiaowen Chu

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science · 2024-03-25 · Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, Melkior Ornik

AI Safety in Generative AI Large Language Models: A Survey

Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety…

Computers and Society · Computer Science · 2024-07-29 · Jaymari Chua, Yun Li, Shiyi Yang, Chen Wang, Lina Yao

Positive Alignment: Artificial Intelligence for Human Flourishing

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…

Artificial Intelligence · Computer Science · 2026-05-15 · Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

Methodological reflections for AI alignment research using human feedback

The field of artificial intelligence (AI) alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner. AI alignment is particularly relevant for large language models…

Human-Computer Interaction · Computer Science · 2023-01-18 · Thilo Hagendorff, Sarah Fabi

Legal Alignment for Safe and Ethical AI

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

Computers and Society · Computer Science · 2026-01-08 · Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain

Unpacking the Ethical Value Alignment in Big Models

Big models have greatly advanced AI's ability to understand, generate, and manipulate information and content, enabling numerous applications. However, as these models become increasingly integrated into everyday life, their inherent…

Computers and Society · Computer Science · 2023-10-27 · Xiaoyuan Yi, Jing Yao, Xiting Wang, Xing Xie

Towards Integrated Alignment

As AI adoption expands across human society, the problem of aligning AI models to match human preferences remains a grand challenge. Currently, the AI alignment field is deeply divided between behavioral and representational approaches,…

Computers and Society · Computer Science · 2025-08-12 · Ben Y. Reis, William La Cava

Position Paper: Bounded Alignment: What (Not) To Expect From AGI Agents

The issues of AI risk and AI safety are becoming critical as the prospect of artificial general intelligence (AGI) looms larger. The emergence of extremely large and capable generative models has led to alarming predictions and created a…

Artificial Intelligence · Computer Science · 2025-05-20 · Ali A. Minai

The Challenge of Value Alignment: from Fairer Algorithms to AI Safety

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. Far from existing in a vacuum, there has long been an interest in the ability of…

Computers and Society · Computer Science · 2021-01-19 · Iason Gabriel, Vafa Ghazavi

Goal Alignment: A Human-Aware Account of Value Alignment Problem

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI.…

Artificial Intelligence · Computer Science · 2023-02-10 · Malek Mechergui, Sarath Sreedharan

The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm

A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally,…

Computation and Language · Computer Science · 2024-07-09 · Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges. This becomes particularly evident when LLMs inadvertently generate harmful or toxic content,…

Computation and Language · Computer Science · 2024-02-20 · Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu