Related papers: The Alignment Problem in Context

200 papers

The progress of AI systems such as large language models (LLMs) raises increasingly pressing concerns about their safe deployment. This paper examines the value alignment problem for LLMs, arguing that current alignment strategies are…

Computation and Language · Computer Science 2025-06-06 Raphaël Millière

Recent years have witnessed remarkable progress made in large language models (LLMs). Such advancements, while garnering significant attention, have concurrently elicited various concerns. The potential of these models is undeniably vast;…

Computation and Language · Computer Science 2023-09-27 Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong

Due to the remarkable capabilities and growing impact of large language models (LLMs), they have been deeply integrated into many aspects of society. Thus, ensuring their alignment with human values and intentions has emerged as a critical…

The development of sophisticated artificial intelligence (AI) conversational agents based on large language models raises important questions about the relationship between human norms, values, and practices and AI design and performance.…

Computers and Society · Computer Science 2025-05-30 Rachel Katharine Sterken, James Ravi Kirkpatrick

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West, Roland Aydin

As AI systems advance in capabilities, measuring their safety and alignment to human values is becoming paramount. A fast-growing field of AI research is devoted to developing such assessments. However, most current advances therein may be…

Computers and Society · Computer Science 2026-03-17 Max Hellrigel-Holderbaum, Edward James Young

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants. We argue that a…

Machine Learning · Computer Science 2020-04-10 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

As Large Language Models (LLMs) become more powerful and autonomous, they increasingly face conflicts and dilemmas in many scenarios. We first summarize and taxonomize these diverse conflicts. Then, we model the LLM's preferences to make…

Artificial Intelligence · Computer Science 2026-03-17 Zhenheng Tang, Xiang Liu, Qian Wang, Eunsol Choi, Bo Li, Xiaowen Chu

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science 2024-03-25 Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, Melkior Ornik

Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) inevitably raises concerns about the risks and safety…

Computers and Society · Computer Science 2024-07-29 Jaymari Chua, Yun Li, Shiyi Yang, Chen Wang, Lina Yao

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…

The field of artificial intelligence (AI) alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner. AI alignment is particularly relevant for large language models…

Human-Computer Interaction · Computer Science 2023-01-18 Thilo Hagendorff, Sarah Fabi

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

Big models have greatly advanced AI's ability to understand, generate, and manipulate information and content, enabling numerous applications. However, as these models become increasingly integrated into everyday life, their inherent…

Computers and Society · Computer Science 2023-10-27 Xiaoyuan Yi, Jing Yao, Xiting Wang, Xing Xie

As AI adoption expands across human society, the problem of aligning AI models to match human preferences remains a grand challenge. Currently, the AI alignment field is deeply divided between behavioral and representational approaches,…

Computers and Society · Computer Science 2025-08-12 Ben Y. Reis, William La Cava

The issues of AI risk and AI safety are becoming critical as the prospect of artificial general intelligence (AGI) looms larger. The emergence of extremely large and capable generative models has led to alarming predictions and created a…

Artificial Intelligence · Computer Science 2025-05-20 Ali A. Minai

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. Far from existing in a vacuum, there has long been an interest in the ability of…

Computers and Society · Computer Science 2021-01-19 Iason Gabriel, Vafa Ghazavi

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI.…

Artificial Intelligence · Computer Science 2023-02-10 Malek Mechergui, Sarath Sreedharan

A key concern with the concept of "alignment" is the implicit question of "alignment to what?". AI systems are increasingly used across the world, yet safety alignment is often focused on homogeneous monolingual settings. Additionally,…

Computation and Language · Computer Science 2024-07-09 Aakanksha, Arash Ahmadian, Beyza Ermis, Seraphina Goldfarb-Tarrant, Julia Kreutzer, Marzieh Fadaee, Sara Hooker

The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges. This becomes particularly evident when LLMs inadvertently generate harmful or toxic content,…

Computation and Language · Computer Science 2024-02-20 Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu