Related papers: A Multi-Level Framework for the AI Alignment Probl…

Artificial Intelligence, Values and Alignment

This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive…

Computers and Society · Computer Science 2020-10-07 Iason Gabriel

Foundational Moral Values for AI Alignment

Solving the AI alignment problem requires having clear, defensible values towards which AI systems can align. Currently, targets for alignment remain underspecified and do not seem to be built from a philosophically robust structure. We…

Computers and Society · Computer Science 2023-11-29 Betty Li Hou, Brian Patrick Green

The Challenge of Value Alignment: from Fairer Algorithms to AI Safety

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. Far from existing in a vacuum, there has long been an interest in the ability of…

Computers and Society · Computer Science 2021-01-19 Iason Gabriel, Vafa Ghazavi

Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities

The concepts of "human-centered AI" and "value-based decision" have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science 2025-08-26 Sz-Ting Tzeng, Frank Dignum

Concept Alignment

Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is…

Machine Learning · Computer Science 2024-01-18 Sunayana Rane, Polyphony J. Bruna, Ilia Sucholutsky, Christopher Kello, Thomas L. Griffiths

The AI Alignment Paradox

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West, Roland Aydin

Smart But Not Moral? Moral Alignment In Human-AI Decision-Making

In high-stakes AI-supported decisions, considerations are not purely technical but involve moral judgments about fairness, responsibility, and harm. While prior research has focused mainly on functional or behavioral alignment, this paper…

Human-Computer Interaction · Computer Science 2026-04-17 Christiane Ernst, Luis Gutmann, Domenique Zipperling, Kathrin Figl, Niklas Kühl

Value alignment: a formal approach

Value alignment is one of the principles that should govern autonomous AI systems. It essentially states that a system's goals and behaviour should be aligned with human values. But how to ensure value alignment? In this paper we first provide a formal model to…

Artificial Intelligence · Computer Science 2024-02-08 Carles Sierra, Nardine Osman, Pablo Noriega, Jordi Sabater-Mir, Antoni Perelló

Legal Alignment for Safe and Ethical AI

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

Computers and Society · Computer Science 2026-01-08 Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain

AI Alignment: A Comprehensive Survey

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…

Artificial Intelligence · Computer Science 2025-04-07 Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Donghai Hong, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Juntao Dai, Xuehai Pan, Kwan Yee Ng, Aidan O'Gara, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

Modelling Human Values for AI Reasoning

One of today's most significant societal challenges is building AI systems whose behaviour, or the behaviour it enables within communities of interacting agents (human and artificial), aligns with human values. To address this challenge, we…

Artificial Intelligence · Computer Science 2026-02-09 Nardine Osman, Mark d'Inverno

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science 2024-03-25 Gokul Puthumanaillam, Manav Vora, Pranay Thangeda, Melkior Ornik

Understanding the Process of Human-AI Value Alignment

Background: Value alignment in computer science research is often used to refer to the process of aligning artificial intelligence with humans, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a…

Computers and Society · Computer Science 2026-03-27 Jack McKinlay, Marina De Vos, Janina A. Hoffmann, Andreas Theodorou

Aligning Artificial Intelligence with Humans through Public Policy

Given that Artificial Intelligence (AI) increasingly permeates our lives, it is critical that we systematically align AI objectives with the goals and values of humans. The human-AI alignment problem stems from the impracticality of…

Computers and Society · Computer Science 2022-07-05 John Nay, James Daily

A multilevel framework for AI governance

To realize the potential benefits and mitigate potential risks of AI, it is necessary to develop a framework of governance that conforms to ethics and fundamental human values. Although several organizations have issued guidelines and…

Computers and Society · Computer Science 2023-07-14 Hyesun Choung, Prabu David, John S. Seberger

The Alignment Target Problem: Divergent Moral Judgments of Humans, AI Systems, and Their Designers

The project of aligning machine behavior with human values raises a basic problem: whose moral expectations should guide AI decision-making? Much alignment research assumes that the appropriate benchmark is how humans themselves would act…

Computers and Society · Computer Science 2026-05-13 Benjamin Minhao Chen, Xinyu Xie

Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto

Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. This goal differs qualitatively from traditional task-specific AI…

Artificial Intelligence · Computer Science 2025-01-17 Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Learning Human-like Representations to Enable Learning Human Values

How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We explore the effects of…

Artificial Intelligence · Computer Science 2024-11-11 Andrea Wynn, Ilia Sucholutsky, Thomas L. Griffiths

Dynamic Normativity: Necessary and Sufficient Conditions for Value Alignment

The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence across all Humanities disciplines, revolves around the intricacies of morality and normativity. Surprisingly, in recent years, this thematic thread…

Artificial Intelligence · Computer Science 2024-06-19 Nicholas Kluge Corrêa

Measuring Value Alignment

As artificial intelligence (AI) systems become increasingly integrated into various domains, ensuring that they align with human values becomes critical. This paper introduces a novel formalism to quantify the alignment between AI systems…

Artificial Intelligence · Computer Science 2023-12-27 Fazl Barez, Philip Torr