English
Related papers

Related papers: A Multi-Level Framework for the AI Alignment Probl…

200 papers

This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive…

Computers and Society · Computer Science 2020-10-07 Iason Gabriel

Solving the AI alignment problem requires having clear, defensible values towards which AI systems can align. Currently, targets for alignment remain underspecified and do not seem to be built from a philosophically robust structure. We…

Computers and Society · Computer Science 2023-11-29 Betty Li Hou , Brian Patrick Green

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. Far from existing in a vacuum, there has long been an interest in the ability of…

Computers and Society · Computer Science 2021-01-19 Iason Gabriel , Vafa Ghazavi

The concepts of ``human-centered AI'' and ``value-based decision'' have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science 2025-08-26 Sz-Ting Tzeng , Frank Dignum

Discussion of AI alignment (alignment between humans and AI systems) has focused on value alignment, broadly referring to creating AI systems that share human values. We argue that before we can even attempt to align values, it is…

Machine Learning · Computer Science 2024-01-18 Sunayana Rane , Polyphony J. Bruna , Ilia Sucholutsky , Christopher Kello , Thomas L. Griffiths

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West , Roland Aydin

In high-stakes AI-supported decisions, considerations are not purely technical but involve moral judgments about fairness, responsibility, and harm. While prior research has focused mainly on functional or behavioral alignment, this paper…

Human-Computer Interaction · Computer Science 2026-04-17 Christiane Ernst , Luis Gutmann , Domenique Zipperling , Kathrin Figl , Niklas Kühl

principles that should govern autonomous AI systems. It essentially states that a system's goals and behaviour should be aligned with human values. But how to ensure value alignment? In this paper we first provide a formal model to…

Artificial Intelligence · Computer Science 2024-02-08 Carles Sierra , Nardine Osman , Pablo Noriega , Jordi Sabater-Mir , Antoni Perelló

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…

One of today's most significant societal challenges is building AI systems whose behaviour, or the behaviour it enables within communities of interacting agents (human and artificial), aligns with human values. To address this challenge, we…

Artificial Intelligence · Computer Science 2026-02-09 Nardine Osman , Mark d'Inverno

This paper examines the challenges associated with achieving life-long superalignment in AI systems, particularly large language models (LLMs). Superalignment is a theoretical framework that aspires to ensure that superintelligent AI…

Computers and Society · Computer Science 2024-03-25 Gokul Puthumanaillam , Manav Vora , Pranay Thangeda , Melkior Ornik

Background: Value alignment in computer science research is often used to refer to the process of aligning artificial intelligence with humans, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a…

Computers and Society · Computer Science 2026-03-27 Jack McKinlay , Marina De Vos , Janina A. Hoffmann , Andreas Theodorou

Given that Artificial Intelligence (AI) increasingly permeates our lives, it is critical that we systematically align AI objectives with the goals and values of humans. The human-AI alignment problem stems from the impracticality of…

Computers and Society · Computer Science 2022-07-05 John Nay , James Daily

To realize the potential benefits and mitigate potential risks of AI, it is necessary to develop a framework of governance that conforms to ethics and fundamental human values. Although several organizations have issued guidelines and…

Computers and Society · Computer Science 2023-07-14 Hyesun Choung , Prabu David , John S. Seberger

The project of aligning machine behavior with human values raises a basic problem: whose moral expectations should guide AI decision-making? Much alignment research assumes that the appropriate benchmark is how humans themselves would act…

Computers and Society · Computer Science 2026-05-13 Benjamin Minhao Chen , Xinyu Xie

Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. This goal differs qualitatively from traditional task-specific AI…

Artificial Intelligence · Computer Science 2025-01-17 Elizaveta Tennant , Stephen Hailes , Mirco Musolesi

How can we build AI systems that can learn any set of individual human values both quickly and safely, avoiding causing harm or violating societal standards for acceptable behavior during the learning process? We explore the effects of…

Artificial Intelligence · Computer Science 2024-11-11 Andrea Wynn , Ilia Sucholutsky , Thomas L. Griffiths

The critical inquiry pervading the realm of Philosophy, and perhaps extending its influence across all Humanities disciplines, revolves around the intricacies of morality and normativity. Surprisingly, in recent years, this thematic thread…

Artificial Intelligence · Computer Science 2024-06-19 Nicholas Kluge Corrêa

As artificial intelligence (AI) systems become increasingly integrated into various domains, ensuring that they align with human values becomes critical. This paper introduces a novel formalism to quantify the alignment between AI systems…

Artificial Intelligence · Computer Science 2023-12-27 Fazl Barez , Philip Torr
‹ Prev 1 2 3 10 Next ›