Related papers: Concept Alignment

Concept Alignment as a Prerequisite for Value Alignment

Value alignment is essential for building AI systems that can safely and reliably interact with people. However, what a person values -- and is even capable of valuing -- depends on the concepts that they are currently using to understand…

Artificial Intelligence · Computer Science 2023-11-01 Sunayana Rane, Mark Ho, Ilia Sucholutsky, Thomas L. Griffiths

Understanding the Process of Human-AI Value Alignment

Background: Value alignment in computer science research is often used to refer to the process of aligning artificial intelligence with humans, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a…

Computers and Society · Computer Science 2026-03-27 Jack McKinlay, Marina De Vos, Janina A. Hoffmann, Andreas Theodorou

The Challenge of Value Alignment: from Fairer Algorithms to AI Safety

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. Far from existing in a vacuum, there has long been an interest in the ability of…

Computers and Society · Computer Science 2021-01-19 Iason Gabriel, Vafa Ghazavi

Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities

The concepts of "human-centered AI" and "value-based decision" have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science 2025-08-26 Sz-Ting Tzeng, Frank Dignum

Artificial Intelligence, Values and Alignment

This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive…

Computers and Society · Computer Science 2020-10-07 Iason Gabriel

Position: Towards Bidirectional Human-AI Alignment

Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and…

Human-Computer Interaction · Computer Science 2025-09-30 Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

AI Alignment: A Comprehensive Survey

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…

Artificial Intelligence · Computer Science 2025-04-07 Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Donghai Hong, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Juntao Dai, Xuehai Pan, Kwan Yee Ng, Aidan O'Gara, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents

AI alignment is about ensuring AI systems only pursue goals and activities that are beneficial to humans. Most of the current approach to AI alignment is to learn what humans value from their behavioural data. This paper proposes a…

Artificial Intelligence · Computer Science 2023-10-06 Pei-Yu Chen, Myrthe L. Tielman, Dirk K. J. Heylen, Catholijn M. Jonker, M. Birna van Riemsdijk

Modelling Human Values for AI Reasoning

One of today's most significant societal challenges is building AI systems whose behaviour, or the behaviour it enables within communities of interacting agents (human and artificial), aligns with human values. To address this challenge, we…

Artificial Intelligence · Computer Science 2026-02-09 Nardine Osman, Mark d'Inverno

A Multi-Level Framework for the AI Alignment Problem

AI alignment considers how we can encode AI systems in a way that is compatible with human values. The normative side of this problem asks what moral values or principles, if any, we should encode in AI. To this end, we present a framework…

Computers and Society · Computer Science 2023-01-11 Betty Li Hou, Brian Patrick Green

The AI Alignment Paradox

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West, Roland Aydin

Aligning Generalisation Between Humans and Machines

Recent advances in AI -- including generative approaches -- have resulted in technology that can support humans in scientific discovery and forming decisions, but may also disrupt democracies and target individuals. The responsible use of…

Artificial Intelligence · Computer Science 2025-05-28 Filip Ilievski, Barbara Hammer, Frank van Harmelen, Benjamin Paassen, Sascha Saralajew, Ute Schmid, Michael Biehl, Marianna Bolognesi, Xin Luna Dong, Kiril Gashteovski, Pascal Hitzler, Giuseppe Marra, Pasquale Minervini, Martin Mundt, Axel-Cyrille Ngonga Ngomo, Alessandro Oltramari, Gabriella Pasi, Zeynep G. Saribatur, Luciano Serafini, John Shawe-Taylor, Vered Shwartz, Gabriella Skitalinskaya, Clemens Stachl, Gido M. van de Ven, Thomas Villmann

Aligning Artificial Intelligence with Humans through Public Policy

Given that Artificial Intelligence (AI) increasingly permeates our lives, it is critical that we systematically align AI objectives with the goals and values of humans. The human-AI alignment problem stems from the impracticality of…

Computers and Society · Computer Science 2022-07-05 John Nay, James Daily

Value alignment: a formal approach

principles that should govern autonomous AI systems. It essentially states that a system's goals and behaviour should be aligned with human values. But how to ensure value alignment? In this paper we first provide a formal model to…

Artificial Intelligence · Computer Science 2024-02-08 Carles Sierra, Nardine Osman, Pablo Noriega, Jordi Sabater-Mir, Antoni Perelló

Strong and weak alignment of large language models with human values

Minimizing negative impacts of Artificial Intelligence (AI) systems on human societies without human supervision requires them to be able to align with human values. However, most current work only addresses this issue from a technical point…

Computation and Language · Computer Science 2024-08-13 Mehdi Khamassi, Marceau Nahon, Raja Chatila

The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial

The value-alignment problem for artificial intelligence (AI) asks how we can ensure that the 'values' (i.e., objective functions) of artificial systems are aligned with the values of humanity. In this paper, I argue that linguistic…

Artificial Intelligence · Computer Science 2022-07-05 Travis LaCroix

AI Safety and Reproducibility: Establishing Robust Foundations for the Neuropsychology of Human Values

We propose the creation of a systematic effort to identify and replicate key findings in neuropsychology and allied fields related to understanding human values. Our aim is to ensure that research underpinning the value alignment problem of…

Artificial Intelligence · Computer Science 2018-09-11 Gopal P. Sarma, Nick J. Hay, Adam Safron

Goal Alignment: A Human-Aware Account of Value Alignment Problem

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI.…

Artificial Intelligence · Computer Science 2023-02-10 Malek Mechergui, Sarath Sreedharan

Rethinking AI Cultural Alignment

As general-purpose artificial intelligence (AI) systems become increasingly integrated with diverse human communities, cultural alignment has emerged as a crucial element in their deployment. Most existing approaches treat cultural…

Artificial Intelligence · Computer Science 2025-03-11 Michal Bravansky, Filip Trhlik, Fazl Barez

Beyond Prompts: Learning from Human Communication for Enhanced AI Intent Alignment

AI intent alignment, ensuring that AI produces outcomes as intended by users, is a critical challenge in human-AI interaction. The emergence of generative AI, including LLMs, has intensified the significance of this problem, as interactions…

Human-Computer Interaction · Computer Science 2024-06-21 Yoonsu Kim, Kihoon Son, Seoyoung Kim, Juho Kim