Related papers: Value Alignment Verification

Goal Alignment: A Human-Aware Account of Value Alignment Problem

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI.…

Artificial Intelligence · Computer Science · 2023-02-10 · Malek Mechergui, Sarath Sreedharan

Ethics2vec: aligning automatic agents and human preferences

Though intelligent agents are supposed to improve human experience (or make it more efficient), it is hard from a human perspective to grasp the ethical values which are explicitly or implicitly embedded in an agent behaviour. This is the…

Artificial Intelligence · Computer Science · 2025-08-12 · Gianluca Bontempi

Understanding the Process of Human-AI Value Alignment

Background: Value alignment in computer science research is often used to refer to the process of aligning artificial intelligence with humans, but the way the phrase is used often lacks precision. Objectives: In this paper, we conduct a…

Computers and Society · Computer Science · 2026-03-27 · Jack McKinlay, Marina De Vos, Janina A. Hoffmann, Andreas Theodorou

Concept Alignment as a Prerequisite for Value Alignment

Value alignment is essential for building AI systems that can safely and reliably interact with people. However, what a person values -- and is even capable of valuing -- depends on the concepts that they are currently using to understand…

Artificial Intelligence · Computer Science · 2023-11-01 · Sunayana Rane, Mark Ho, Ilia Sucholutsky, Thomas L. Griffiths

Pragmatic-Pedagogic Value Alignment

As intelligent systems gain autonomy and capability, it becomes vital to ensure that their objectives match those of their human users; this is known as the value-alignment problem. In robotics, value alignment is key to the design of…

Artificial Intelligence · Computer Science · 2018-02-07 · Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan

Training Value-Aligned Reinforcement Learning Agents Using a Normative Prior

As more machine learning agents interact with humans, it is increasingly a prospect that an agent trained to perform a task optimally, using only a measure of task performance as feedback, can violate societal norms for acceptable behavior…

Machine Learning · Computer Science · 2021-04-20 · Md Sultan Al Nahian, Spencer Frazier, Brent Harrison, Mark Riedl

The Linguistic Blind Spot of Value-Aligned Agency, Natural and Artificial

The value-alignment problem for artificial intelligence (AI) asks how we can ensure that the 'values' (i.e., objective functions) of artificial systems are aligned with the values of humanity. In this paper, I argue that linguistic…

Artificial Intelligence · Computer Science · 2022-07-05 · Travis LaCroix

HAVA: Hybrid Approach to Value-Alignment through Reward Weighing for Reinforcement Learning

Our society is governed by a set of norms which together bring about the values we cherish such as safety, fairness or trustworthiness. The goal of value-alignment is to create agents that not only do their tasks but through their…

Artificial Intelligence · Computer Science · 2025-05-22 · Kryspin Varys, Federico Cerutti, Adam Sobey, Timothy J. Norman

Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities

The concepts of "human-centered AI" and "value-based decision" have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science · 2025-08-26 · Sz-Ting Tzeng, Frank Dignum

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task…

Machine Learning · Computer Science · 2018-11-20 · Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

Generating Causal Explanations of Vehicular Agent Behavioural Interactions with Learnt Reward Profiles

Transparency and explainability are important features that responsible autonomous vehicles should possess, particularly when interacting with humans, and causal reasoning offers a strong basis to provide these qualities. However, even if…

Artificial Intelligence · Computer Science · 2025-11-18 · Rhys Howard, Nick Hawes, Lars Kunze

Value Alignment and Trust in Human-Robot Interaction: Insights from Simulation and User Study

With the advent of AI technologies, humans and robots are increasingly teaming up to perform collaborative tasks. To enable smooth and effective collaboration, the topic of value alignment (operationalized herein as the degree of dynamic…

Robotics · Computer Science · 2024-05-29 · Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. Jessie Yang

Value alignment: a formal approach

…principles that should govern autonomous AI systems. It essentially states that a system's goals and behaviour should be aligned with human values. But how to ensure value alignment? In this paper we first provide a formal model to…

Artificial Intelligence · Computer Science · 2024-02-08 · Carles Sierra, Nardine Osman, Pablo Noriega, Jordi Sabater-Mir, Antoni Perelló

A Framework for Human-Reason-Aligned Trajectory Evaluation in Automated Vehicles

One major challenge for the adoption and acceptance of automated vehicles (AVs) is ensuring that they can make sound decisions in everyday situations that involve ethical tension. Much attention has focused on rare, high-stakes dilemmas…

Robotics · Computer Science · 2025-11-07 · Lucas Elbert Suryana, Saeed Rahmani, Simeon Craig Calvert, Arkady Zgonnikov, Bart van Arem

Designing for Human-Agent Alignment: Understanding what humans want from their agents

Our ability to build autonomous agents that leverage Generative AI continues to increase by the day. As builders and users of such agents it is unclear what parameters we need to align on before the agents start performing tasks on our…

Artificial Intelligence · Computer Science · 2024-04-09 · Nitesh Goyal, Minsuk Chang, Michael Terry

Towards Verified Code Reasoning by LLMs

While LLM-based agents are able to tackle a wide variety of code reasoning questions, the answers are not always correct. This prevents the agent from being useful in situations where high precision is desired: (1) helping a software…

Software Engineering · Computer Science · 2025-11-17 · Meghana Sistla, Gogul Balakrishnan, Pat Rondon, José Cambronero, Michele Tufano, Satish Chandra

Towards better Human-Agent Alignment: Assessing Task Utility in LLM-Powered Applications

The rapid development in the field of Large Language Models (LLMs) has led to a surge in applications that facilitate collaboration among multiple agents to assist humans in their daily tasks. However, a significant gap remains in assessing…

Computation and Language · Computer Science · 2024-02-26 · Negar Arabzadeh, Julia Kiseleva, Qingyun Wu, Chi Wang, Ahmed Awadallah, Victor Dibia, Adam Fourney, Charles Clarke

Multi-Value Alignment in Normative Multi-Agent System: An Evolutionary Optimisation Approach

Value-alignment in normative multi-agent systems is used to promote a certain value and to ensure the consistent behaviour of agents in autonomous intelligent systems with human values. However, the current literature is limited to the…

Multiagent Systems · Computer Science · 2023-10-13 · Maha Riad, Vinicius de Carvalho, Fatemeh Golpayegani

Grounding Value Alignment with Ethical Principles

An important step in the development of value alignment (VA) systems in AI is understanding how values can interrelate with facts. Designers of future VA systems will need to utilize a hybrid approach in which ethical reasoning and…

Artificial Intelligence · Computer Science · 2019-07-15 · Tae Wan Kim, Thomas Donaldson, John Hooker

AI Alignment Dialogues: An Interactive Approach to AI Alignment in Support Agents

AI alignment is about ensuring AI systems only pursue goals and activities that are beneficial to humans. Most of the current approach to AI alignment is to learn what humans value from their behavioural data. This paper proposes a…

Artificial Intelligence · Computer Science · 2023-10-06 · Pei-Yu Chen, Myrthe L. Tielman, Dirk K. J. Heylen, Catholijn M. Jonker, M. Birna van Riemsdijk