Related papers: Towards Integrated Alignment

The AI Alignment Paradox

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West, Roland Aydin

Positive Alignment: Artificial Intelligence for Human Flourishing

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…

Artificial Intelligence · Computer Science 2026-05-15 Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

Position: Towards Bidirectional Human-AI Alignment

Recent advances in general-purpose AI underscore the urgent need to align AI systems with human goals and values. Yet, the lack of a clear, shared understanding of what constitutes "alignment" limits meaningful progress and…

Human-Computer Interaction · Computer Science 2025-09-30 Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, David Jurgens

AI Alignment: A Comprehensive Survey

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…

Artificial Intelligence · Computer Science 2025-04-07 Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Donghai Hong, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Juntao Dai, Xuehai Pan, Kwan Yee Ng, Aidan O'Gara, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

Researching Alignment Research: Unsupervised Analysis

AI alignment research is the field of study dedicated to ensuring that artificial intelligence (AI) benefits humans. As machine intelligence gets more advanced, this research is becoming increasingly important. Researchers in the field…

Computers and Society · Computer Science 2022-06-08 Jan H. Kirchner, Logan Smith, Jacques Thibodeau, Kyle McDonell, Laria Reynolds

On the Ethics of Building AI in a Responsible Manner

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants. We argue that a…

Machine Learning · Computer Science 2020-04-10 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?

AI alignment research aims to develop techniques to ensure that AI systems do not cause harm. However, every alignment technique has failure modes, which are conditions in which there is a non-negligible chance that the technique fails to…

Artificial Intelligence · Computer Science 2025-10-14 Leonard Dung, Florian Mai

Getting aligned on representational alignment

Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by…

Neurons and Cognition · Quantitative Biology 2024-11-27 Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J. Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths

Strategic Alignment Patterns in National AI Policies

This paper introduces a novel visual mapping methodology for assessing strategic alignment in national artificial intelligence policies. The proliferation of AI strategies across countries has created an urgent need for analytical…

Computers and Society · Computer Science 2025-07-10 Mohammad Hossein Azin, Hessam Zandhessami

The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process

This position paper states that AI Alignment in Multi-Agent Systems (MAS) should be considered a dynamic and interaction-dependent process that heavily depends on the social environment where agents are deployed, either collaborative,…

Artificial Intelligence · Computer Science 2025-06-09 Florian Carichon, Aditi Khandelwal, Marylou Fauchard, Golnoosh Farnadi

Network Alignment

Complex networks are frequently employed to model physical or virtual complex systems. When certain entities exist across multiple systems simultaneously, unveiling their corresponding relationships across the networks becomes crucial. This…

Physics and Society · Physics 2025-04-16 Rui Tang, Ziyun Yong, Shuyu Jiang, Xingshu Chen, Yaofang Liu, Yi-Cheng Zhang, Gui-Quan Sun, Wei Wang

Legal Alignment for Safe and Ethical AI

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

Computers and Society · Computer Science 2026-01-08 Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain

The Alignment Problem in Context

A core challenge in the development of increasingly capable AI systems is to make them safe and reliable by ensuring their behaviour is consistent with human values. This challenge, known as the alignment problem, does not merely apply to…

Machine Learning · Computer Science 2023-11-07 Raphaël Millière

Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem

The AI alignment problem, which focusses on ensuring that artificial intelligence (AI), including AGI and ASI, systems act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General…

Artificial Intelligence · Computer Science 2025-07-25 Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil

Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics

Recent advances in AI research make it increasingly plausible that artificial agents with consequential real-world impact will soon operate beyond tightly controlled environments. Ensuring that these agents are not only safe but that they…

Computers and Society · Computer Science 2025-06-10 Kevin Baum

Rethinking How AI Embeds and Adapts to Human Values: Challenges and Opportunities

The concepts of "human-centered AI" and "value-based decision" have gained significant attention in both research and industry. However, many critical aspects remain underexplored and require further investigation. In particular, there…

Artificial Intelligence · Computer Science 2025-08-26 Sz-Ting Tzeng, Frank Dignum

Challenges and Future Directions of Data-Centric AI Alignment

As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss…

Computation and Language · Computer Science 2025-05-02 Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, Yixuan Li

Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research

While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" systems increasingly urgent. Yet research on alignment…

Artificial Intelligence · Computer Science 2025-12-12 Dani Roytburg, Beck Miller

The Elephant in the Room -- Why AI Safety Demands Diverse Teams

We consider that existing approaches to AI "safety" and "alignment" may not be using the most effective tools, teams, or approaches. We suggest that an alternative and better approach to the problem may be to treat alignment as a social…

Computers and Society · Computer Science 2024-07-16 David Rostcheck, Lara Scheibling

Deceptive Alignment Monitoring

As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a…

Machine Learning · Computer Science 2023-07-27 Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo