Related papers: Aligned: A Platform-based Process for Alignment

Legal Alignment for Safe and Ethical AI

Alignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally…

Computers and Society · Computer Science 2026-01-08 Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, Rishi Bommasani, Stephen Casper, Mariano-Florentino Cuéllar, Noah Feldman, Iason Gabriel, Gillian K. Hadfield, Lewis Hammond, Peter Henderson, Atoosa Kasirzadeh, Seth Lazar, Anka Reuel, Kevin L. Wei, Jonathan Zittrain

Positive Alignment: Artificial Intelligence for Human Flourishing

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…

Artificial Intelligence · Computer Science 2026-05-15 Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

AI Alignment: A Comprehensive Survey

AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, so do risks from misalignment. To provide a comprehensive and up-to-date overview of the alignment field, in this survey,…

Artificial Intelligence · Computer Science 2025-04-07 Jiaming Ji, Tianyi Qiu, Boyuan Chen, Borong Zhang, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Lukas Vierling, Donghai Hong, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Juntao Dai, Xuehai Pan, Kwan Yee Ng, Aidan O'Gara, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, Wen Gao

Position Paper: Bounded Alignment: What (Not) To Expect From AGI Agents

The issues of AI risk and AI safety are becoming critical as the prospect of artificial general intelligence (AGI) looms larger. The emergence of extremely large and capable generative models has led to alarming predictions and created a…

Artificial Intelligence · Computer Science 2025-05-20 Ali A. Minai

Towards Integrated Alignment

As AI adoption expands across human society, the problem of aligning AI models to match human preferences remains a grand challenge. Currently, the AI alignment field is deeply divided between behavioral and representational approaches,…

Computers and Society · Computer Science 2025-08-12 Ben Y. Reis, William La Cava

Disentangling AI Alignment: A Structured Taxonomy Beyond Safety and Ethics

Recent advances in AI research make it increasingly plausible that artificial agents with consequential real-world impact will soon operate beyond tightly controlled environments. Ensuring that these agents are not only safe but that they…

Computers and Society · Computer Science 2025-06-10 Kevin Baum

CERN for AI: A Theoretical Framework for Autonomous Simulation-Based Artificial Intelligence Testing and Alignment

This paper explores the potential of a multidisciplinary approach to testing and aligning artificial intelligence (AI), specifically focusing on large language models (LLMs). Due to the rapid development and wide application of LLMs,…

Computers and Society · Computer Science 2025-01-07 Ljubisa Bojic, Matteo Cinelli, Dubravko Culibrk, Boris Delibasic

Alignment, Agency and Autonomy in Frontier AI: A Systems Engineering Perspective

As artificial intelligence scales, the concepts of alignment, agency, and autonomy have become central to AI safety, governance, and control. However, even in human contexts, these terms lack universal definitions, varying across…

Computers and Society · Computer Science 2025-03-11 Krti Tallam

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

The emergence of large language models (LLMs) has sparked the possibility of Artificial Superintelligence (ASI), a hypothetical AI system surpassing human intelligence. However, existing alignment paradigms struggle to guide such…

Machine Learning · Computer Science 2024-12-30 HyunJin Kim, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

Getting aligned on representational alignment

Biological and artificial information processing systems form representations of the world that they can use to categorize, reason, plan, navigate, and make decisions. How can we measure the similarity between the representations formed by…

Neurons and Cognition · Quantitative Biology 2024-11-27 Ilia Sucholutsky, Lukas Muttenthaler, Adrian Weller, Andi Peng, Andreea Bobu, Been Kim, Bradley C. Love, Christopher J. Cueva, Erin Grant, Iris Groen, Jascha Achterberg, Joshua B. Tenenbaum, Katherine M. Collins, Katherine L. Hermann, Kerem Oktar, Klaus Greff, Martin N. Hebart, Nathan Cloos, Nikolaus Kriegeskorte, Nori Jacoby, Qiuyi Zhang, Raja Marjieh, Robert Geirhos, Sherol Chen, Simon Kornblith, Sunayana Rane, Talia Konkle, Thomas P. O'Connell, Thomas Unterthiner, Andrew K. Lampinen, Klaus-Robert Müller, Mariya Toneva, Thomas L. Griffiths

A Blockchain-Enabled Approach to Cross-Border Compliance and Trust

As artificial intelligence (AI) systems become increasingly integral to critical infrastructure and global operations, the need for a unified, trustworthy governance framework is more urgent than ever. This paper proposes a novel approach…

Artificial Intelligence · Computer Science 2025-01-17 Vikram Kulothungan

Democratic AI is Possible. The Democracy Levels Framework Shows How It Might Work

This position paper argues that effectively "democratizing AI" requires democratic governance and alignment of AI, and that this is particularly valuable for decisions with systemic societal impacts. Initial steps -- such as Meta's…

Computers and Society · Computer Science 2025-08-22 Aviv Ovadya, Kyle Redman, Luke Thorburn, Quan Ze Chen, Oliver Smith, Flynn Devine, Andrew Konya, Smitha Milli, Manon Revel, K. J. Kevin Feng, Amy X. Zhang, Bilva Chandra, Michiel A. Bakker, Atoosa Kasirzadeh

Governing AI Beyond the Pretraining Frontier

This year, jurisdictions worldwide, including the United States, the European Union, the United Kingdom, and China, are set to enact or revise laws governing frontier AI. Their efforts largely rely on the assumption that increasing model…

Computers and Society · Computer Science 2025-02-25 Nicholas A. Caputo

Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research

While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" systems increasingly urgent. Yet research on alignment…

Artificial Intelligence · Computer Science 2025-12-12 Dani Roytburg, Beck Miller

International Institutions for Advanced AI

International institutions may have an important role to play in ensuring advanced AI systems benefit humanity. International collaborations can unlock AI's ability to further sustainable development, and coordination of regulatory efforts…

Computers and Society · Computer Science 2023-07-12 Lewis Ho, Joslyn Barnhart, Robert Trager, Yoshua Bengio, Miles Brundage, Allison Carnegie, Rumman Chowdhury, Allan Dafoe, Gillian Hadfield, Margaret Levi, Duncan Snidal

A five-layer framework for AI governance: integrating regulation, standards, and certification

Purpose: The governance of artificial intelligence (AI) systems requires a structured approach that connects high-level regulatory principles with practical implementation. Existing frameworks lack clarity on how regulations translate into…

Computers and Society · Computer Science 2025-09-16 Avinash Agarwal, Manisha J. Nene

Possible Principles for Aligned Structure Learning Agents

This paper offers a roadmap for the development of scalable aligned artificial intelligence (AI) from first principle descriptions of natural intelligence. In brief, a possible path toward scalable aligned AI rests upon enabling artificial…

Artificial Intelligence · Computer Science 2025-08-29 Lancelot Da Costa, Tomáš Gavenčiak, David Hyland, Mandana Samiei, Cristian Dragos-Manta, Candice Pattisapu, Adeel Razi, Karl Friston

Clear, Compelling Arguments: Rethinking the Foundations of Frontier AI Safety Cases

This paper contributes to the nascent debate around safety cases for frontier AI systems. Safety cases are structured, defensible arguments that a system is acceptably safe to deploy in a given context. Historically, they have been used in…

Computers and Society · Computer Science 2026-03-11 Shaun Feakins, Ibrahim Habli, Phillip Morgan

Alignment as Jurisprudence

Jurisprudence, the study of how judges should properly decide cases, and alignment, the science of getting AI models to conform to human values, share a fundamental structure. These seemingly distant fields both seek to predict and shape…

Artificial Intelligence · Computer Science 2026-05-12 Nicholas Caputo

Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack)

This position paper argues that formal optimal control theory should be central to AI alignment research, offering a distinct perspective from prevailing AI safety and security approaches. While recent work in AI safety and mechanistic…

Artificial Intelligence · Computer Science 2025-06-24 Elija Perrier