Related papers: The Alignment Problem from a Deep Learning Perspec…

The Possibility of Artificial Intelligence Becoming a Subject and the Alignment Problem

Artificial General Intelligence (AGI) is increasingly being discussed not only as a tool, but also as a potential subject with personal and therefore moral status. In our opinion, the currently dominant alignment strategies, which focus on…

Artificial Intelligence · Computer Science 2026-04-17 Till Mossakowski, Helena Esther Grass

Position Paper: Bounded Alignment: What (Not) To Expect From AGI Agents

The issues of AI risk and AI safety are becoming critical as the prospect of artificial general intelligence (AGI) looms larger. The emergence of extremely large and capable generative models has led to alarming predictions and created a…

Artificial Intelligence · Computer Science 2025-05-20 Ali A. Minai

Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem

The AI alignment problem, which focusses on ensuring that artificial intelligence (AI), including AGI and ASI, systems act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General…

Artificial Intelligence · Computer Science 2025-07-25 Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil

Research Superalignment Should Advance Now with Alternating Competence and Conformity Optimization

The recent leap in AI capabilities, driven by big generative models, has sparked the possibility of achieving Artificial General Intelligence (AGI) and further triggered discussions on Artificial Superintelligence (ASI) - a system surpassing…

Artificial Intelligence · Computer Science 2026-02-10 HyunJin Kim, Xiaoyuan Yi, Jing Yao, Muhua Huang, JinYeong Bak, James Evans, Xing Xie

The Embeddings World and Artificial General Intelligence

From early days, a key and controversial question inside the artificial intelligence community was whether Artificial General Intelligence (AGI) is achievable. AGI is the ability of machines and computer programs to achieve human-level…

Artificial Intelligence · Computer Science 2022-09-15 Mostafa Haghir Chehreghani

Misalignment or misuse? The AGI alignment tradeoff

Creating systems that are aligned with our goals is seen as a leading approach to create safe and beneficial AI in both leading AI companies and the academic field of AI safety. We defend the view that misaligned AGI - future, generally…

Computers and Society · Computer Science 2025-06-05 Max Hellrigel-Holderbaum, Leonard Dung

An Approach to Technical AGI Safety and Security

Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of…

Artificial Intelligence · Computer Science 2025-04-03 Rohin Shah, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, David Lindner, Jonah Brown-Cohen, Lewis Ho, Neel Nanda, Raluca Ada Popa, Rishub Jain, Rory Greig, Samuel Albanie, Scott Emmons, Sebastian Farquhar, Sébastien Krier, Senthooran Rajamanoharan, Sophie Bridgers, Tobi Ijitoye, Tom Everitt, Victoria Krakovna, Vikrant Varma, Vladimir Mikulik, Zachary Kenton, Dave Orr, Shane Legg, Noah Goodman, Allan Dafoe, Four Flynn, Anca Dragan

The AI Alignment Paradox

The field of AI alignment aims to steer AI systems toward human goals, preferences, and ethical principles. Its contributions have been instrumental for improving the output quality, safety, and trustworthiness of today's AI models. This…

Artificial Intelligence · Computer Science 2024-11-26 Robert West, Roland Aydin

Automated alignment is harder than you think

A leading proposal for aligning artificial superintelligence (ASI) is to use AI agents to automate an increasing fraction of alignment research as capabilities improve. We argue that, even when research agents are not scheming to…

Artificial Intelligence · Computer Science 2026-05-18 Aleksandr Bowkis, Marie Davidsen Buhl, Jacob Pfau, Geoffrey Irving

On the Ethics of Building AI in a Responsible Manner

The AI-alignment problem arises when there is a discrepancy between the goals that a human designer specifies to an AI learner and a potential catastrophic outcome that does not reflect what the human designer really wants. We argue that a…

Machine Learning · Computer Science 2020-04-10 Shai Shalev-Shwartz, Shaked Shammah, Amnon Shashua

Asymptotically Unambitious Artificial General Intelligence

General intelligence, the ability to solve arbitrary solvable problems, is supposed by many to be artificially constructible. Narrow intelligence, the ability to solve a given particularly difficult problem, has seen impressive recent…

Artificial Intelligence · Computer Science 2020-07-22 Michael K Cohen, Badri Vellambi, Marcus Hutter

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

The field of AI alignment is concerned with AI systems that pursue unintended goals. One commonly studied mechanism by which an unintended goal might arise is specification gaming, in which the designer-provided specification is flawed in a…

Machine Learning · Computer Science 2022-11-03 Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton

When Brain-inspired AI Meets AGI

Artificial General Intelligence (AGI) has been a long-standing goal of humanity, with the aim of creating machines capable of performing any intellectual task that humans can do. To achieve this, AGI researchers draw inspiration from the…

Artificial Intelligence · Computer Science 2023-03-29 Lin Zhao, Lu Zhang, Zihao Wu, Yuzhong Chen, Haixing Dai, Xiaowei Yu, Zhengliang Liu, Tuo Zhang, Xintao Hu, Xi Jiang, Xiang Li, Dajiang Zhu, Dinggang Shen, Tianming Liu

Human Misperception of Generative-AI Alignment: A Laboratory Experiment

We conduct an incentivized laboratory experiment to study people's perception of generative artificial intelligence (GenAI) alignment in the context of economic decision-making. Using a panel of economic problems spanning the domains of…

Theoretical Economics · Economics 2026-04-03 Kevin He, Ran Shorrer, Mengjia Xia

Artificial General Intelligence, Existential Risk, and Human Risk Perception

Artificial general intelligence (AGI) does not yet exist, but given the pace of technological development in artificial intelligence, it is projected to reach human-level intelligence within roughly the next two decades. After that, many…

Computers and Society · Computer Science 2023-11-16 David R. Mandel

Why We Don't Have AGI Yet

The original vision of AI was re-articulated in 2002 via the term 'Artificial General Intelligence' or AGI. This vision is to build 'Thinking Machines' - computer systems that can learn, reason, and solve problems similar to the way humans…

Artificial Intelligence · Computer Science 2023-09-20 Peter Voss, Mladjan Jovanovic

Deep Learning and Artificial General Intelligence: Still a Long Way to Go

In recent years, deep learning using neural network architecture, i.e. deep neural networks, has been on the frontier of computer science research. It has even led to superhuman performance in some problems, e.g., in computer vision, games…

Machine Learning · Computer Science 2022-04-07 Maciej Świechowski

The Alignment Problem in Context

A core challenge in the development of increasingly capable AI systems is to make them safe and reliable by ensuring their behaviour is consistent with human values. This challenge, known as the alignment problem, does not merely apply to…

Machine Learning · Computer Science 2023-11-07 Raphaël Millière

Towards Integrated Alignment

As AI adoption expands across human society, the problem of aligning AI models to match human preferences remains a grand challenge. Currently, the AI alignment field is deeply divided between behavioral and representational approaches,…

Computers and Society · Computer Science 2025-08-12 Ben Y. Reis, William La Cava

The economic alignment problem of artificial intelligence

Artificial intelligence (AI) is advancing exponentially and is likely to have profound impacts on human wellbeing, social equity, and environmental sustainability. Here we argue that the "alignment problem" in AI research is also an…

General Economics · Economics 2026-04-30 Daniel W. O'Neill, Stefano Vrizzi, Noemi Luna Carmeno, Felix Creutzig, Jefim Vogel