Related papers: The AGI Containment Problem

AGI Safety Literature Review

The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily…

Artificial Intelligence · Computer Science 2018-05-22 Tom Everitt, Gary Lea, Marcus Hutter

Guidelines for Artificial Intelligence Containment

With almost daily improvements in capabilities of artificial intelligence it is more important than ever to develop safety software for use by the AI research community. Building on our previous work on AI Containment Problem we propose a…

Artificial Intelligence · Computer Science 2017-07-27 James Babcock, Janos Kramar, Roman V. Yampolskiy

Stovepiping and Malicious Software: A Critical Review of AGI Containment

Awareness of the possible impacts associated with artificial intelligence has risen in proportion to progress in the field. While there are tremendous benefits to society, many argue that there are just as many, if not more, concerns…

Artificial Intelligence · Computer Science 2021-08-03 Jason M. Pittman, Jesus P. Espinoza, Courtney Crosby

Software Testing of Generative AI Systems: Challenges and Opportunities

Software Testing is a well-established area in software engineering, encompassing various techniques and methodologies to ensure the quality and reliability of software systems. However, with the advent of generative artificial intelligence…

Software Engineering · Computer Science 2023-09-18 Aldeida Aleti

An Approach to Technical AGI Safety and Security

Artificial General Intelligence (AGI) promises transformative benefits but also presents significant risks. We develop an approach to address the risk of harms consequential enough to significantly harm humanity. We identify four areas of…

Artificial Intelligence · Computer Science 2025-04-03 Rohin Shah, Alex Irpan, Alexander Matt Turner, Anna Wang, Arthur Conmy, David Lindner, Jonah Brown-Cohen, Lewis Ho, Neel Nanda, Raluca Ada Popa, Rishub Jain, Rory Greig, Samuel Albanie, Scott Emmons, Sebastian Farquhar, Sébastien Krier, Senthooran Rajamanoharan, Sophie Bridgers, Tobi Ijitoye, Tom Everitt, Victoria Krakovna, Vikrant Varma, Vladimir Mikulik, Zachary Kenton, Dave Orr, Shane Legg, Noah Goodman, Allan Dafoe, Four Flynn, Anca Dragan

Towards best practices in AGI safety and governance: A survey of expert opinion

A number of leading AI companies, including OpenAI, Google DeepMind, and Anthropic, have the stated goal of building artificial general intelligence (AGI) - AI systems that achieve or exceed human performance across a wide range of…

Computers and Society · Computer Science 2023-05-15 Jonas Schuett, Noemi Dreksler, Markus Anderljung, David McCaffary, Lennart Heim, Emma Bluemke, Ben Garfinkel

Building Safer AGI by introducing Artificial Stupidity

Artificial Intelligence (AI) achieved super-human performance in a broad variety of domains. We say that an AI is made Artificially Stupid on a task when some limitations are deliberately introduced to match a human's ability to do the…

Artificial Intelligence · Computer Science 2018-08-14 Michaël Trazzi, Roman V. Yampolskiy

The Alignment Problem from a Deep Learning Perspective

In coming years or decades, artificial general intelligence (AGI) may surpass human capabilities across many critical domains. We argue that, without substantial effort to prevent it, AGIs could learn to pursue goals that are in conflict…

Artificial Intelligence · Computer Science 2025-05-06 Richard Ngo, Lawrence Chan, Sören Mindermann

Looking Forward: Challenges and Opportunities in Agentic AI Reliability

This chapter presents perspectives for challenges and future development in building reliable AI systems, particularly, agentic AI systems. Several open research problems related to mitigating the risks of cascading failures are discussed.…

Artificial Intelligence · Computer Science 2025-11-18 Liudong Xing, Janet Lin

The Embeddings World and Artificial General Intelligence

From early days, a key and controversial question inside the artificial intelligence community was whether Artificial General Intelligence (AGI) is achievable. AGI is the ability of machines and computer programs to achieve human-level…

Artificial Intelligence · Computer Science 2022-09-15 Mostafa Haghir Chehreghani

The Possibility of Artificial Intelligence Becoming a Subject and the Alignment Problem

Artificial General Intelligence (AGI) is increasingly being discussed not only as a tool, but also as a potential subject with personal and therefore moral status. In our opinion, the currently dominant alignment strategies, which focus on…

Artificial Intelligence · Computer Science 2026-04-17 Till Mossakowski , Helena Esther Grass

When to Trust AI: Advances and Challenges for Certification of Neural Networks

Artificial intelligence (AI) has been advancing at a fast pace and it is now poised for deployment in a wide range of applications, such as autonomous systems, medical diagnosis and natural language processing. Early adoption of AI…

Machine Learning · Computer Science 2023-09-21 Marta Kwiatkowska, Xiyue Zhang

Corrigibility with Utility Preservation

Corrigibility is a safety property for artificially intelligent agents. A corrigible agent will not resist attempts by authorized parties to alter the goals and constraints that were encoded in the agent when it was first started. This…

Artificial Intelligence · Computer Science 2020-04-06 Koen Holtman

The Alignment Problem in Context

A core challenge in the development of increasingly capable AI systems is to make them safe and reliable by ensuring their behaviour is consistent with human values. This challenge, known as the alignment problem, does not merely apply to…

Machine Learning · Computer Science 2023-11-07 Raphaël Millière

Several Issues Regarding Data Governance in AGI

The rapid advancement of artificial intelligence has positioned data governance as a critical concern for responsible AI development. While frameworks exist for conventional AI systems, the potential emergence of Artificial General…

Computers and Society · Computer Science 2025-08-19 Masayuki Hatta

A Pathway Towards Responsible AI Generated Content

AI Generated Content (AIGC) has received tremendous attention within the past few years, with content generated in the format of image, text, audio, video, etc. Meanwhile, AIGC has become a double-edged sword and recently received much…

Artificial Intelligence · Computer Science 2023-12-29 Chen Chen, Jie Fu, Lingjuan Lyu

Concrete Problems in AI Safety

Rapid progress in machine learning and artificial intelligence (AI) has brought increasing attention to the potential impacts of AI technologies on society. In this paper we discuss one such potential impact: the problem of accidents in…

Artificial Intelligence · Computer Science 2016-07-26 Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané

Safety Features for a Centralised AGI Project

Recent AI progress has outpaced expectations, with some experts now predicting AI that matches or exceeds human capabilities in all cognitive areas (AGI) could emerge this decade, potentially posing grave national and global security…

Computers and Society · Computer Science 2025-07-30 Sarah Hastings-Woodhouse

Position Paper: Bounded Alignment: What (Not) To Expect From AGI Agents

The issues of AI risk and AI safety are becoming critical as the prospect of artificial general intelligence (AGI) looms larger. The emergence of extremely large and capable generative models has led to alarming predictions and created a…

Artificial Intelligence · Computer Science 2025-05-20 Ali A. Minai

Provably safe systems: the only path to controllable AGI

We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI…

Computers and Society · Computer Science 2023-09-06 Max Tegmark, Steve Omohundro