Related papers: Safety without alignment

Positive Alignment: Artificial Intelligence for Human Flourishing

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete.…

Artificial Intelligence · Computer Science 2026-05-15 Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

Safe AI -- How is this Possible?

Traditional safety engineering is coming to a turning point moving from deterministic, non-evolving systems operating in well-defined contexts to increasingly autonomous and learning-enabled AI systems which are acting in largely…

Artificial Intelligence · Computer Science 2022-05-13 Harald Rueß, Simon Burton

R²AI: Towards Resistant and Resilient AI in an Evolving World

In this position paper, we address the persistent gap between rapidly growing AI capabilities and lagging safety progress. Existing paradigms divide into "Make AI Safe", which applies post-hoc alignment and guardrails but remains brittle…

Machine Learning · Computer Science 2025-09-09 Youbang Sun, Xiang Wang, Jie Fu, Chaochao Lu, Bowen Zhou

AI safety: state of the field through quantitative lens

The last decade has seen major improvements in the performance of artificial intelligence, which has driven widespread applications. Unforeseen effects of such mass adoption have put the notion of AI safety into the public eye. AI safety is a…

Computers and Society · Computer Science 2020-07-10 Mislav Juric, Agneza Sandic, Mario Brcic

Mind the Gap! Pathways Towards Unifying AI Safety and Ethics Research

While much research in artificial intelligence (AI) has focused on scaling capabilities, the accelerating pace of development makes countervailing work on producing harmless, "aligned" systems increasingly urgent. Yet research on alignment…

Artificial Intelligence · Computer Science 2025-12-12 Dani Roytburg, Beck Miller

Human-AI Safety: A Descendant of Generative AI and Control Systems Safety

Artificial intelligence (AI) is interacting with people at an unprecedented scale, offering new avenues for immense positive impact, but also raising widespread concerns around the potential for individual and societal harm. Today, the…

Artificial Intelligence · Computer Science 2024-06-25 Andrea Bajcsy, Jaime F. Fisac

Combining Theory of Mind and Kindness for Self-Supervised Human-AI Alignment

As artificial intelligence (AI) becomes deeply integrated into critical infrastructures and everyday life, ensuring its safe deployment is one of humanity's most urgent challenges. Current AI models prioritize task optimization over safety,…

Artificial Intelligence · Computer Science 2024-11-08 Joshua T. S. Hewson

Ethical Artificial Intelligence - An Open Question

Artificial Intelligence (AI) is an effective science which employs strong enough approaches, methods, and techniques to solve unsolvable real world based problems. Because of its unstoppable rise towards the future, there are also some…

Artificial Intelligence · Computer Science 2017-06-12 Alice Pavaloiu, Utku Kose

Position Paper: Bounded Alignment: What (Not) To Expect From AGI Agents

The issues of AI risk and AI safety are becoming critical as the prospect of artificial general intelligence (AGI) looms larger. The emergence of extremely large and capable generative models has led to alarming predictions and created a…

Artificial Intelligence · Computer Science 2025-05-20 Ali A. Minai

Artificial Intelligence, Values and Alignment

This paper looks at philosophical questions that arise in the context of AI alignment. It defends three propositions. First, normative and technical aspects of the AI alignment problem are interrelated, creating space for productive…

Computers and Society · Computer Science 2020-10-07 Iason Gabriel

The Challenge of Value Alignment: from Fairer Algorithms to AI Safety

This paper addresses the question of how to align AI systems with human values and situates it within a wider body of thought regarding technology and value. Far from existing in a vacuum, there has long been an interest in the ability of…

Computers and Society · Computer Science 2021-01-19 Iason Gabriel, Vafa Ghazavi

Understanding and Avoiding AI Failures: A Practical Guide

As AI technologies increase in capability and ubiquity, AI accidents are becoming more common. Based on normal accident theory, high reliability theory, and open systems theory, we create a framework for understanding the risks associated…

Computers and Society · Computer Science 2024-03-13 Heather M. Williams, Roman V. Yampolskiy

AI Safety for Everyone

Recent discussions and research in AI safety have increasingly emphasized the deep connection between AI safety and existential risk from advanced AI systems, suggesting that work on AI safety necessarily entails serious consideration of…

Computers and Society · Computer Science 2025-02-17 Balint Gyevnar, Atoosa Kasirzadeh

Hybrid Approaches for Moral Value Alignment in AI Agents: a Manifesto

Increasing interest in ensuring the safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents. This goal differs qualitatively from traditional task-specific AI…

Artificial Intelligence · Computer Science 2025-01-17 Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Standardization Trends on Safety and Trustworthiness Technology for Advanced AI

Artificial Intelligence (AI) has rapidly evolved over the past decade and has advanced in areas such as language comprehension, image and video recognition, programming, and scientific reasoning. Recent AI technologies based on large…

Machine Learning · Computer Science 2024-10-30 Jonghong Jeon

A New Perspective On AI Safety Through Control Theory Methodologies

While artificial intelligence (AI) is advancing rapidly and mastering increasingly complex problems with astonishing performance, the safety assurance of such systems is a major concern. Particularly in the context of safety-critical,…

Artificial Intelligence · Computer Science 2025-07-01 Lars Ullrich, Walter Zimmer, Ross Greer, Knut Graichen, Alois C. Knoll, Mohan Trivedi

Towards provable probabilistic safety for scalable embodied AI systems

Embodied AI systems, comprising AI models and physical plants, are increasingly prevalent across various applications. Due to the rarity of system failures, ensuring their safety in complex operating environments remains a major challenge,…

Systems and Control · Electrical Eng. & Systems 2026-04-09 Linxuan He, Lingxiang Fan, Qing-Shan Jia, Ang Li, Hongyan Sang, Ling Wang, Guanghui Wen, Jiwen Lu, Tao Zhang, Jie Zhou, Yi Zhang, Yisen Wang, Peng Wei, Zhongyuan Wang, Henry X. Liu, Shuo Feng

Position: AI Safety Requires Effective Controllability

AI safety is still largely framed as alignment: training models to follow human preferences, safety policies, and normative constraints. That framing has improved the behavior of modern language models, but aligned behavior does not by…

Artificial Intelligence · Computer Science 2026-05-27 Yige Li, Yunhao Feng, Jun Sun

Safety Cases: How to Justify the Safety of Advanced AI Systems

As AI systems become more advanced, companies and regulators will make difficult decisions about whether it is safe to train and deploy them. To prepare for these decisions, we investigate how developers could make a 'safety case,' which is…

Computers and Society · Computer Science 2024-03-20 Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen

The BIG Argument for AI Safety Cases

We present our Balanced, Integrated and Grounded (BIG) argument for assuring the safety of AI systems. The BIG argument adopts a whole-system approach to constructing a safety case for AI systems of varying capability, autonomy and…

Computers and Society · Computer Science 2025-04-01 Ibrahim Habli, Richard Hawkins, Colin Paterson, Philippa Ryan, Yan Jia, Mark Sujan, John McDermid