Related papers: Model evaluation for extreme risks

What Makes an Evaluation Useful? Common Pitfalls and Best Practices

Following the rapid increase in Artificial Intelligence (AI) capabilities in recent years, the AI community has voiced concerns regarding possible safety risks. To support decision-making on the safe use and development of AI systems, there…

Machine Learning · Computer Science 2025-04-01 Gil Gekker, Meirav Segal, Dan Lahav, Omer Nevo

Evaluating AI Evaluation: Perils and Prospects

As AI systems appear to exhibit ever-increasing capability and generality, assessing their true potential and safety becomes paramount. This paper contends that the prevalent evaluation methods for these systems are fundamentally…

Artificial Intelligence · Computer Science 2024-07-15 John Burden

Evaluating Frontier Models for Dangerous Capabilities

To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations…

Machine Learning · Computer Science 2024-04-08 Mary Phuong, Matthew Aitchison, Elliot Catt, Sarah Cogan, Alexandre Kaskasoli, Victoria Krakovna, David Lindner, Matthew Rahtz, Yannis Assael, Sarah Hodkinson, Heidi Howard, Tom Lieberum, Ramana Kumar, Maria Abi Raad, Albert Webson, Lewis Ho, Sharon Lin, Sebastian Farquhar, Marcus Hutter, Gregoire Deletang, Anian Ruoss, Seliem El-Sayed, Sasha Brown, Anca Dragan, Rohin Shah, Allan Dafoe, Toby Shevlane

What AI evaluations for preventing catastrophic risks can and cannot do

AI evaluations are an important component of the AI governance toolkit, underlying current approaches to safety cases for preventing catastrophic risks. Our paper examines what these evaluations can and cannot tell us. Evaluations can…

Computers and Society · Computer Science 2024-12-13 Peter Barnett, Lisa Thiergart

The Role of Risk Modeling in Advanced AI Risk Management

Rapidly advancing artificial intelligence (AI) systems introduce novel, uncertain, and potentially catastrophic risks. Managing these risks requires a mature risk-management infrastructure whose cornerstone is rigorous risk modeling. We…

Computers and Society · Computer Science 2025-12-10 Chloé Touzet, Henry Papadatos, Malcolm Murray, Otter Quarks, Steve Barrett, Alejandro Tlaie Boria, Elija Perrier, Matthew Smith, Siméon Campos

Managing extreme AI risks amid rapid progress

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify…

Computers and Society · Computer Science 2024-05-24 Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Sören Mindermann

Holistic Safety and Responsibility Evaluations of Advanced AI Models

Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to…

Artificial Intelligence · Computer Science 2024-04-23 Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

As generative large model capabilities advance, safety concerns become more pronounced in their outputs. To ensure the sustainable growth of the AI ecosystem, it's imperative to undertake a holistic evaluation and refinement of associated…

Artificial Intelligence · Computer Science 2023-12-01 Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang

Frontier AI Regulation: Managing Emerging Risks to Public Safety

Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that…

Computers and Society · Computer Science 2023-11-09 Markus Anderljung, Joslyn Barnhart, Anton Korinek, Jade Leung, Cullen O'Keefe, Jess Whittlestone, Shahar Avin, Miles Brundage, Justin Bullock, Duncan Cass-Beggs, Ben Chang, Tantum Collins, Tim Fist, Gillian Hadfield, Alan Hayes, Lewis Ho, Sara Hooker, Eric Horvitz, Noam Kolt, Jonas Schuett, Yonadav Shavit, Divya Siddarth, Robert Trager, Kevin Wolf

Quantifying detection rates for dangerous capabilities: a theoretical model of dangerous capability evaluations

We present a quantitative model for tracking dangerous AI capabilities over time. Our goal is to help the policy and research community visualise how dangerous capability testing can give us an early warning about approaching AI risks. We…

Artificial Intelligence · Computer Science 2024-12-23 Paolo Bova, Alessandro Di Stefano, The Anh Han

Declare and Justify: Explicit assumptions in AI evaluations are necessary for effective regulation

As AI systems advance, AI evaluations are becoming an important pillar of regulations for ensuring safety. We argue that such regulation should require developers to explicitly identify and justify key underlying assumptions about…

Artificial Intelligence · Computer Science 2024-11-21 Peter Barnett, Lisa Thiergart

A Methodology for Quantitative AI Risk Modeling

Although general-purpose AI systems offer transformational opportunities in science and industry, they simultaneously raise critical concerns about safety, misuse, and potential loss of control. Despite these risks, methods for assessing…

Computers and Society · Computer Science 2025-12-12 Malcolm Murray, Steve Barrett, Henry Papadatos, Otter Quarks, Matt Smith, Alejandro Tlaie Boria, Chloé Touzet, Siméon Campos

On Safety Assessment of Artificial Intelligence

In this paper we discuss how systems with Artificial Intelligence (AI) can undergo safety assessment. This is relevant if AI is used in safety-related applications. Taking a deeper look into AI models, we show that many models of…

Artificial Intelligence · Computer Science 2021-05-17 Jens Braband, Hendrik Schäbe

Towards Risk Modeling for Collaborative AI

Collaborative AI systems aim at working together with humans in a shared space to achieve a common goal. This setting imposes potentially hazardous circumstances due to contacts that could harm human beings. Thus, building such systems with…

Software Engineering · Computer Science 2021-03-15 Matteo Camilli, Michael Felderer, Andrea Giusti, Dominik T. Matt, Anna Perini, Barbara Russo, Angelo Susi

Assessing the Case for Africa-Centric AI Safety Evaluations

Frontier AI systems are being adopted across Africa, yet most AI safety evaluations are designed and validated in Western environments. In this paper, we argue that the portability gap can leave Africa-centric pathways to severe harm…

Computers and Society · Computer Science 2026-03-23 Gathoni Ireri, Cecil Abungu, Jean Cheptumo, Sienka Dounia, Mark Gitau, Stephanie Kasaon, Michael Michie, Chinasa T. Okolo, Jonathan Shock

Selecting Models based on the Risk of Damage Caused by Adversarial Attacks

Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications. One of the key concerns is that adversaries can cause harm by manipulating model predictions without being…

Machine Learning · Computer Science 2023-01-31 Jona Klemenc, Holger Trittenbach

Prioritizing High-Consequence Biological Capabilities in Evaluations of Artificial Intelligence Models

As a result of rapidly accelerating AI capabilities, over the past year, national governments and multinational bodies have announced efforts to address safety, security and ethics issues related to AI models. One high priority among these…

Computers and Society · Computer Science 2026-05-19 Jaspreet Pannu, Doni Bloomfield, Alex Zhu, Robert MacKnight, Gabe Gomes, Anita Cicero, Thomas V. Inglesby

AI for Extreme Event Modeling and Understanding: Methodologies and Challenges

In recent years, artificial intelligence (AI) has deeply impacted various fields, including Earth system sciences. Here, AI improved weather forecasting, model emulation, parameter estimation, and the prediction of extreme events. However,…

Artificial Intelligence · Computer Science 2024-07-01 Gustau Camps-Valls, Miguel-Ángel Fernández-Torres, Kai-Hendrik Cohrs, Adrian Höhl, Andrea Castelletti, Aytac Pacal, Claire Robin, Francesco Martinuzzi, Ioannis Papoutsis, Ioannis Prapas, Jorge Pérez-Aracil, Katja Weigel, Maria Gonzalez-Calabuig, Markus Reichstein, Martin Rabel, Matteo Giuliani, Miguel Mahecha, Oana-Iuliana Popescu, Oscar J. Pellicer-Valero, Said Ouala, Sancho Salcedo-Sanz, Sebastian Sippel, Spyros Kondylatos, Tamara Happé, Tristan Williams

Evasive Intelligence: Lessons from Malware Analysis for Evaluating AI Agents

Artificial intelligence (AI) systems are increasingly adopted as tool-using agents that can plan, observe their environment, and take actions over extended time periods. This evolution challenges current evaluation practices where the AI…

Cryptography and Security · Computer Science 2026-03-17 Simone Aonzo, Merve Sahin, Aurélien Francillon, Daniele Perito

Deployment Corrections: An incident response framework for frontier AI models

A comprehensive approach to addressing catastrophic risks from AI models should cover the full model lifecycle. This paper explores contingency plans for cases where pre-deployment risk management falls short: where either very dangerous…

Computers and Society · Computer Science 2023-10-03 Joe O'Brien, Shaun Ee, Zoe Williams