Related papers: Model evaluation for extreme risks

200 papers

Following the rapid increase in Artificial Intelligence (AI) capabilities in recent years, the AI community has voiced concerns regarding possible safety risks. To support decision-making on the safe use and development of AI systems, there…

Machine Learning · Computer Science 2025-04-01 Gil Gekker, Meirav Segal, Dan Lahav, Omer Nevo

As AI systems appear to exhibit ever-increasing capability and generality, assessing their true potential and safety becomes paramount. This paper contends that the prevalent evaluation methods for these systems are fundamentally…

Artificial Intelligence · Computer Science 2024-07-15 John Burden

To understand the risks posed by a new AI system, we must understand what it can and cannot do. Building on prior work, we introduce a programme of new "dangerous capability" evaluations and pilot them on Gemini 1.0 models. Our evaluations…

AI evaluations are an important component of the AI governance toolkit, underlying current approaches to safety cases for preventing catastrophic risks. Our paper examines what these evaluations can and cannot tell us. Evaluations can…

Computers and Society · Computer Science 2024-12-13 Peter Barnett, Lisa Thiergart

Rapidly advancing artificial intelligence (AI) systems introduce novel, uncertain, and potentially catastrophic risks. Managing these risks requires a mature risk-management infrastructure whose cornerstone is rigorous risk modeling. We…

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify…

Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to…

As generative large model capabilities advance, safety concerns become more pronounced in their outputs. To ensure the sustainable growth of the AI ecosystem, it's imperative to undertake a holistic evaluation and refinement of associated…

Artificial Intelligence · Computer Science 2023-12-01 Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang

Advanced AI models hold the promise of tremendous benefits for humanity, but society needs to proactively manage the accompanying risks. In this paper, we focus on what we term "frontier AI" models: highly capable foundation models that…

We present a quantitative model for tracking dangerous AI capabilities over time. Our goal is to help the policy and research community visualise how dangerous capability testing can give us an early warning about approaching AI risks. We…

Artificial Intelligence · Computer Science 2024-12-23 Paolo Bova, Alessandro Di Stefano, The Anh Han

As AI systems advance, AI evaluations are becoming an important pillar of regulations for ensuring safety. We argue that such regulation should require developers to explicitly identify and justify key underlying assumptions about…

Artificial Intelligence · Computer Science 2024-11-21 Peter Barnett, Lisa Thiergart

Although general-purpose AI systems offer transformational opportunities in science and industry, they simultaneously raise critical concerns about safety, misuse, and potential loss of control. Despite these risks, methods for assessing…

Computers and Society · Computer Science 2025-12-12 Malcolm Murray, Steve Barrett, Henry Papadatos, Otter Quarks, Matt Smith, Alejandro Tlaie Boria, Chloé Touzet, Siméon Campos

In this paper we discuss how systems with Artificial Intelligence (AI) can undergo safety assessment. This is relevant, if AI is used in safety related applications. Taking a deeper look into AI models, we show, that many models of…

Artificial Intelligence · Computer Science 2021-05-17 Jens Braband, Hendrik Schäbe

Collaborative AI systems aim at working together with humans in a shared space to achieve a common goal. This setting imposes potentially hazardous circumstances due to contacts that could harm human beings. Thus, building such systems with…

Software Engineering · Computer Science 2021-03-15 Matteo Camilli, Michael Felderer, Andrea Giusti, Dominik T. Matt, Anna Perini, Barbara Russo, Angelo Susi

Frontier AI systems are being adopted across Africa, yet most AI safety evaluations are designed and validated in Western environments. In this paper, we argue that the portability gap can leave Africa-centric pathways to severe harm…

Regulation, legal liabilities, and societal concerns challenge the adoption of AI in safety and security-critical applications. One of the key concerns is that adversaries can cause harm by manipulating model predictions without being…

Machine Learning · Computer Science 2023-01-31 Jona Klemenc, Holger Trittenbach

As a result of rapidly accelerating AI capabilities, over the past year, national governments and multinational bodies have announced efforts to address safety, security and ethics issues related to AI models. One high priority among these…

Computers and Society · Computer Science 2026-05-19 Jaspreet Pannu, Doni Bloomfield, Alex Zhu, Robert MacKnight, Gabe Gomes, Anita Cicero, Thomas V. Inglesby

In recent years, artificial intelligence (AI) has deeply impacted various fields, including Earth system sciences. Here, AI improved weather forecasting, model emulation, parameter estimation, and the prediction of extreme events. However,…

Artificial intelligence (AI) systems are increasingly adopted as tool-using agents that can plan, observe their environment, and take actions over extended time periods. This evolution challenges current evaluation practices where the AI…

Cryptography and Security · Computer Science 2026-03-17 Simone Aonzo, Merve Sahin, Aurélien Francillon, Daniele Perito

A comprehensive approach to addressing catastrophic risks from AI models should cover the full model lifecycle. This paper explores contingency plans for cases where pre-deployment risk management falls short: where either very dangerous…

Computers and Society · Computer Science 2023-10-03 Joe O'Brien, Shaun Ee, Zoe Williams