Stuart Russell — Scifaro

Transformers Provably Learn to Internalize Chain-of-Thought

Chain-of-Thought (CoT) prompting substantially improves the sample efficiency of transformers, reducing the complexity of tasks like parity learning from exponential to polynomial in the input length. However, generating explicit reasoning…

Machine Learning · Computer Science 2026-05-28 Yixiao Huang , Hanlin Zhu , Zixuan Wang , Jiantao Jiao , Stuart Russell , Somayeh Sojoudi , Song Mei

Learning the Preferences of a Learning Agent

For AI systems to be useful to humans, they must understand and act in accordance with our values and preferences. Since specifying preferences is a hard task, inverse reinforcement learning (IRL) aims to develop methods that allow for…

Artificial Intelligence · Computer Science 2026-05-12 Karim Abdel Sadek , Mark Bedaywi , Rhys Gould , Stuart Russell

Active teacher selection for reward learning

Reward learning techniques enable machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite gathering feedback from…

Artificial Intelligence · Computer Science 2026-05-11 Rachel Freedman , Justin Svegliato , Kyle Wray , Stuart Russell

Emergence of Superposition: Unveiling the Training Dynamics of Chain of Continuous Thought

Previous work shows that the chain of continuous thought (continuous CoT) improves the reasoning capability of large language models (LLMs) by enabling implicit parallel thinking, and a subsequent work provided theoretical insight by…

Machine Learning · Computer Science 2026-03-03 Hanlin Zhu , Shibo Hao , Zhiting Hu , Jiantao Jiao , Stuart Russell , Yuandong Tian

International AI Safety Report 2026

The International AI Safety Report 2026 synthesises the current scientific evidence on the capabilities, emerging risks, and safety of general-purpose AI systems. The report series was mandated by the nations attending the AI Safety Summit…

Computers and Society · Computer Science 2026-02-25 Yoshua Bengio , Stephen Clare , Carina Prunkl , Maksym Andriushchenko , Ben Bucknall , Malcolm Murray , Rishi Bommasani , Stephen Casper , Tom Davidson , Raymond Douglas , David Duvenaud , Philip Fox , Usman Gohar , Rose Hadshar , Anson Ho , Tiancheng Hu , Cameron Jones , Sayash Kapoor , Atoosa Kasirzadeh , Sam Manning , Nestor Maslej , Vasilios Mavroudis , Conor McGlynn , Richard Moulange , Jessica Newman , Kwan Yee Ng , Patricia Paskov , Shalaleh Rismani , Girish Sastry , Elizabeth Seger , Scott Singer , Charlotte Stix , Lucia Velasco , Nicole Wheeler , Daron Acemoglu , Vincent Conitzer , Thomas G. Dietterich , Fredrik Heintz , Geoffrey Hinton , Nick Jennings , Susan Leavy , Teresa Ludermir , Vidushi Marda , Helen Margetts , John McDermid , Jane Munga , Arvind Narayanan , Alondra Nelson , Clara Neppel , Sarvapali D. Ramchurn , Stuart Russell , Marietje Schaake , Bernhard Schölkopf , Alvaro Soto , Lee Tiedrich , Gaël Varoquaux , Andrew Yao , Ya-Qin Zhang , Leandro Angelo Aguirre , Olubunmi Ajala , Fahad Albalawi , Noora AlMalek , Christian Busch , Jonathan Collas , André Carlos Ponce de Leon Ferreira de Carvalho , Amandeep Gill , Ahmet Halit Hatip , Juha Heikkilä , Chris Johnson , Gill Jolly , Ziv Katzir , Mary N. Kerema , Hiroaki Kitano , Antonio Krüger , Kyoung Mu Lee , José Ramón López Portillo , Aoife McLysaght , Oleksii Molchanovskyi , Andrea Monti , Mona Nemer , Nuria Oliver , Raquel Pezoa , Audrey Plonk , Balaraman Ravindran , Hammam Riza , Crystal Rugege , Haroon Sheikh , Denise Wong , Yi Zeng , Liming Zhu , Daniel Privitera , Sören Mindermann

Statistical Guarantees for Offline Domain Randomization

Reinforcement-learning (RL) agents often struggle when transferred from simulation to the real world. A dominant strategy for reducing the sim-to-real gap is domain randomization (DR), which trains the policy across many simulators produced by…

Machine Learning · Computer Science 2026-02-05 Arnaud Fickinger , Abderrahim Bendahi , Stuart Russell

Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers

Large language models (LLMs) can acquire new knowledge through fine-tuning, but this process exhibits a puzzling duality: models can generalize remarkably from new facts, yet are also prone to hallucinating incorrect information. However,…

Computation and Language · Computer Science 2026-02-03 Yixiao Huang , Hanlin Zhu , Tianyu Guo , Jiantao Jiao , Somayeh Sojoudi , Michael I. Jordan , Stuart Russell , Song Mei

Cross-Domain Imitation Learning via Optimal Transport

Cross-domain imitation learning studies how to leverage expert demonstrations of one agent to train an imitation agent with a different embodiment or morphology. Comparing trajectories and stationary distributions between the expert and…

Machine Learning · Computer Science 2026-01-14 Arnaud Fickinger , Samuel Cohen , Stuart Russell , Brandon Amos

Synthetic Error Injection Fails to Elicit Self-Correction In Language Models

Reinforcement learning has become the dominant paradigm for eliciting reasoning and self-correction capabilities in large language models, but its computational expense motivates exploration of alternatives. Inspired by techniques from…

Artificial Intelligence · Computer Science 2025-12-03 David X. Wu , Shreyas Kapur , Anant Sahai , Stuart Russell

International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management

This second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk…

Computers and Society · Computer Science 2025-11-26 Yoshua Bengio , Stephen Clare , Carina Prunkl , Maksym Andriushchenko , Ben Bucknall , Philip Fox , Nestor Maslej , Conor McGlynn , Malcolm Murray , Shalaleh Rismani , Stephen Casper , Jessica Newman , Daniel Privitera , Sören Mindermann , Daron Acemoglu , Thomas G. Dietterich , Fredrik Heintz , Geoffrey Hinton , Nick Jennings , Susan Leavy , Teresa Ludermir , Vidushi Marda , Helen Margetts , John McDermid , Jane Munga , Arvind Narayanan , Alondra Nelson , Clara Neppel , Gopal Ramchurn , Stuart Russell , Marietje Schaake , Bernhard Schölkopf , Alvaro Soto , Lee Tiedrich , Gaël Varoquaux , Andrew Yao , Ya-Qin Zhang , Leandro Aguirre , Olubunmi Ajala , Fahad Albalawi , Noora AlMalek , Christian Busch , André Carvalho , Jonathan Collas , Amandeep Gill , Ahmet Hatip , Juha Heikkilä , Chris Johnson , Gill Jolly , Ziv Katzir , Mary Kerema , Hiroaki Kitano , Antonio Krüger , Aoife McLysaght , Oleksii Molchanovskyi , Andrea Monti , Kyoung Mu Lee , Mona Nemer , Nuria Oliver , Raquel Pezoa , Audrey Plonk , José Portillo , Balaraman Ravindran , Hammam Riza , Crystal Rugege , Haroon Sheikh , Denise Wong , Yi Zeng , Liming Zhu

Robust and Diverse Multi-Agent Learning via Rational Policy Gradient

Adversarial optimization algorithms that explicitly search for flaws in agents' policies have been successfully applied to finding robust and diverse policies in multi-agent settings. However, the success of adversarial optimization has…

Artificial Intelligence · Computer Science 2025-11-13 Niklas Lauffer , Ameesh Shah , Micah Carroll , Sanjit A. Seshia , Stuart Russell , Michael Dennis

Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

Large Language Models (LLMs) have demonstrated remarkable performance in many applications, including challenging reasoning problems via chain-of-thoughts (CoTs) techniques that generate "thinking tokens" before answering the questions.…

Machine Learning · Computer Science 2025-11-04 Hanlin Zhu , Shibo Hao , Zhiting Hu , Jiantao Jiao , Stuart Russell , Yuandong Tian

International AI Safety Report 2025: First Key Update: Capabilities and Risk Implications

Since the publication of the first International AI Safety Report, AI capabilities have continued to improve across key domains. New training techniques that teach AI systems to reason step-by-step and inference-time enhancements have…

Computers and Society · Computer Science 2025-10-16 Yoshua Bengio , Stephen Clare , Carina Prunkl , Shalaleh Rismani , Maksym Andriushchenko , Ben Bucknall , Philip Fox , Tiancheng Hu , Cameron Jones , Sam Manning , Nestor Maslej , Vasilios Mavroudis , Conor McGlynn , Malcolm Murray , Charlotte Stix , Lucia Velasco , Nicole Wheeler , Daniel Privitera , Sören Mindermann , Daron Acemoglu , Thomas G. Dietterich , Fredrik Heintz , Geoffrey Hinton , Nick Jennings , Susan Leavy , Teresa Ludermir , Vidushi Marda , Helen Margetts , John McDermid , Jane Munga , Arvind Narayanan , Alondra Nelson , Clara Neppel , Gopal Ramchurn , Stuart Russell , Marietje Schaake , Bernhard Schölkopf , Alvaro Soto , Lee Tiedrich , Gaël Varoquaux , Andrew Yao , Ya-Qin Zhang , Leandro Aguirre , Olubunmi Ajala , Fahad Albalawi , Noora AlMalek , Christian Busch , André Carvalho , Jonathan Collas , Amandeep Gill , Ahmet Hatip , Juha Heikkilä , Chris Johnson , Gill Jolly , Ziv Katzir , Mary Kerema , Hiroaki Kitano , Antonio Krüger , Aoife McLysaght , Oleksii Molchanovskyi , Andrea Monti , Kyoung Mu Lee , Mona Nemer , Nuria Oliver , Raquel Pezoa , Audrey Plonk , José Portillo , Balaraman Ravindran , Hammam Riza , Crystal Rugege , Haroon Sheikh , Denise Wong , Yi Zeng , Liming Zhu

GSM-Agent: Understanding Agentic Reasoning Using Controllable Environments

As LLMs are increasingly deployed as agents, agentic reasoning - the ability to combine tool use, especially search, and reasoning - becomes a critical skill. However, it is hard to disentangle agentic reasoning when evaluated in complex…

Artificial Intelligence · Computer Science 2025-10-03 Hanlin Zhu , Tianyu Guo , Song Mei , Stuart Russell , Nikhil Ghosh , Alberto Bietti , Jiantao Jiao

Forecasting Seismic Waveforms: A Deep Learning Approach for Einstein Telescope

We introduce SeismoGPT, a transformer-based model for forecasting three-component seismic waveforms in the context of future gravitational wave detectors like the Einstein Telescope. The model is trained in an autoregressive…

Machine Learning · Computer Science 2025-09-29 Waleed Esmail , Alexander Kappes , Stuart Russell , Christine Thomas

Safe Learning Under Irreversible Dynamics via Asking for Help

Most learning algorithms with formal regret guarantees essentially rely on trying all possible behaviors, which is problematic when some errors cannot be recovered from. Instead, we allow the learning agent to ask for help from a mentor and…

Machine Learning · Computer Science 2025-09-17 Benjamin Plaut , Juan Liévano-Karim , Hanlin Zhu , Stuart Russell

Observation Interference in Partially Observable Assistance Games

We study partially observable assistance games (POAGs), a model of the human-AI value alignment problem which allows the human and the AI assistant to have partial observations. Motivated by concerns of AI deception, we study a…

Artificial Intelligence · Computer Science 2025-08-12 Scott Emmons , Caspar Oesterheld , Vincent Conitzer , Stuart Russell

Avoiding Catastrophe in Online Learning by Asking for Help

Most learning algorithms with formal regret guarantees assume that all mistakes are recoverable and essentially rely on trying all possible behaviors. This approach is problematic when some mistakes are "catastrophic", i.e., irreparable. We…

Machine Learning · Computer Science 2025-08-07 Benjamin Plaut , Hanlin Zhu , Stuart Russell

The Singapore Consensus on Global AI Safety Research Priorities

Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is…

Artificial Intelligence · Computer Science 2025-07-02 Yoshua Bengio , Tegan Maharaj , Luke Ong , Stuart Russell , Dawn Song , Max Tegmark , Lan Xue , Ya-Qin Zhang , Stephen Casper , Wan Sie Lee , Sören Mindermann , Vanessa Wilfred , Vidhisha Balachandran , Fazl Barez , Michael Belinsky , Imane Bello , Malo Bourgon , Mark Brakel , Siméon Campos , Duncan Cass-Beggs , Jiahao Chen , Rumman Chowdhury , Kuan Chua Seah , Jeff Clune , Juntao Dai , Agnes Delaborde , Nouha Dziri , Francisco Eiras , Joshua Engels , Jinyu Fan , Adam Gleave , Noah Goodman , Fynn Heide , Johannes Heidecke , Dan Hendrycks , Cyrus Hodes , Bryan Low Kian Hsiang , Minlie Huang , Sami Jawhar , Wang Jingyu , Adam Tauman Kalai , Meindert Kamphuis , Mohan Kankanhalli , Subhash Kantamneni , Mathias Bonde Kirk , Thomas Kwa , Jeffrey Ladish , Kwok-Yan Lam , Wan Lee Sie , Taewhi Lee , Xiaojian Li , Jiajun Liu , Chaochao Lu , Yifan Mai , Richard Mallah , Julian Michael , Nick Moës , Simon Möller , Kihyuk Nam , Kwan Yee Ng , Mark Nitzberg , Besmira Nushi , Seán O hÉigeartaigh , Alejandro Ortega , Pierre Peigné , James Petrie , Benjamin Prud'Homme , Reihaneh Rabbany , Nayat Sanchez-Pi , Sarah Schwettmann , Buck Shlegeris , Saad Siddiqui , Aradhana Sinha , Martín Soto , Cheston Tan , Dong Ting , William Tjhi , Robert Trager , Brian Tse , Anthony Tung K. H. , Vanessa Wilfred , John Willes , Denise Wong , Wei Xu , Rongwu Xu , Yi Zeng , HongJiang Zhang , Djordje Žikelić

AssistanceZero: Scalably Solving Assistance Games

Assistance games are a promising alternative to reinforcement learning from human feedback (RLHF) for training AI assistants. Assistance games resolve key drawbacks of RLHF, such as incentives for deceptive behavior, by explicitly modeling…

Artificial Intelligence · Computer Science 2025-06-13 Cassidy Laidlaw , Eli Bronstein , Timothy Guo , Dylan Feng , Lukas Berglund , Justin Svegliato , Stuart Russell , Anca Dragan