Related papers: Alignment Problems With Current Forecasting Platfo…

Goal Alignment: A Human-Aware Account of Value Alignment Problem

Value alignment problems arise in scenarios where the specified objectives of an AI agent don't match the true underlying objective of its users. The problem has been widely argued to be one of the central safety problems in AI.…

Artificial Intelligence · Computer Science 2023-02-10 Malek Mechergui, Sarath Sreedharan

Emergent Alignment via Competition

Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? We study a strategic setting where a human user interacts with…

Machine Learning · Computer Science 2026-02-04 Natalie Collina, Surbhi Goel, Aaron Roth, Emily Ryu, Mirah Shi

Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks

This paper leverages insights from Alignment Theory (AT) research, which primarily focuses on the potential pitfalls of technical alignment in Artificial Intelligence, to critically examine the European Union's Artificial Intelligence Act…

Computers and Society · Computer Science 2024-10-29 Alejandro Tlaie

The Alignment Problem in Context

A core challenge in the development of increasingly capable AI systems is to make them safe and reliable by ensuring their behaviour is consistent with human values. This challenge, known as the alignment problem, does not merely apply to…

Machine Learning · Computer Science 2023-11-07 Raphaël Millière

Incentive-Compatible Forecasting Competitions

We initiate the study of incentive-compatible forecasting competitions in which multiple forecasters make predictions about one or more events and compete for a single prize. We have two objectives: (1) to incentivize forecasters to report…

Computer Science and Game Theory · Computer Science 2021-09-09 Jens Witkowski, Rupert Freeman, Jennifer Wortman Vaughan, David M. Pennock, Andreas Krause

Scalable agent alignment via reward modeling: a research direction

One obstacle to applying reinforcement learning algorithms to real-world problems is the lack of suitable reward functions. Designing such reward functions is difficult in part because the user only has an implicit understanding of the task…

Machine Learning · Computer Science 2018-11-20 Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg

A Game-Theoretic Framework for Joint Forecasting and Planning

Planning safe robot motions in the presence of humans requires reliable forecasts of future human motion. However, simply predicting the most likely motion from prior interactions does not guarantee safety. Such forecasts fail to model the…

Artificial Intelligence · Computer Science 2023-10-23 Kushal Kedia, Prithwish Dan, Sanjiban Choudhury

GTAlign: Game-Theoretic Alignment of LLM Assistants for Social Welfare

Large Language Models (LLMs) have achieved remarkable progress in reasoning, yet sometimes produce responses that are suboptimal for users in tasks such as writing, information seeking, or providing practical guidance. Conventional…

Artificial Intelligence · Computer Science 2025-11-04 Siqi Zhu, David Zhang, Pedro Cisneros-Velarde, Jiaxuan You

The Wisdom of Deliberating AI Crowds: Does Deliberation Improve LLM-Based Forecasting?

Structured deliberation has been found to improve the performance of human forecasters. This study investigates whether a similar intervention, i.e. allowing LLMs to review each other's forecasts before updating, can improve accuracy in…

Artificial Intelligence · Computer Science 2025-12-30 Paul Schneider, Amalie Schramm

Forecast Evaluation and the Relationship of Regret and Calibration

Machine learning is about forecasting. When the forecasts come with an evaluation metric the forecasts become useful. What are reasonable evaluation metrics? How do existing evaluation metrics relate? In this work, we provide a general…

Machine Learning · Computer Science 2025-07-08 Rabanus Derr, Robert C. Williamson

Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment

Existing work on the alignment problem has focused mainly on (1) qualitative descriptions of the alignment problem; (2) attempting to align AI actions with human interests by focusing on value specification and learning; and/or (3) focusing…

Multiagent Systems · Computer Science 2025-06-03 Aidan Kierans, Avijit Ghosh, Hananel Hazan, Shiri Dori-Hacohen

AI Alignment Breaks at the Edge

General Alignment has improved average-case helpfulness and safety, but current alignment practice still rewards confident, single-turn responses. The problem is not only that models fail on edge cases; it is that current evaluation makes…

Computation and Language · Computer Science 2026-05-19 Han Bao, Yue Huang, Xiaoda Wang, Zheyuan Zhang, Yujun Zhou, Carl Yang, Xiangliang Zhang, Yanfang Ye

Forecasting Competitions with Correlated Events

Beginning with Witkowski et al. [2022], recent work on forecasting competitions has addressed incentive problems with the common winner-take-all mechanism. Frongillo et al. [2021] propose a competition mechanism based on…

Machine Learning · Computer Science 2023-03-27 Rafael Frongillo, Manuel Lladser, Anish Thilagar, Bo Waggoner

Conditional Forecasts and Proper Scoring Rules for Reliable and Accurate Performative Predictions

Performative predictions are forecasts which influence the outcomes they aim to predict, undermining the existence of correct forecasts and standard methods of elicitation and estimation. We show that conditioning forecasts on covariates…

Statistics Theory · Mathematics 2025-10-27 Philip Boeken, Onno Zoeter, Joris M. Mooij

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

Offering a promising solution to the scalability challenges associated with human evaluation, the LLM-as-a-judge paradigm is rapidly gaining traction as an approach to evaluating large language models (LLMs). However, there are still many…

Computation and Language · Computer Science 2025-08-19 Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally, Sankaran Vaidyanathan, Dieuwke Hupkes

Can LLMs Estimate Student Struggles? Human-AI Difficulty Alignment with Proficiency Simulation for Item Difficulty Prediction

Accurate estimation of item (question or task) difficulty is critical for educational assessment but suffers from the cold start problem. While Large Language Models demonstrate superhuman problem-solving capabilities, it remains an open…

Computation and Language · Computer Science 2026-05-12 Ming Li, Han Chen, Yunze Xiao, Jian Chen, Hong Jiao, Tianyi Zhou

Facilitating Matches on Allocation Platforms

We consider a setting where goods are allocated to agents by way of an allocation platform (e.g., a matching platform). An "allocation facilitator" aims to increase the overall utility/social-good of the allocation by encouraging (some of…

Computer Science and Game Theory · Computer Science 2025-08-27 Yohai Trabelsi, Abhijin Adiga, Yonatan Aumann, Sarit Kraus, S. S. Ravi

Aligned with Whom? Direct and social goals for AI systems

As artificial intelligence (AI) becomes more powerful and widespread, the AI alignment problem - how to ensure that AI systems pursue the goals that we want them to pursue - has garnered growing attention. This article distinguishes two…

Computers and Society · Computer Science 2022-05-10 Anton Korinek, Avital Balwit

Algorithms with Prediction Portfolios

The research area of algorithms with predictions has seen recent success showing how to incorporate machine learning into algorithm design to improve performance when the predictions are correct, while retaining worst-case guarantees when…

Machine Learning · Computer Science 2022-12-06 Michael Dinitz, Sungjin Im, Thomas Lavastida, Benjamin Moseley, Sergei Vassilvitskii

Interpolating Item and User Fairness in Multi-Sided Recommendations

Today's online platforms heavily lean on algorithmic recommendations for bolstering user engagement and driving revenue. However, these recommendations can impact multiple stakeholders simultaneously -- the platform, items (sellers), and…

Information Retrieval · Computer Science 2024-05-28 Qinyi Chen, Jason Cheuk Nam Liang, Negin Golrezaei, Djallel Bouneffouf