Consensus Sampling for Safer Generative AI

Adam Tauman Kalai; Yael Tauman Kalai; Or Zamir

Artificial Intelligence 2026-05-12 v2 Machine Learning

Authors: Adam Tauman Kalai, Yael Tauman Kalai, Or Zamir

Abstract

Motivated by undetectable risks in generative AI, we study a general robust aggregation problem: how to aggregate several probability distributions to boost safety. We present consensus sampling, a black-box algorithm that, given $k$ distributions, has risk competitive with the average risk of the safest $s$ of them while abstaining when there is insufficient agreement. This yields an architecture-agnostic approach to generative-model safety when the distributions are induced by models that can both sample outputs and evaluate output probabilities. We formalize the guarantee through R-robustness, which also bounds information leakage and adversarial influence. Inspired by robust statistics and the provable copyright protection algorithm of Vyas et al. (2023), we show that while a standard mixture is vulnerable to a single unsafe constituent, a pointwise-median construction provides robust intuition, and our efficient sampler is Pareto-optimal for the tradeoff between worst-case risk and abstention. Experiments on synthetic distributions and image generation illustrate the general mechanism and its motivating safety application. The method requires overlap among the safe distributions, but it provides a model-agnostic way to inherit guarantees from an unknown reliable subset.
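
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of one rejection-style sampler in that spirit, not necessarily the paper's exact procedure. It assumes discrete distributions given as dictionaries mapping outcomes to probabilities, a round budget `max_rounds`, and an acceptance rule equal to the mean of the $s$ smallest probabilities of the proposal divided by the mean of all $k$ probabilities. The function name, data format, and acceptance rule are illustrative assumptions.

```python
import random

ABSTAIN = None  # sentinel returned when there is insufficient agreement


def consensus_sample(dists, s, max_rounds, rng=random):
    """Illustrative rejection-style consensus sampler (assumed design).

    dists: list of k dicts mapping outcome -> probability (hypothetical format).
    s: number of distributions assumed to be safe, with 1 <= s <= k.
    max_rounds: number of proposals tried before abstaining.
    """
    k = len(dists)
    if not 1 <= s <= k:
        raise ValueError("s must satisfy 1 <= s <= k")
    for _ in range(max_rounds):
        # Propose an outcome from a uniformly random constituent distribution.
        proposer = rng.choice(dists)
        outcomes = [y for y, p in proposer.items() if p > 0]
        weights = [proposer[y] for y in outcomes]
        y = rng.choices(outcomes, weights=weights, k=1)[0]
        # Evaluate the proposal's probability under every distribution.
        probs = sorted(d.get(y, 0.0) for d in dists)
        mean_lowest_s = sum(probs[:s]) / s
        mean_all = sum(probs) / k  # positive, since the proposer supports y
        # Accept with the ratio of the two means, which lies in [0, 1].
        if rng.random() < mean_lowest_s / mean_all:
            return y
    return ABSTAIN
```

Under this assumed rule, an outcome that at least $s$ of the distributions assign probability zero is never accepted, so a single unsafe constituent cannot force its own output through when $s \geq 2$. Setting $s = k$ reduces the sampler to the plain uniform mixture, which is the construction the abstract identifies as vulnerable.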

Keywords

consensus protocol, randomized algorithm, adversarial robustness

Cite

@article{arxiv.2511.09493,
  title  = {Consensus Sampling for Safer Generative AI},
  author = {Adam Tauman Kalai and Yael Tauman Kalai and Or Zamir},
  journal= {arXiv preprint arXiv:2511.09493},
  year   = {2026}
}
