Annotations Mitigate Post-Training Mode Collapse

Computation and Language 2026-05-12 v1

Abstract

Post-training (via supervised fine-tuning) improves instruction-following, but often induces semantic mode collapse by biasing models toward low-entropy fine-tuning data at the expense of the high-entropy pretraining distribution. Crucially, we find this trade-off worsens with scale. To close this semantic diversity gap, we propose annotation-anchored training, a principled method that enables models to adopt the preference-following behaviors of post-training without sacrificing the inherent diversity of pretraining. Our approach is simple: we pretrain on documents paired with semantic annotations, inducing a rich annotation distribution that reflects the full breadth of pretraining data, and we preserve this distribution during post-training. This lets us sample diverse annotations at inference time and use them as anchors to guide generation, effectively transferring pretraining's semantic richness into post-trained models. We find that models trained with annotation-anchored training can attain 6× less diversity collapse than models trained with SFT, and improve with scale.
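
The inference-time idea described in the abstract (sample an annotation first, then generate conditioned on it) can be illustrated with a toy two-stage sampler. This is only a minimal sketch under stated assumptions, not the authors' implementation: the annotation labels, the uniform prior `ANNOTATION_PRIOR`, and the helper names `sample_categorical`, `generate_anchored`, and `entropy_bits` are all hypothetical, and the string template stands in for an actual model call.

```python
import math
import random
from collections import Counter

# Hypothetical annotation distribution; these labels and weights are
# illustrative only and do not come from the paper.
ANNOTATION_PRIOR = {
    "recipe": 0.25,
    "poem": 0.25,
    "bug report": 0.25,
    "travel diary": 0.25,
}


def sample_categorical(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    weights = [dist[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def generate_anchored(prompt, rng):
    """Two-stage sampling: draw an annotation, then condition the output on it."""
    annotation = sample_categorical(ANNOTATION_PRIOR, rng)
    # Placeholder for a model call conditioned on (prompt, annotation).
    response = f"[{annotation}] response to: {prompt}"
    return annotation, response


def entropy_bits(counts):
    """Empirical Shannon entropy (in bits) of a Counter of observed labels."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    rng = random.Random(0)
    seen = Counter(generate_anchored("Write something.", rng)[0] for _ in range(1000))
    print("annotation entropy (bits):", round(entropy_bits(seen), 3))
```

In this toy setting, the entropy of the sampled annotations is a rough proxy for semantic diversity: a sampler that always picked the same annotation would score 0 bits, while a uniform prior over four annotations approaches 2 bits.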

Cite

@article{arxiv.2605.09995,
  title  = {Annotations Mitigate Post-Training Mode Collapse},
  author = {Jacob Mitchell Springer and Madhu Advani and Lukas Aichberger and Arwen Bradley and Eran Malach and Omid Saremi and Sinead Williamson and Preetum Nakkiran and Etai Littwin and Aditi Raghunathan},
  journal= {arXiv preprint arXiv:2605.09995},
  year   = {2026}
}

Comments

21 pages, 8 figures, 11 tables. Accepted at ICML 2026