Related papers: Calibrating Sequence likelihood Improves Condition…

Calibrating Likelihoods towards Consistency in Summarization Models

Despite the recent advances in abstractive text summarization, current summarization models still suffer from generating factually inconsistent summaries, reducing their utility for real-world application. We argue that the main reason for…

Computation and Language · Computer Science 2023-10-16 Polina Zablotskaia, Misha Khalman, Rishabh Joshi, Livio Baldini Soares, Shoshana Jakobovits, Joshua Maynez, Shashi Narayan

Contextualized Sequence Likelihood: Enhanced Confidence Scores for Natural Language Generation

The advent of large language models (LLMs) has dramatically advanced the state-of-the-art in numerous natural language generation tasks. For LLMs to be applied reliably, it is essential to have an accurate measure of their confidence.…

Computation and Language · Computer Science 2024-06-05 Zhen Lin, Shubhendu Trivedi, Jimeng Sun

Boosting In-Context Learning in LLMs Through the Lens of Classical Supervised Learning

In-Context Learning (ICL) allows Large Language Models (LLMs) to adapt to new tasks with just a few examples, but their predictions often suffer from systematic biases, leading to unstable performance in classification. While calibration…

Machine Learning · Statistics 2026-03-05 Korel Gundem, Juncheng Dong, Dennis Zhang, Vahid Tarokh, Zhengling Qi

Enhancing In-context Learning via Linear Probe Calibration

In-context learning (ICL) is a new paradigm for natural language processing that utilizes Generative Pre-trained Transformer (GPT)-like models. This approach uses prompts that include in-context demonstrations to generate the corresponding…

Computation and Language · Computer Science 2024-01-24 Momin Abbas, Yi Zhou, Parikshit Ram, Nathalie Baracaldo, Horst Samulowitz, Theodoros Salonidis, Tianyi Chen

Implicit Unlikelihood Training: Improving Neural Text Generation with Reinforcement Learning

Likelihood training and maximization-based decoding result in dull and repetitive generated texts even when using powerful language models (Holtzman et al., 2019). Adding a loss function for regularization was shown to improve text…

Computation and Language · Computer Science 2021-01-13 Evgeny Lagutin, Daniil Gavrilov, Pavel Kalaidin

Comparison of Diverse Decoding Methods from Conditional Language Models

While conditional language models have greatly improved in their ability to output high-quality natural language, many NLP applications benefit from being able to generate a diverse set of candidate sequences. Diverse decoding strategies…

Computation and Language · Computer Science 2019-06-18 Daphne Ippolito, Reno Kriz, Maria Kustikova, João Sedoc, Chris Callison-Burch

Neural Text Generation with Unlikelihood Training

Neural text generation is a key tool in natural language applications, but it is well known there are major problems at its core. In particular, standard likelihood training and decoding leads to dull and repetitive outputs. While some…

Machine Learning · Computer Science 2019-09-30 Sean Welleck, Ilia Kulikov, Stephen Roller, Emily Dinan, Kyunghyun Cho, Jason Weston

Sentence-wise Smooth Regularization for Sequence to Sequence Learning

Maximum-likelihood estimation (MLE) is widely used in sequence to sequence tasks for model training. It uniformly treats the generation/prediction of each target token as multi-class classification, and yields non-smooth prediction…

Computation and Language · Computer Science 2018-12-13 Chengyue Gong, Xu Tan, Di He, Tao Qin

A Study on the Calibration of In-context Learning

Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs). We study in-context learning (ICL), a…

Computation and Language · Computer Science 2024-03-29 Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

Temporal Probability Calibration

In many applications, accurate class probability estimates are required, but many types of models produce poor quality probability estimates despite achieving acceptable classification accuracy. Even though probability calibration has been…

Machine Learning · Computer Science 2020-02-18 Tim Leathart, Maksymilian Polaczuk

Surprise Calibration for Better In-Context Learning

In-context learning (ICL) has emerged as a powerful paradigm for task adaptation in large language models (LLMs), where models infer underlying task structures from a few demonstrations. However, ICL remains susceptible to biases that arise…

Computation and Language · Computer Science 2025-06-18 Zhihang Tan, Jingrui Hou, Ping Wang, Qibiao Hu, Peng Zhu

Learning to Search Effective Example Sequences for In-Context Learning

Large language models (LLMs) demonstrate impressive few-shot learning capabilities, but their performance varies widely based on the sequence of in-context examples. Key factors influencing this include the sequence's length, composition,…

Computation and Language · Computer Science 2025-03-12 Xiang Gao, Ankita Sinha, Kamalika Das

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback

A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of…

Computation and Language · Computer Science 2023-10-25 Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning

Multicalibration for LLM-based Code Generation

As AI-based code generation becomes widespread, researchers are investigating the calibration of code LLMs - ensuring their confidence scores faithfully represent the true likelihood of code correctness. To do so, we investigate…

Software Engineering · Computer Science 2025-12-10 Viola Campos, Robin Kuschnereit, Adrian Ulges

Language Generation with Strictly Proper Scoring Rules

Language generation based on maximum likelihood estimation (MLE) has become the fundamental approach for text generation. Maximum likelihood estimation is typically performed by minimizing the log-likelihood loss, also known as the…

Computation and Language · Computer Science 2024-05-30 Chenze Shao, Fandong Meng, Yijin Liu, Jie Zhou

Improving Maximum Likelihood Training for Text Generation with Density Ratio Estimation

Auto-regressive sequence generative models trained by Maximum Likelihood Estimation suffer the exposure bias problem in practical finite sample scenarios. The crux is that the number of training samples for Maximum Likelihood Estimation is…

Machine Learning · Statistics 2020-07-14 Yuxuan Song, Ning Miao, Hao Zhou, Lantao Yu, Mingxuan Wang, Lei Li

SGIC: A Self-Guided Iterative Calibration Framework for RAG

Recent research in retrieval-augmented generation (RAG) has concentrated on retrieving useful information from candidate documents. However, numerous methodologies frequently neglect the calibration capabilities of large language models…

Computation and Language · Computer Science 2025-06-23 Guanhua Chen, Yutong Yao, Lidia S. Chao, Xuebo Liu, Derek F. Wong

Likelihood-based Mitigation of Evaluation Bias in Large Language Models

Large Language Models (LLMs) are widely used to evaluate natural language generation tasks as automated metrics. However, the likelihood, a measure of LLM's plausibility for a sentence, can vary due to superficial differences in sentences,…

Computation and Language · Computer Science 2025-11-11 Masanari Oi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki

Self-Calibrating Language Models via Test-Time Discriminative Distillation

Large language models (LLMs) are systematically overconfident: they routinely express high certainty on questions they often answer incorrectly. Existing calibration methods either require labeled validation data, degrade under distribution…

Computation and Language · Computer Science 2026-04-14 Mohamed Rissal Hedna, Jan Strich, Martin Semmann, Chris Biemann

Generative Calibration for In-context Learning

As one of the most exciting features of large language models (LLMs), in-context learning is a mixed blessing. While it allows users to fast-prototype a task solver with only a few training examples, the performance is generally sensitive…

Computation and Language · Computer Science 2023-10-17 Zhongtao Jiang, Yuanzhe Zhang, Cao Liu, Jun Zhao, Kang Liu