
Related papers: Calibrating Large Language Models with Sample Cons…

200 papers

Large language models (LLMs) have demonstrated remarkable capabilities across a wide range of tasks in various domains. Despite their impressive performance, they can be unreliable due to factual errors in their generations. Assessing their…

Computation and Language · Computer Science 2024-03-26 Jiahui Geng, Fengyu Cai, Yuxia Wang, Heinz Koeppl, Preslav Nakov, Iryna Gurevych

As large language models attract increasing attention and find widespread application, challenges of reliability arise concurrently. Confidence calibration, an effective analysis method for gauging the reliability of…

Computation and Language · Computer Science 2023-11-23 Chiwei Zhu, Benfeng Xu, Quan Wang, Yongdong Zhang, Zhendong Mao

To enhance the reliability of Large Language Models (LLMs), calibration is essential: the model's assessed confidence scores should align with the actual likelihood of its responses being correct. However, current confidence elicitation methods…

Computation and Language · Computer Science 2024-10-29 Yukun Huang, Yixin Liu, Raghuveer Thirukovalluru, Arman Cohan, Bhuwan Dhingra
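
The calibration requirement stated in the abstract above (confidence scores should track the actual likelihood of being correct) is commonly measured with expected calibration error (ECE). The sketch below is a minimal illustration, assuming equal-width confidence bins and binary correctness labels. The function name and the default of 10 bins are illustrative choices, not taken from any listed paper.

```python
from __future__ import annotations

from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average |mean confidence - accuracy| over equal-width bins.

    Assumes confidences lie in [0, 1]; bin boundaries are illustrative.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Group (confidence, correctness) pairs into equal-width bins.
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # min() keeps conf == 1.0 in the last bin instead of overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin's gap is weighted by its share of all predictions.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

For example, four answers each given with confidence 0.9, of which three are correct, fall into a single bin with accuracy 0.75, so the ECE is about 0.15. A perfectly calibrated model would score 0.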

Pre-trained language models (PLMs) may fail in giving reliable estimates of their predictive uncertainty. We take a close look into this problem, aiming to answer two questions: (1) Do PLMs learn to become calibrated in the training…

Computation and Language · Computer Science 2023-05-09 Yangyi Chen, Lifan Yuan, Ganqu Cui, Zhiyuan Liu, Heng Ji

We investigate the calibration of large language models' (LLMs') confidence across diverse tasks. The results of our preregistered study show that the current crop of LLMs are, like people, too sure they are right: confidence exceeds…

Artificial Intelligence · Computer Science 2026-05-26 Noam Michael, Daniel BenShushan, Jacob Bien, Don A. Moore
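
The abstract above describes overconfidence, where models are "too sure they are right." One simple way to quantify this is to compare average stated confidence with observed accuracy. A positive gap means the model is, on average, more confident than it is correct. The sketch below is minimal and assumes confidences are reported as probabilities in [0, 1]. It uses a generic summary statistic and is not necessarily the measure used in the listed study.

```python
from statistics import mean
from typing import Sequence


def overconfidence_gap(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Mean confidence minus accuracy; a value > 0 indicates overconfidence."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    # Accuracy is the fraction of answers marked correct.
    accuracy = mean(1.0 if ok else 0.0 for ok in correct)
    return mean(confidences) - accuracy
```

Unlike ECE, this single number can hide errors that cancel out, such as overconfidence on some items offset by underconfidence on others. It only shows the overall direction of miscalibration.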

Large language models (LLMs) are widely deployed as general-purpose problem solvers, making accurate confidence estimation critical for reliable use. Prior work on LLM calibration largely focuses on response-level confidence, which…

Computation and Language · Computer Science 2026-02-17 Sin-Han Yang, Cheng-Kuang Wu, Chieh-Yen Lin, Yun-Nung Chen, Hung-yi Lee, Shao-Hua Sun

Large Language Models (LLMs) that can express interpretable and calibrated uncertainty are crucial in high-stakes domains. While methods to compute uncertainty post-hoc exist, they are often sampling-based and therefore computationally…

Machine Learning · Computer Science 2026-03-09 Azza Jenane, Nassim Walha, Lukas Kuhn, Florian Buettner

A trustworthy real-world prediction system should produce well-calibrated confidence scores; that is, its confidence in an answer should be indicative of the likelihood that the answer is correct, enabling deferral to an expert in cases of…

Computation and Language · Computer Science 2023-10-25 Katherine Tian, Eric Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning
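
The abstract above motivates calibrated confidence by "deferral to an expert." This can be illustrated with a simple confidence threshold: answers at or above the threshold are accepted, and the rest are routed to an expert. The sketch below is minimal. The `selective_accuracy` helper and any threshold value are hypothetical illustrations, not the method of the listed paper.

```python
from __future__ import annotations

from typing import Sequence


def selective_accuracy(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> tuple[float, float]:
    """Return (coverage, accuracy on kept answers) for a confidence threshold.

    Answers with confidence >= threshold are kept; the rest are deferred.
    Accuracy is reported as 0.0 when nothing is kept.
    """
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    # Keep only the correctness labels of answers the system does not defer.
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences)
    accuracy = sum(kept) / len(kept) if kept else 0.0
    return coverage, accuracy
```

Raising the threshold trades coverage for accuracy on the answers that are kept. This trade-off is only useful if the confidence scores are well calibrated.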

Large Language Models (LLMs) show promise for automated grading, but their outputs can be unreliable. Rather than improving grading accuracy directly, we address a complementary problem: predicting when an LLM grader is likely to be…

Computation and Language · Computer Science 2026-04-01 Robinson Ferrer, Damla Turgut, Zhongzhou Chen, Shashank Sonkar

Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not…

Computation and Language · Computer Science 2024-10-10 Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

Large Language Models (LLMs) show remarkable proficiency in natural language tasks, yet their frequent overconfidence (misalignment between predicted confidence and true correctness) poses significant risks in critical decision-making…

Computation and Language · Computer Science 2025-12-15 Prateek Chhikara

Large Language Models (LLMs) have demonstrated exceptional capabilities, yet selecting the most reliable response from multiple LLMs remains a challenge, particularly in resource-constrained settings. Existing approaches often depend on…

Computation and Language · Computer Science 2025-10-06 Aakriti Agrawal, Rohith Aralikatti, Anirudh Satheesh, Souradip Chakraborty, Amrit Singh Bedi, Furong Huang

Large Language Models (LLMs) exhibit remarkable fluency and competence across various natural language tasks. However, recent research has highlighted their sensitivity to variations in input prompts. To deploy LLMs in a safe and reliable…

Computation and Language · Computer Science 2025-04-30 Harsh Raj, Vipul Gupta, Domenic Rosati, Subhabrata Majumdar

Multilingual pre-trained Large Language Models (LLMs) are incredibly effective at Question Answering (QA), a core task in Natural Language Understanding, achieving high accuracies on several multilingual benchmarks. However, little is known…

Computation and Language · Computer Science 2024-04-16 Yahan Yang, Soham Dan, Dan Roth, Insup Lee

There is a growing literature on reasoning by large language models (LLMs), but the discussion on the uncertainty in their responses is still lacking. Our aim is to assess the extent of confidence that LLMs have in their answers and how it…

Computation and Language · Computer Science 2024-12-23 Yudi Pawitan, Chris Holmes

While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to the prompts fed into them. Even when…

Computation and Language · Computer Science 2023-04-13 Harsh Raj, Domenic Rosati, Subhabrata Majumdar

Uncertainty estimation is a significant issue for current large language models (LLMs) that are generally poorly calibrated and over-confident, especially with reinforcement learning from human feedback (RLHF). Unlike humans, whose…

Computation and Language · Computer Science 2024-05-13 Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang

Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious…

Artificial Intelligence · Computer Science 2024-12-06 Zhihui Xie, Jizhou Guo, Tong Yu, Shuai Li

Large Language Models (LLMs) have demonstrated remarkable self-improvement capabilities, whereby models iteratively revise their outputs through self-generated feedback. While this reflective mechanism has shown promise in enhancing task…

Computation and Language · Computer Science 2025-04-07 Liangjie Huang, Dawei Li, Huan Liu, Lu Cheng

Large language models (LLMs) are increasingly used in high-stakes settings, where overconfident responses can mislead users. Reliable confidence estimation has been shown to enhance trust and task accuracy. Yet existing methods face…

Computation and Language · Computer Science 2025-09-30 Linwei Tao, Yi-Fan Yeh, Bo Kai, Minjing Dong, Tao Huang, Tom A. Lamb, Jialin Yu, Philip H. S. Torr, Chang Xu