Related papers: Correct after Answer: Enhancing Multi-Span Questio…

Answer, Assemble, Ace: Understanding How LMs Answer Multiple Choice Questions

Multiple-choice question answering (MCQA) is a key competence of performant transformer language models that is tested by mainstream benchmarks. However, recent evidence shows that models can have quite a range of performance, particularly…

Computation and Language · Computer Science · 2025-03-11 · Sarah Wiegreffe, Oyvind Tafjord, Yonatan Belinkov, Hannaneh Hajishirzi, Ashish Sabharwal

Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering

One of the most widely used tasks for evaluating Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA). While open-ended question answering tasks are more challenging to evaluate, MCQA tasks are, in principle, easier to…

Computation and Language · Computer Science · 2025-06-10 · Francesco Maria Molfese, Luca Moroni, Luca Gioffré, Alessandro Scirè, Simone Conia, Roberto Navigli

Multiaccuracy: Black-Box Post-Processing for Fairness in Classification

Prediction systems are successfully deployed in applications ranging from disease diagnosis, to predicting credit worthiness, to image recognition. Even when the overall accuracy is high, these systems may exhibit systematic biases that…

Machine Learning · Computer Science · 2018-08-30 · Michael P. Kim, Amirata Ghorbani, James Zou

Improving Complex Knowledge Base Question Answering via Question-to-Action and Question-to-Question Alignment

Complex knowledge base question answering can be achieved by converting questions into sequences of predefined actions. However, there is a significant semantic and structural gap between natural language and action sequences, which makes…

Computation and Language · Computer Science · 2022-12-27 · Yechun Tang, Xiaoxia Cheng, Weiming Lu

Prediction then Correction: An Abductive Prediction Correction Method for Sequential Recommendation

Sequential recommender models typically generate predictions in a single step during testing, without considering additional prediction correction to enhance performance as humans would. To improve the accuracy of these models, some…

Information Retrieval · Computer Science · 2023-04-28 · Yulong Huang, Yang Zhang, Qifan Wang, Chenxu Wang, Fuli Feng

Multiclass Alignment of Confidence and Certainty for Network Calibration

Deep neural networks (DNNs) have made great strides in pushing the state-of-the-art in several challenging domains. Recent studies reveal that they are prone to making overconfident predictions. This greatly reduces the overall trust in…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-07 · Vinith Kugathasan, Muhammad Haris Khan

A Simple and Effective Model for Answering Multi-span Questions

Models for reading comprehension (RC) commonly restrict their output space to the set of all single contiguous spans from the input, in order to alleviate the learning problem and avoid the need for a model that generates text explicitly.…

Computation and Language · Computer Science · 2020-10-06 · Elad Segal, Avia Efrat, Mor Shoham, Amir Globerson, Jonathan Berant

Clues Before Answers: Generation-Enhanced Multiple-Choice QA

A trending paradigm for multiple-choice question answering (MCQA) is using a text-to-text framework. By unifying data in different tasks into a single text-to-text format, it trains a generative encoder-decoder model which is both powerful…

Computation and Language · Computer Science · 2022-05-03 · Zixian Huang, Ao Wu, Jiaying Zhou, Yu Gu, Yue Zhao, Gong Cheng

Modular Conformal Calibration

Uncertainty estimates must be calibrated (i.e., accurate) and sharp (i.e., informative) in order to be useful. This has motivated a variety of methods for recalibration, which use held-out data to turn an uncalibrated model into a…

Machine Learning · Computer Science · 2022-07-06 · Charles Marx, Shengjia Zhao, Willie Neiswanger, Stefano Ermon

Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement

Conversational Question Answering (ConvQA) models aim at answering a question with its relevant paragraph and previous question-answer pairs that occurred during conversation multiple times. To apply such models to a real-world scenario,…

Computation and Language · Computer Science · 2023-02-13 · Soyeong Jeong, Jinheon Baek, Sung Ju Hwang, Jong C. Park

Answer Span Correction in Machine Reading Comprehension

Answer validation in machine reading comprehension (MRC) consists of verifying an extracted answer against an input context and question pair. Previous work has looked at re-assessing the "answerability" of the question given the extracted…

Computation and Language · Computer Science · 2020-11-09 · Revanth Gangi Reddy, Md Arafat Sultan, Efsun Sarioglu Kayi, Rong Zhang, Vittorio Castelli, Avirup Sil

When Models Decide and When They Bind: A Two-Stage Computation for Multiple-Choice Question-Answering

Multiple-choice question answering (MCQA) is easy to evaluate but adds a meta-task: models must both solve the problem and output the symbol that *represents* the answer, conflating reasoning errors with symbol-binding failures. We study…

Computation and Language · Computer Science · 2026-01-08 · Hugh Mee Wong, Rick Nouwen, Albert Gatt

Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction

Attention is typically used to select informative sub-phrases that are used for prediction. This paper investigates the novel use of attention as a form of feature augmentation, i.e., casted attention. We propose Multi-Cast Attention…

Computation and Language · Computer Science · 2018-06-05 · Yi Tay, Luu Anh Tuan, Siu Cheung Hui

Answering Questions by Meta-Reasoning over Multiple Chains of Thought

Modern systems for multi-hop question answering (QA) typically break questions into a sequence of reasoning steps, termed chain-of-thought (CoT), before arriving at a final answer. Often, multiple chains are sampled and aggregated through a…

Computation and Language · Computer Science · 2024-08-05 · Ori Yoran, Tomer Wolfson, Ben Bogin, Uri Katz, Daniel Deutch, Jonathan Berant

Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA

Reasoning quality in large language models depends not only on producing correct answers but also on generating valid intermediate steps. We study this through multiple-choice question answering (MCQA), which provides a controlled setting…

Artificial Intelligence · Computer Science · 2025-10-01 · Raphael Schumann, Stefan Riezler

MoCA: Incorporating Multi-stage Domain Pretraining and Cross-guided Multimodal Attention for Textbook Question Answering

Textbook Question Answering (TQA) is a complex multimodal task to infer answers given large context descriptions and abundant diagrams. Compared with Visual Question Answering (VQA), TQA contains a large number of uncommon terminologies and…

Multimedia · Computer Science · 2021-12-07 · Fangzhi Xu, Qika Lin, Jun Liu, Lingling Zhang, Tianzhe Zhao, Qi Chai, Yudai Pan

Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction

Large Language Model based multi-agent systems (MAS) excel at collaborative problem solving but remain brittle to cascading errors: a single faulty step can propagate across agents and disrupt the trajectory. In this paper, we present MASC,…

Artificial Intelligence · Computer Science · 2026-01-13 · Xu Shen, Qi Zhang, Song Wang, Zhen Tan, Xinyu Zhao, Laura Yao, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Ehsan Moradi Pari, Kwonjoon Lee, Tianlong Chen

MetaQA: Combining Expert Agents for Multi-Skill Question Answering

The recent explosion of question answering (QA) datasets and models has increased the interest in the generalization of models across multiple domains and formats by either training on multiple datasets or by combining multiple models.…

Computation and Language · Computer Science · 2023-02-08 · Haritz Puerto, Gözde Gül Şahin, Iryna Gurevych

Context-guided Triple Matching for Multiple Choice Question Answering

The task of multiple choice question answering (MCQA) refers to identifying a suitable answer from multiple candidates, by estimating the matching score among the triple of the passage, question and answer. Despite the general research…

Computation and Language · Computer Science · 2021-09-28 · Xun Yao, Junlong Ma, Xinrong Hu, Junping Liu, Jie Yang, Wanqing Li

Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs

When evaluating large language models (LLMs) with multiple-choice question answering (MCQA), it is common to end the prompt with the string "Answer:" to facilitate automated answer extraction via next-token probabilities. However, there is…

Computation and Language · Computer Science · 2025-09-19 · Mario Sanz-Guerrero, Minh Duc Bui, Katharina von der Wense