Related papers: SkillAggregation: Reference-free LLM-Dependent Agg…

The Majority is not always right: RL training for solution aggregation

Scaling up test-time compute, by generating multiple independent solutions and selecting or aggregating among them, has become a central paradigm for improving large language models (LLMs) on challenging reasoning tasks. While most prior…

Computation and Language · Computer Science · 2025-09-09 · Wenting Zhao, Pranjal Aggarwal, Swarnadeep Saha, Asli Celikyilmaz, Jason Weston, Ilia Kulikov

1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators?

Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in…

Computation and Language · Computer Science · 2024-06-24 · Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

A Comparative Study on Annotation Quality of Crowdsourcing and LLM via Label Aggregation

Whether Large Language Models (LLMs) can outperform crowdsourcing on the data annotation task is attracting interest recently. Some works verified this issue with the average performance of individual crowd workers and LLM workers on some…

Computation and Language · Computer Science · 2024-01-19 · Jiyi Li

LLMs Can Generate a Better Answer by Aggregating Their Own Responses

Large Language Models (LLMs) have shown remarkable capabilities across tasks, yet they often require additional prompting techniques when facing complex problems. While approaches like self-correction and response selection have emerged as…

Computation and Language · Computer Science · 2025-04-15 · Zichong Li, Xinyu Feng, Yuheng Cai, Zixuan Zhang, Tianyi Liu, Chen Liang, Weizhu Chen, Haoyu Wang, Tuo Zhao

Paraphrase and Aggregate with Large Language Models for Minimizing Intent Classification Errors

Large language models (LLMs) have achieved remarkable success in natural language generation, but less focus has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMa can achieve…

Computation and Language · Computer Science · 2024-06-26 · Vikas Yadav, Zheng Tang, Vijay Srinivasan

Who can we trust? LLM-as-a-jury for Comparative Assessment

Large language models (LLMs) are increasingly applied as automatic evaluators for natural language generation assessment, often using pairwise comparative judgements. Existing approaches typically rely on single judges or aggregate multiple…

Computation and Language · Computer Science · 2026-05-29 · Mengjie Qian, Guangzhi Sun, Mark J. F. Gales, Kate M. Knill

From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks

To reduce the need for human annotations, large language models (LLMs) have been proposed as judges of the quality of other candidate models. The performance of LLM judges is typically evaluated by measuring the correlation with human…

Computation and Language · Computer Science · 2025-05-14 · Andreas Stephan, Dawei Zhu, Matthias Aßenmacher, Xiaoyu Shen, Benjamin Roth

Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models

Large language models (LLMs) achieve strong average performance yet remain unreliable at the instance level, with frequent hallucinations, brittle failures, and poorly calibrated confidence. We study reliability through the lens of…

Artificial Intelligence · Computer Science · 2026-01-13 · Pranav Kallem

DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation

The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as…

Computation and Language · Computer Science · 2024-12-10 · Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration

While Large Language Models (LLMs) demonstrate exceptional performance in a multitude of Natural Language Processing (NLP) tasks, they encounter challenges in practical applications, including issues with hallucinations, inadequate…

Computation and Language · Computer Science · 2024-06-13 · Yihao Li, Ru Zhang, Jianyi Liu

Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation

Large language models (LLMs) have shown remarkable promise but remain challenging to continually improve through traditional finetuning, particularly when integrating capabilities from other specialized LLMs. Popular methods like ensemble…

Computation and Language · Computer Science · 2025-06-02 · Zhenglun Kong, Zheng Zhan, Shiyue Hou, Yifan Gong, Xin Meng, Pengwei Sui, Peiyan Dong, Xuan Shen, Zifeng Wang, Pu Zhao, Hao Tang, Stratis Ioannidis, Yanzhi Wang

Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form QA

The emergence of Large Language Models (LLMs) as chat assistants capable of generating human-like conversations has amplified the need for robust evaluation methods, particularly for open-ended tasks. Conventional metrics such as EM and F1,…

Computation and Language · Computer Science · 2025-11-12 · Sher Badshah, Hassan Sajjad

A Novel Approach to Eliminating Hallucinations in Large Language Model-Assisted Causal Discovery

The increasing use of large language models (LLMs) in causal discovery as a substitute for human domain experts highlights the need for optimal model selection. This paper presents the first hallucination survey of popular LLMs for causal…

Computation and Language · Computer Science · 2024-11-21 · Grace Sng, Yanming Zhang, Klaus Mueller

Human-LLM Hybrid Text Answer Aggregation for Crowd Annotations

Quality is a crucial issue for crowd annotations. Answer aggregation is an important type of solution. The aggregated answers estimated from multiple crowd answers to the same instance are the eventually collected annotations, rather…

Computation and Language · Computer Science · 2024-10-23 · Jiyi Li

Learning to Reason Across Parallel Samples for LLM Reasoning

Scaling test-time compute brings substantial performance gains for large language models (LLMs). By sampling multiple answers and heuristically aggregating them (e.g., either through majority voting or using verifiers to rank the…

Computation and Language · Computer Science · 2025-10-13 · Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, Eunsol Choi

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness. However, current methods often rely heavily on MLLMs themselves as judges, leading to high computational costs and…

Computation and Language · Computer Science · 2024-11-28 · Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian

SkillGPT: a RESTful API service for skill extraction and standardization using a Large Language Model

We present SkillGPT, a tool for skill extraction and standardization (SES) from free-style job descriptions and user profiles with an open-source Large Language Model (LLM) as backbone. Most previous methods for similar tasks either need…

Computation and Language · Computer Science · 2023-10-19 · Nan Li, Bo Kang, Tijl De Bie

Self-training Large Language Models through Knowledge Detection

Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its…

Computation and Language · Computer Science · 2024-11-13 · Wei Jie Yeo, Teddy Ferdinan, Przemyslaw Kazienko, Ranjan Satapathy, Erik Cambria

MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark

Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to…

Computation and Language · Computer Science · 2024-06-12 · Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Yao Wan, Pan Zhou, Lichao Sun

Majority Rules: LLM Ensemble is a Winning Approach for Content Categorization

This study introduces an ensemble framework for unstructured text categorization using large language models (LLMs). By integrating multiple models, the ensemble large language model (eLLM) framework addresses common weaknesses of…

Artificial Intelligence · Computer Science · 2025-11-21 · Ariel Kamen, Yakov Kamen