
Related papers: SkillAggregation: Reference-free LLM-Dependent Agg…

200 papers

Scaling up test-time compute, by generating multiple independent solutions and selecting or aggregating among them, has become a central paradigm for improving large language models (LLMs) on challenging reasoning tasks. While most prior…

Computation and Language · Computer Science 2025-09-09 Wenting Zhao, Pranjal Aggarwal, Swarnadeep Saha, Asli Celikyilmaz, Jason Weston, Ilia Kulikov

Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages. Despite their capabilities, they exhibit inconsistencies in handling identical queries in…

Computation and Language · Computer Science 2024-06-24 Yue Huang, Chenrui Fan, Yuan Li, Siyuan Wu, Tianyi Zhou, Xiangliang Zhang, Lichao Sun

Whether Large Language Models (LLMs) can outperform crowdsourcing on the data annotation task has recently attracted interest. Some works examined this question using the average performance of individual crowd workers and LLM workers on some…

Computation and Language · Computer Science 2024-01-19 Jiyi Li

Large Language Models (LLMs) have shown remarkable capabilities across tasks, yet they often require additional prompting techniques when facing complex problems. While approaches like self-correction and response selection have emerged as…

Computation and Language · Computer Science 2025-04-15 Zichong Li, Xinyu Feng, Yuheng Cai, Zixuan Zhang, Tianyi Liu, Chen Liang, Weizhu Chen, Haoyu Wang, Tuo Zhao

Large language models (LLMs) have achieved remarkable success in natural language generation, but less attention has been given to their applicability in decision-making tasks such as classification. We show that LLMs like LLaMa can achieve…

Computation and Language · Computer Science 2024-06-26 Vikas Yadav, Zheng Tang, Vijay Srinivasan

Large language models (LLMs) are increasingly applied as automatic evaluators for natural language generation assessment, often using pairwise comparative judgements. Existing approaches typically rely on single judges or aggregate multiple…

Computation and Language · Computer Science 2026-05-29 Mengjie Qian, Guangzhi Sun, Mark J. F. Gales, Kate M. Knill

To reduce the need for human annotations, large language models (LLMs) have been proposed as judges of the quality of other candidate models. The performance of LLM judges is typically evaluated by measuring the correlation with human…

Computation and Language · Computer Science 2025-05-14 Andreas Stephan, Dawei Zhu, Matthias Aßenmacher, Xiaoyu Shen, Benjamin Roth

Large language models (LLMs) achieve strong average performance yet remain unreliable at the instance level, with frequent hallucinations, brittle failures, and poorly calibrated confidence. We study reliability through the lens of…

Artificial Intelligence · Computer Science 2026-01-13 Pranav Kallem

The acceleration of Large Language Model (LLM) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as…

Computation and Language · Computer Science 2024-12-10 Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

While Large Language Models (LLMs) demonstrate exceptional performance in a multitude of Natural Language Processing (NLP) tasks, they encounter challenges in practical applications, including issues with hallucinations, inadequate…

Computation and Language · Computer Science 2024-06-13 Yihao Li, Ru Zhang, Jianyi Liu

Large language models (LLMs) have shown remarkable promise but remain challenging to continually improve through traditional finetuning, particularly when integrating capabilities from other specialized LLMs. Popular methods like ensemble…

Computation and Language · Computer Science 2025-06-02 Zhenglun Kong, Zheng Zhan, Shiyue Hou, Yifan Gong, Xin Meng, Pengwei Sui, Peiyan Dong, Xuan Shen, Zifeng Wang, Pu Zhao, Hao Tang, Stratis Ioannidis, Yanzhi Wang

The emergence of Large Language Models (LLMs) as chat assistants capable of generating human-like conversations has amplified the need for robust evaluation methods, particularly for open-ended tasks. Conventional metrics such as EM and F1,…

Computation and Language · Computer Science 2025-11-12 Sher Badshah, Hassan Sajjad

The increasing use of large language models (LLMs) in causal discovery as a substitute for human domain experts highlights the need for optimal model selection. This paper presents the first hallucination survey of popular LLMs for causal…

Computation and Language · Computer Science 2024-11-21 Grace Sng, Yanming Zhang, Klaus Mueller

Quality is a crucial issue for crowd annotations. Answer aggregation is an important type of solution. The aggregated answers estimated from multiple crowd answers to the same instance are the annotations ultimately collected, rather…

Computation and Language · Computer Science 2024-10-23 Jiyi Li

Scaling test-time compute brings substantial performance gains for large language models (LLMs). By sampling multiple answers and heuristically aggregating them (e.g., either through majority voting or using verifiers to rank the…

Computation and Language · Computer Science 2025-10-13 Jianing Qi, Xi Ye, Hao Tang, Zhigang Zhu, Eunsol Choi

Self-improvement in multimodal large language models (MLLMs) is crucial for enhancing their reliability and robustness. However, current methods often rely heavily on MLLMs themselves as judges, leading to high computational costs and…

Computation and Language · Computer Science 2024-11-28 Shijian Deng, Wentian Zhao, Yu-Jhe Li, Kun Wan, Daniel Miranda, Ajinkya Kale, Yapeng Tian

We present SkillGPT, a tool for skill extraction and standardization (SES) from free-style job descriptions and user profiles with an open-source Large Language Model (LLM) as backbone. Most previous methods for similar tasks either need…

Computation and Language · Computer Science 2023-10-19 Nan Li, Bo Kang, Tijl De Bie

Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its…

Computation and Language · Computer Science 2024-11-13 Wei Jie Yeo, Teddy Ferdinan, Przemyslaw Kazienko, Ranjan Satapathy, Erik Cambria

Multimodal Large Language Models (MLLMs) have gained significant attention recently, showing remarkable potential in artificial general intelligence. However, assessing the utility of MLLMs presents considerable challenges, primarily due to…

Computation and Language · Computer Science 2024-06-12 Dongping Chen, Ruoxi Chen, Shilin Zhang, Yinuo Liu, Yaochen Wang, Huichi Zhou, Qihui Zhang, Yao Wan, Pan Zhou, Lichao Sun

This study introduces an ensemble framework for unstructured text categorization using large language models (LLMs). By integrating multiple models, the ensemble large language model (eLLM) framework addresses common weaknesses of…

Artificial Intelligence · Computer Science 2025-11-21 Ariel Kamen, Yakov Kamen