Related papers: CharED: Character-wise Ensemble Decoding for Large…

Token-level Ensembling of Models with Different Vocabularies

Model ensembling is a technique to combine the predicted distributions of two or more models, often leading to improved robustness and performance. For ensembling in text generation, the next token's probability distribution is derived from…

Computation and Language · Computer Science 2025-03-03 Rachel Wicks, Kartik Ravisankar, Xinchen Yang, Philipp Koehn, Matt Post

Learning to Decode Collaboratively with Multiple Language Models

We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the…

Computation and Language · Computer Science 2024-08-28 Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

Majority Rules: LLM Ensemble is a Winning Approach for Content Categorization

This study introduces an ensemble framework for unstructured text categorization using large language models (LLMs). By integrating multiple models, the ensemble large language model (eLLM) framework addresses common weaknesses of…

Artificial Intelligence · Computer Science 2025-11-21 Ariel Kamen, Yakov Kamen

Determine-Then-Ensemble: Necessity of Top-k Union for Large Language Model Ensembling

Large language models (LLMs) exhibit varying strengths and weaknesses across different tasks, prompting recent studies to explore the benefits of ensembling models to leverage their complementary advantages. However, existing LLM ensembling…

Computation and Language · Computer Science 2025-02-26 Yuxuan Yao, Han Wu, Mingyang Liu, Sichun Luo, Xiongwei Han, Jie Liu, Zhijiang Guo, Linqi Song

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during…

Computation and Language · Computer Science 2024-11-21 Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

Tender: Accelerating Large Language Models via Tensor Decomposition and Runtime Requantization

Large language models (LLMs) demonstrate outstanding performance in various tasks in machine learning and have thus become one of the most important workloads in today's computing landscape. However, deploying LLM inference poses challenges…

Machine Learning · Computer Science 2024-06-21 Jungi Lee, Wonbeom Lee, Jaewoong Sim

CharBench: Evaluating the Role of Tokenization in Character-Level Tasks

Tasks that require character-level reasoning, such as counting or locating characters within words, remain challenging for contemporary language models. A common conjecture is that language models' reliance on subword units, rather than…

Computation and Language · Computer Science 2026-04-08 Omri Uzan, Yuval Pinter

Token Level Routing Inference System for Edge Devices

The computational complexity of large language model (LLM) inference significantly constrains their deployment efficiency on edge devices. In contrast, small language models offer faster decoding and lower resource consumption but often…

Computation and Language · Computer Science 2025-04-11 Jianshu She, Wenhao Zheng, Zhengzhong Liu, Hongyi Wang, Eric Xing, Huaxiu Yao, Qirong Ho

Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models

Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token…

Computation and Language · Computer Science 2025-06-12 Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su

The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models

Despite their remarkable progress across diverse domains, Large Language Models (LLMs) consistently fail at simple character-level tasks, such as counting letters in words, due to a fundamental limitation: tokenization. In this work, we…

Computation and Language · Computer Science 2025-09-17 Adrian Cosma, Stefan Ruseti, Emilian Radoi, Mihai Dascalu

Spelling-out is not Straightforward: LLMs' Capability of Tokenization from Token to Characters

Large language models (LLMs) can spell out tokens character by character with high accuracy, yet they struggle with more complex character-level tasks, such as identifying compositional subcomponents within tokens. In this work, we…

Computation and Language · Computer Science 2025-06-13 Tatsuya Hiraoka, Kentaro Inui

M-Ped: Multi-Prompt Ensemble Decoding for Large Language Models

With the widespread application of Large Language Models (LLMs) in the field of Natural Language Processing (NLP), enhancing their performance has become a research hotspot. This paper presents a novel multi-prompt ensemble decoding…

Computation and Language · Computer Science 2024-12-25 Jiaxin Guo, Daimeng Wei, Yuanchang Luo, Shimin Tao, Hengchao Shang, Zongyao Li, Shaojun Li, Jinlong Yang, Zhanglin Wu, Zhiqiang Rao, Hao Yang

Ensembling Large Language Models for Code Vulnerability Detection: An Empirical Evaluation

Code vulnerability detection is crucial for ensuring the security and reliability of modern software systems. Recently, Large Language Models (LLMs) have shown promising capabilities in this domain. However, notable discrepancies in…

Software Engineering · Computer Science 2025-09-19 Zhihong Sun, Jia Li, Yao Wan, Chuanyi Li, Hongyu Zhang, Zhi Jin, Ge Li, Hong Liu, Chen Lyu, Songlin Hu

Rethinking LLM Ensembling from the Perspective of Mixture Models

Model ensembling is a well-established technique for improving the performance of machine learning models. Conventionally, this involves averaging the output distributions of multiple models and selecting the most probable label. This idea…

Machine Learning · Computer Science 2026-05-26 Jiale Fu, Yuchu Jiang, Peijun Wu, Chonghan Liu, Joey Tianyi Zhou, Xu Yang

Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding

Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to…

Computation and Language · Computer Science 2025-03-20 Ziyao Wang, Muneeza Azmat, Ang Li, Raya Horesh, Mikhail Yurochkin

Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

Large language models (LLMs) exhibit complementary strengths in various tasks, motivating the research of LLM ensembling. However, existing work focuses on training an extra reward model or fusion model to select or combine all candidate…

Computation and Language · Computer Science 2024-05-31 Yichong Huang, Xiaocheng Feng, Baohang Li, Yang Xiang, Hui Wang, Bing Qin, Ting Liu

Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple…

Computation and Language · Computer Science 2026-02-09 Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

Enhancing Large Language Model Efficiency via Symbolic Compression: A Formal Approach Towards Interpretability

Large language models (LLMs) face significant token efficiency bottlenecks in code generation and logical reasoning tasks, a challenge that directly impacts inference cost and model interpretability. This paper proposes a formal framework…

Artificial Intelligence · Computer Science 2025-02-03 Lumen AI, Tengzhou No. 1 Middle School, Shihao Ji, Zihui Song, Fucheng Zhong, Jisen Jia, Zhaobo Wu, Zheyi Cao, Tianhao Xu

TABED: Test-Time Adaptive Ensemble Drafting for Robust Speculative Decoding in LVLMs

Speculative decoding (SD) has proven effective for accelerating LLM inference by quickly generating draft tokens and verifying them in parallel. However, SD remains largely unexplored for Large Vision-Language Models (LVLMs), which extend…

Machine Learning · Computer Science 2026-01-29 Minjae Lee, Wonjun Kang, Byeongkeun Ahn, Christian Classen, Kevin Galim, Seunghyuk Oh, Minghao Yan, Hyung Il Koo, Kangwook Lee

Adaptive Draft-Verification for Efficient Large Language Model Decoding

Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires…

Computation and Language · Computer Science 2024-08-20 Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu