Related papers: Learning to Decode Collaboratively with Multiple L…

Token Level Routing Inference System for Edge Devices

The computational complexity of large language model (LLM) inference significantly constrains their deployment efficiency on edge devices. In contrast, small language models offer faster decoding and lower resource consumption but often…

Computation and Language · Computer Science 2025-04-11 Jianshu She, Wenhao Zheng, Zhengzhong Liu, Hongyi Wang, Eric Xing, Huaxiu Yao, Qirong Ho

From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during…

Computation and Language · Computer Science 2024-11-21 Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

Decoding with Limited Teacher Supervision Requires Understanding When to Trust the Teacher

How can small-scale large language models (LLMs) efficiently utilize the supervision of LLMs to improve their generative quality? This question has been well studied in scenarios where there is no restriction on the number of LLM…

Computation and Language · Computer Science 2024-10-04 Hyunjong Ok, Jegwang Ryu, Jaeho Lee

RelayLLM: Efficient Reasoning via Collaborative Decoding

Large Language Models (LLMs) for complex reasoning is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessary reasoning capacity. Existing collaborative…

Computation and Language · Computer Science 2026-01-09 Chengsong Huang, Tong Zheng, Langlin Huang, Jinyuan Li, Haolin Liu, Jiaxin Huang

Collaborative decoding of critical tokens for boosting factuality of large language models

The most common training pipeline for large language models includes pretraining, finetuning and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model. Finetuned and aligned models…

Computation and Language · Computer Science 2024-02-29 Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding

Large Language Models (LLMs) often excel in specific domains but fall short in others due to the limitations of their training. Thus, enabling LLMs to solve problems collaboratively by integrating their complementary knowledge promises to…

Computation and Language · Computer Science 2025-03-20 Ziyao Wang, Muneeza Azmat, Ang Li, Raya Horesh, Mikhail Yurochkin

Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding

Large Language Models (LLMs) exhibit impressive capabilities across various applications but encounter substantial challenges such as high inference latency, considerable training costs, and the generation of hallucinations. Collaborative…

Computation and Language · Computer Science 2024-10-24 Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou

Learning to Seek Help: Dynamic Collaboration Between Small and Large Language Models

Large language models (LLMs) offer strong capabilities but raise cost and privacy concerns, whereas small language models (SLMs) facilitate efficient and private local inference yet suffer from limited capacity. To synergize the…

Computation and Language · Computer Science 2026-04-21 Hang Zeng, Xiangyu Liu, Yong Hu, Chaoyue Niu, Jiarui Zhang, Shaojie Tang, Fan Wu, Guihai Chen

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding

To mitigate the high inference latency stemming from autoregressive decoding in Large Language Models (LLMs), Speculative Decoding has emerged as a novel decoding paradigm for LLM inference. In each decoding step, this method first drafts…

Computation and Language · Computer Science 2024-06-05 Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui

CoLM: Collaborative Large Models via A Client-Server Paradigm

Large models have achieved remarkable performance across a range of reasoning and understanding tasks. Prior work often utilizes model ensembles or multi-agent systems to collaboratively generate responses, effectively operating in a…

Machine Learning · Computer Science 2025-11-11 Siqi Huang, Sida Huang, Hongyuan Zhang

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does…

Computation and Language · Computer Science 2026-02-10 Abir Harrasse, Florent Draye, Punya Syon Pandey, Zhijing Jin, Bernhard Schölkopf

Probing LLMs for Joint Encoding of Linguistic Categories

Large Language Models (LLMs) exhibit impressive performance on a range of NLP tasks, due to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a…

Computation and Language · Computer Science 2023-10-31 Giulio Starace, Konstantinos Papakostas, Rochelle Choenni, Apostolos Panagiotopoulos, Matteo Rosati, Alina Leidinger, Ekaterina Shutova

Guiding Language Model Reasoning with Planning Tokens

Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as chain-of-thought (CoT) reasoning. However, most of the existing approaches to enhance this ability rely…

Computation and Language · Computer Science 2024-08-08 Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

Scalable Multi-Robot Collaboration with Large Language Models: Centralized or Decentralized Systems?

A flurry of recent work has demonstrated that pre-trained large language models (LLMs) can be effective task planners for a variety of single-robot tasks. The planning performance of LLMs is significantly improved via prompting techniques,…

Robotics · Computer Science 2024-03-25 Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, Chuchu Fan

Lessons Learned: A Multi-Agent Framework for Code LLMs to Learn and Improve

Recent studies show that LLMs possess different skills and specialize in different tasks. In fact, we observe that their varied performance occurs at several levels of granularity. For example, in the code optimization task, code LLMs excel…

Artificial Intelligence · Computer Science 2025-10-24 Yuanzhe Liu, Ryan Deng, Tim Kaler, Xuhao Chen, Charles E. Leiserson, Yao Ma, Jie Chen

How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent…

Computation and Language · Computer Science 2026-05-11 Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin, Dmitry Vetrov

Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding

Autoregressive decoding in large language models (LLMs) requires $\mathcal{O}(n)$ sequential steps for $n$ tokens, fundamentally limiting inference throughput. Recent diffusion-based LLMs (dLLMs) enable parallel token generation through…

Computation and Language · Computer Science 2025-10-06 Wenrui Bao, Zhiben Chen, Dan Xu, Yuzhang Shang

Enhancing LLM Code Generation: A Systematic Evaluation of Multi-Agent Collaboration and Runtime Debugging for Improved Accuracy, Reliability, and Latency

The use of large language models (LLMs) for automated code generation has emerged as a significant focus within AI research. As these pretrained models continue to evolve, their ability to understand and generate complex code structures has…

Software Engineering · Computer Science 2025-05-06 Nazmus Ashrafi, Salah Bouktif, Mohammed Mediani

Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

Large language models (LLMs) have become ubiquitous in practice and are widely used for generation tasks such as translation, summarization and instruction following. However, their enormous size and reliance on autoregressive decoding…

Machine Learning · Computer Science 2024-07-18 Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi

Fast Large Language Model Collaborative Decoding via Speculation

Large Language Model (LLM) collaborative decoding techniques improve output quality by combining the outputs of multiple models at each generation step, but they incur high computational costs. In this paper, we introduce Collaborative…

Computation and Language · Computer Science 2025-05-30 Jiale Fu, Yuchu Jiang, Junkai Chen, Jiaming Fan, Xin Geng, Xu Yang