Related papers: RelayLLM: Efficient Reasoning via Collaborative De…

Tandem: Riding Together with Large and Small Language Models for Efficient Reasoning

Recent advancements in large language models (LLMs) have catalyzed the rise of reasoning-intensive inference paradigms, where models perform explicit step-by-step reasoning before generating final answers. While such approaches improve…

Artificial Intelligence · Computer Science · 2026-04-28 · Zichuan Fu, Xian Wu, Guojing Li, Yejing Wang, Yijun Chen, Zihao Zhao, Yixuan Luo, Hanyu Yan, Yefeng Zheng, Xiangyu Zhao

RelayGen: Intra-Generation Model Switching for Efficient Reasoning

Large reasoning models (LRMs) achieve strong performance on complex reasoning tasks by generating long, multi-step reasoning trajectories, but inference-time scaling incurs substantial deployment cost. A key challenge is that generation…

Computation and Language · Computer Science · 2026-02-09 · Jiwon Song, Yoongon Kim, Jae-Joon Kim

Efficient Inference for Large Reasoning Models: A Survey

Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason, exhibiting promising performance in solving complex tasks. However, their deliberative reasoning process leads…

Computation and Language · Computer Science · 2025-08-14 · Yue Liu, Jiaying Wu, Yufei He, Ruihan Gong, Jun Xia, Liang Li, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi, Stan Z. Li, Keqin Li

PickLLM: Context-Aware RL-Assisted Large Language Model Routing

Recently, the number of off-the-shelf Large Language Models (LLMs) has exploded with many open-source options. This creates a diverse landscape regarding both serving options (e.g., inference on local hardware vs remote LLM APIs) and model…

Machine Learning · Computer Science · 2024-12-18 · Dimitrios Sikeridis, Dennis Ramdass, Pranay Pareek

ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models

Small Language Models (SLMs) are a cost-effective alternative to Large Language Models (LLMs), but often struggle with complex reasoning due to their limited capacity and a tendency to produce mistakes or inconsistent answers during…

Computation and Language · Computer Science · 2025-08-19 · Yuanfeng Xu, Zehui Dai, Jian Liang, Jiapeng Guan, Guangrun Wang, Liang Lin, Xiaohui Lv

Token-Budget-Aware LLM Reasoning

Reasoning is critical for large language models (LLMs) to excel in a wide range of tasks. While methods like Chain-of-Thought (CoT) reasoning enhance LLM performance by decomposing problems into intermediate steps, they also incur…

Computation and Language · Computer Science · 2025-06-03 · Tingxu Han, Zhenting Wang, Chunrong Fang, Shiyu Zhao, Shiqing Ma, Zhenyu Chen

Learning to Decode Collaboratively with Multiple Language Models

We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the…

Computation and Language · Computer Science · 2024-08-28 · Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag

OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking

Large language models (LLMs) have revolutionized the landscape of Natural Language Processing systems, but are computationally expensive. To reduce the cost without sacrificing performance, previous studies have explored various approaches…

Computation and Language · Computer Science · 2024-10-01 · Chia-Hsuan Lee, Hao Cheng, Mari Ostendorf

ScaleRTL: Scaling LLMs with Reasoning Data and Test-Time Compute for Accurate RTL Code Generation

Recent advances in large language models (LLMs) have enabled near-human performance on software coding benchmarks, but their effectiveness in RTL code generation remains limited due to the scarcity of high-quality training data. While prior…

Hardware Architecture · Computer Science · 2025-07-17 · Chenhui Deng, Yun-Da Tsai, Guan-Ting Liu, Zhongzhi Yu, Haoxing Ren

Reasoning Distillation and Structural Alignment for Improved Code Generation

Effective code generation with language models hinges on two critical factors: accurately understanding the intent of the prompt and generating code that applies algorithmic reasoning to produce correct solutions capable of passing diverse…

Artificial Intelligence · Computer Science · 2025-10-21 · Amir Jalilifard, Anderson de Rezende Rocha, Marcos Medeiros Raimundo

Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router

Chain-of-thought has been proven essential for enhancing the complex reasoning abilities of Large Language Models (LLMs), but it also leads to high computational costs. Recent advances have explored methods to route queries among…

Computation and Language · Computer Science · 2025-12-05 · Chenyang Shao, Xinyang Liu, Yutang Lin, Fengli Xu, Yong Li

When to Reason: Semantic Router for vLLM

Large Language Models (LLMs) demonstrate substantial accuracy gains when augmented with reasoning modes such as chain-of-thought and inference-time scaling. However, reasoning also incurs significant costs in inference latency and token…

Emerging Technologies · Computer Science · 2025-10-13 · Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Large language models have achieved remarkable success in various tasks but suffer from high computational costs during inference, limiting their deployment in resource-constrained applications. To address this issue, we propose a novel…

Computation and Language · Computer Science · 2025-09-11 · Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao

Token Level Routing Inference System for Edge Devices

The computational complexity of large language model (LLM) inference significantly constrains their deployment efficiency on edge devices. In contrast, small language models offer faster decoding and lower resource consumption but often…

Computation and Language · Computer Science · 2025-04-11 · Jianshu She, Wenhao Zheng, Zhengzhong Liu, Hongyi Wang, Eric Xing, Huaxiu Yao, Qirong Ho

Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models

Large Language Models (LLMs) typically generate outputs token by token using a fixed compute budget, leading to inefficient resource utilization. To address this shortcoming, recent advancements in mixture-of-experts (MoE) models,…

Machine Learning · Computer Science · 2024-10-16 · Keivan Alizadeh, Iman Mirzadeh, Hooman Shahrokhi, Dmitry Belenko, Frank Sun, Minsik Cho, Mohammad Hossein Sekhavat, Moin Nabi, Mehrdad Farajtabar

Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models

Recent advancements in Large Language Models (LLMs) have significantly enhanced their ability to perform complex reasoning tasks, transitioning from fast and intuitive thinking (System 1) to slow and deep reasoning (System 2). While System…

Computation and Language · Computer Science · 2025-04-01 · Rui Wang, Hongru Wang, Boyang Xue, Jianhui Pang, Shudong Liu, Yi Chen, Jiahao Qiu, Derek Fai Wong, Heng Ji, Kam-Fai Wong

Derailer-Rerailer: Adaptive Verification for Efficient and Reliable Language Model Reasoning

Large Language Models (LLMs) have shown impressive reasoning capabilities, yet existing prompting methods face a critical trade-off: simple approaches often struggle with complex tasks and reasoning stability, while more sophisticated…

Computation and Language · Computer Science · 2025-07-11 · Guangya Wan, Yuqi Wu, Hao Wang, Shengming Zhao, Jie Chen, Sheng Li

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve reliability and stopping early when…

Artificial Intelligence · Computer Science · 2026-05-15 · Xi Wang, Anushri Suresh, Alvin Zhang, Rishi More, William Jurayj, Benjamin Van Durme, Mehrdad Farajtabar, Daniel Khashabi, Eric Nalisnick

SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models

Large language models (LLMs) have been widely adopted due to their remarkable performance across various applications, driving the accelerated development of a large number of diverse models. However, these individual LLMs show limitations…

Computation and Language · Computer Science · 2025-06-13 · Kaushal Kumar Maurya, KV Aditya Srivatsa, Ekaterina Kochmar

Training Language Models to Reason Efficiently

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly…

Machine Learning · Computer Science · 2025-11-05 · Daman Arora, Andrea Zanette