Related papers: Generalized Parallel Scaling with Interdependent G…

200 papers

Inference-time computation is a powerful paradigm to enhance the performance of large language models (LLMs), with Best-of-N sampling being a widely used technique. However, this method is computationally expensive, requiring both (1) an…

Computation and Language · Computer Science · 2024-10-04 · Rohin Manvi, Anikait Singh, Stefano Ermon

Large Language Models (LLMs) are powerful but often too slow and costly for real-world use during inference. Looped transformers save on parameters by reusing the same weights for multiple computational steps, or "loops." However, this…

Computation and Language · Computer Science · 2025-10-30 · Bohong Wu, Mengzhao Chen, Xiang Luo, Shen Yan, Qifan Yu, Fan Xia, Tianqi Zhang, Hongrui Zhan, Zheng Zhong, Xun Zhou, Siyuan Qiao, Xingyan Bin

Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs), typically by sampling multiple token-based chains-of-thought in parallel and aggregating outcomes through voting or search. Recent advances…

Computation and Language · Computer Science · 2026-04-21 · Runyang You, Yongqi Li, Meng Liu, Wenjie Wang, Liqiang Nie, Wenjie Li

This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple…

Artificial Intelligence · Computer Science · 2025-11-11 · Jianhao Chen, Zishuo Xun, Bocheng Zhou, Han Qi, Hangfan Zhang, Qiaosheng Zhang, Yang Chen, Wei Hu, Yuzhong Qu, Wanli Ouyang, Shuyue Hu

Large Language Models (LLMs), despite their remarkable capabilities, are prone to generating hallucinated or outdated content due to their static internal knowledge. While Retrieval-Augmented Generation (RAG) integrated with Reinforcement…

Computation and Language · Computer Science · 2026-01-14 · Zhiwen Tan, Jiaming Huang, Qintong Wu, Hongxuan Zhang, Chenyi Zhuang, Jinjie Gu

Large Language Models (LLMs) have achieved impressive capabilities in language understanding and generation, yet they continue to underperform on knowledge-intensive reasoning tasks due to limited access to structured context and multi-hop…

Computation and Language · Computer Science · 2025-06-26 · Travis Thompson, Seung-Hwan Lim, Paul Liu, Ruoying He, Dongkuan Xu

Recent advancements in large language models (LLMs) have shifted focus toward scaling inference-time compute, improving performance without retraining the model. A common approach is to sample multiple outputs in parallel, and select one of…

Computation and Language · Computer Science · 2025-06-26 · Ammar Khairi, Daniel D'souza, Ye Shen, Julia Kreutzer, Sara Hooker

One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during…

Computation and Language · Computer Science · 2024-11-21 · Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

We present VerilogMonkey, an empirical study of parallel scaling for the under-explored task of automated Verilog generation. Parallel scaling improves LLM performance by sampling many outputs in parallel. Across multiple benchmarks and…

Programming Languages · Computer Science · 2025-09-23 · Juxin Niu, Yuxin Du, Dan Niu, Xi Wang, Zhe Jiang, Nan Guan

Scaling inference compute in large language models (LLMs) through repeated sampling consistently increases the coverage (fraction of problems solved) as the number of samples increases. We conjecture that this observed improvement is…

Computation and Language · Computer Science · 2024-10-22 · Gal Yona, Or Honovich, Omer Levy, Roee Aharoni

Generative sequence modeling faces a fundamental tension between the expressivity of Transformers and the efficiency of linear sequence models. Existing efficient architectures are theoretically bounded by shallow, single-step linear…

Machine Learning · Computer Science · 2026-02-13 · Jie Jiang, Ke Cheng, Xin Xu, Mengyang Pang, Tianhao Lu, Jiaheng Li, Yue Liu, Yuan Wang, Jun Zhang, Huan Yu, Zhouchen Lin

Capturing complex user preferences from sparse behavioral sequences remains a fundamental challenge in sequential recommendation. Recent latent reasoning methods have shown promise by extending test-time computation through multi-step…

Information Retrieval · Computer Science · 2026-01-07 · Jiakai Tang, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, Bo Zheng

The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g.,…

Machine Learning · Computer Science · 2018-06-12 · Zhihao Jia, Sina Lin, Charles R. Qi, Alex Aiken

Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios…

Computation and Language · Computer Science · 2024-01-04 · Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

Large language models (LLMs) have been a disruptive innovation in recent years, and they play a crucial role in our daily lives due to their ability to understand and generate human-like text. Their capabilities include natural language…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-10-17 · Akrit Mudvari, Yuang Jiang, Leandros Tassiulas

Test-time scaling (TTS) has gained widespread attention for enhancing LLM reasoning. Existing approaches such as Best-of-N and majority voting are limited as their performance depends on the quality of candidate responses, making them…

Machine Learning · Computer Science · 2026-04-28 · Qibin Wang, Pu Zhao, Shaohan Huang, Fangkai Yang, Lu Wang, Furu Wei, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang

With the increasing capabilities of Large Language Models (LLMs), parallel reasoning has emerged as a new inference paradigm that enhances reasoning robustness by concurrently exploring multiple lines of thought before converging on a final…

Computation and Language · Computer Science · 2025-10-15 · Ziqi Wang, Boye Niu, Zipeng Gao, Zhi Zheng, Tong Xu, Linghui Meng, Zhongli Li, Jing Liu, Yilong Chen, Chen Zhu, Hua Wu, Haifeng Wang, Enhong Chen

Test-Time Scaling (TTS) enhances the reasoning capabilities of large language models by allocating additional inference compute to explore the solution space. However, existing parallel TTS methods typically keep branches isolated during…

Computation and Language · Computer Science · 2026-05-27 · Xinglin Wang, Hao Lin, Shaoxiong Feng, Peiwen Yuan, Yiwei Li, Jiayi Shi, Yueqi Zhang, Chuyi Tan, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li

The remarkable capabilities of Large Language Models (LLMs) are overshadowed by their immense computational cost. While recent work has shown that many LLM layers can be reordered or even removed with minimal impact on accuracy, these…

Machine Learning · Computer Science · 2026-01-07 · Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret

Large language model (LLM) scaling inference is key to unlocking greater performance, and leveraging diversity has proven an effective way to enhance it. Motivated by the observed relationship between solution accuracy and meaningful…

Machine Learning · Computer Science · 2025-12-22 · Tianchun Wang, Zichuan Liu, Yuanzhou Chen, Jonathan Light, Weiyang Liu, Haifeng Chen, Xiang Zhang, Wei Cheng