Related papers: Boost Test-Time Performance with Closed-Loop Infer…

Focus on Likely Classes for Test-Time Prediction

We ask: Can focusing on likely classes of a single, in-domain sample improve model predictions? Prior work argued "no". We put forward a novel rationale in favor of "yes": Sharedness of features among classes indicates their reliability…

Machine Learning · Computer Science 2025-12-23 Johannes Schneider

Efficient Test-Time Scaling via Self-Calibration

Increasing test-time computation is a straightforward approach to enhancing the quality of responses in Large Language Models (LLMs). While Best-of-N sampling and Self-Consistency with majority voting are simple and effective, they require…

Machine Learning · Computer Science 2025-03-04 Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang

Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute

This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple…

Artificial Intelligence · Computer Science 2025-11-11 Jianhao Chen, Zishuo Xun, Bocheng Zhou, Han Qi, Hangfan Zhang, Qiaosheng Zhang, Yang Chen, Wei Hu, Yuzhong Qu, Wanli Ouyang, Shuyue Hu

Self-Improving In-Context Learning

We propose to improve in-context learning (ICL) by optimizing the continuous embeddings of a fixed few-shot prompt at test time. The key observation is that the log-probabilities a model assigns to its demonstrated…

Computation and Language · Computer Science 2026-05-25 Baturay Saglam, Dionysis Kalogerias

Iterative Deepening Sampling as Efficient Test-Time Scaling

Recent reasoning models, such as OpenAI's O1 series, have demonstrated exceptional performance on complex reasoning tasks and revealed new test-time scaling laws. Inspired by this, many people have been studying how to train models to…

Computation and Language · Computer Science 2025-06-03 Weizhe Chen, Sven Koenig, Bistra Dilkina

Let Me Check the Examples: Enhancing Demonstration Learning via Explicit Imitation

Demonstration learning aims to guide prompt prediction by providing answered demonstrations in few-shot settings. Despite achieving promising results, existing work only concatenates the answered examples as demonstrations to the…

Machine Learning · Computer Science 2022-09-02 Sirui Wang, Kaiwen Wei, Hongzhi Zhang, Yuntao Li, Wei Wu

Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning

Continual learning (CL) aims to incrementally train a model on a sequence of tasks while retaining performance on prior ones. However, storing and replaying data is often infeasible due to privacy or security constraints and impractical for…

Machine Learning · Computer Science 2025-10-31 Ruilin Tong, Haodong Lu, Yuhang Liu, Dong Gong

Rethinking Early Stopping: Refine, Then Calibrate

Machine learning classifiers often produce probabilistic predictions that are critical for accurate and interpretable decision-making in various domains. The quality of these predictions is generally evaluated with proper losses, such as…

Machine Learning · Computer Science 2025-06-26 Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach

GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time

Repeated Sampling (RS) is a simple inference-time algorithm that has been shown to improve model performance on complex tasks. Although it is an effective way of scaling inference time, it often struggles to generate diverse solution…

Artificial Intelligence · Computer Science 2026-02-17 Divij Handa, Mihir Parmar, Aswin RRV, Md Nayem Uddin, Hamid Palangi, Chitta Baral

Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation

Inference-time computation is a powerful paradigm to enhance the performance of large language models (LLMs), with Best-of-N sampling being a widely used technique. However, this method is computationally expensive, requiring both (1) an…

Computation and Language · Computer Science 2024-10-04 Rohin Manvi, Anikait Singh, Stefano Ermon

Divide-and-Conquer Inference for Large-Scale Visual Recognition with Multimodal Large Language Models

Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities across a wide range of vision-language tasks. However, when applied to large-scale image classification, their performance degrades significantly as the label…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Zhipeng Ye, Jiaqi Huang, Feng Jiang, Qiufeng Wang, Yikang Duan, Dawei Wang, Xihang Zhou, Qian Qiao

CAPrompt: Cyclic Prompt Aggregation for Pre-Trained Model Based Class Incremental Learning

Recently, prompt tuning methods for pre-trained models have demonstrated promising performance in Class Incremental Learning (CIL). These methods typically involve learning task-specific prompts and predicting the task ID to select the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Qiwei Li, Jiahuan Zhou

Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking

Recent advances in large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated the effectiveness of test-time scaling, where extended reasoning processes substantially enhance model performance. Despite this, current…

Computation and Language · Computer Science 2025-03-26 Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yunjie Ji, Yiping Peng, Han Zhao, Xiangang Li

Compute Aligned Training: Optimizing for Test Time Inference

Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the…

Machine Learning · Computer Science 2026-05-21 Adam Ousherovitch, Ambuj Tewari

Can Calibration Improve Sample Prioritization?

Calibration can reduce overconfident predictions of deep neural networks, but can calibration also accelerate training? In this paper, we show that it can when used to prioritize some examples for performing subset selection. We study the…

Machine Learning · Computer Science 2022-11-17 Ganesh Tata, Gautham Krishna Gudur, Gopinath Chennupati, Mohammad Emtiyaz Khan

Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation

Inspired by the success of language models (LM), scaling up deep learning recommendation systems (DLRS) has become a recent trend in the community. All previous methods tend to scale up the model parameters during training time. However,…

Information Retrieval · Computer Science 2025-12-09 Fuyuan Lyu, Zhentai Chen, Jingyan Jiang, Lingjie Li, Xing Tang, Xiuqiang He, Xue Liu

Rethinking the Role of Prompting Strategies in LLM Test-Time Scaling: A Perspective of Probability Theory

Recently, scaling test-time compute on Large Language Models (LLMs) has garnered wide attention. However, there has been limited investigation of how various reasoning prompting strategies perform as test-time compute is scaled. In this paper, we focus on a…

Artificial Intelligence · Computer Science 2025-08-18 Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan

Training and Testing with Multiple Splits: A Central Limit Theorem for Split-Sample Estimators

As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample-splitting attains valid inference on model properties by using separate…

Econometrics · Economics 2025-11-27 Bruno Fava

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of…

Machine Learning · Computer Science 2024-08-07 Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar

Does Thinking More always Help? Mirage of Test-Time Scaling in Reasoning Models

Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek R1) have led to a popular belief that extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural…

Artificial Intelligence · Computer Science 2025-10-24 Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy, Yifu Lu, Mengdi Wang, Dinesh Manocha, Furong Huang, Mohammad Ghavamzadeh, Amrit Singh Bedi