Related papers: Boost Test-Time Performance with Closed-Loop Infer…

200 papers

We ask: Can focusing on likely classes of a single, in-domain sample improve model predictions? Prior work argued "no". We put forward a novel rationale in favor of "yes": Sharedness of features among classes indicates their reliability…

Machine Learning · Computer Science 2025-12-23 Johannes Schneider

Increasing test-time computation is a straightforward approach to enhancing the quality of responses in Large Language Models (LLMs). While Best-of-N sampling and Self-Consistency with majority voting are simple and effective, they require…

Machine Learning · Computer Science 2025-03-04 Chengsong Huang, Langlin Huang, Jixuan Leng, Jiacheng Liu, Jiaxin Huang

This paper presents a simple, effective, and cost-efficient strategy to improve LLM performance by scaling test-time compute. Our strategy builds upon the repeated-sampling-then-voting framework, with a novel twist: incorporating multiple…

Artificial Intelligence · Computer Science 2025-11-11 Jianhao Chen, Zishuo Xun, Bocheng Zhou, Han Qi, Hangfan Zhang, Qiaosheng Zhang, Yang Chen, Wei Hu, Yuzhong Qu, Wanli Ouyang, Shuyue Hu

We propose to improve in-context learning (ICL) by optimizing the continuous embeddings of a fixed few-shot prompt at test time. The key observation is that the log-probabilities a model assigns to its demonstrated…

Computation and Language · Computer Science 2026-05-25 Baturay Saglam, Dionysis Kalogerias

Recent reasoning models, such as OpenAI's O1 series, have demonstrated exceptional performance on complex reasoning tasks and revealed new test-time scaling laws. Inspired by this, many people have been studying how to train models to…

Computation and Language · Computer Science 2025-06-03 Weizhe Chen, Sven Koenig, Bistra Dilkina

Demonstration learning aims to guide the prompt prediction via providing answered demonstrations in the few shot settings. Despite achieving promising results, existing work only concatenates the answered examples as demonstrations to the…

Machine Learning · Computer Science 2022-09-02 Sirui Wang, Kaiwen Wei, Hongzhi Zhang, Yuntao Li, Wei Wu

Continual learning (CL) aims to incrementally train a model on a sequence of tasks while retaining performance on prior ones. However, storing and replaying data is often infeasible due to privacy or security constraints and impractical for…

Machine Learning · Computer Science 2025-10-31 Ruilin Tong, Haodong Lu, Yuhang Liu, Dong Gong

Machine learning classifiers often produce probabilistic predictions that are critical for accurate and interpretable decision-making in various domains. The quality of these predictions is generally evaluated with proper losses, such as…

Machine Learning · Computer Science 2025-06-26 Eugène Berta, David Holzmüller, Michael I. Jordan, Francis Bach

Repeated Sampling (RS) is a simple inference-time algorithm that has been shown to improve model performance on complex tasks. Although it is an effective way of scaling inference time, it often struggles to generate diverse solution…

Artificial Intelligence · Computer Science 2026-02-17 Divij Handa, Mihir Parmar, Aswin RRV, Md Nayem Uddin, Hamid Palangi, Chitta Baral

Inference-time computation is a powerful paradigm to enhance the performance of large language models (LLMs), with Best-of-N sampling being a widely used technique. However, this method is computationally expensive, requiring both (1) an…

Computation and Language · Computer Science 2024-10-04 Rohin Manvi, Anikait Singh, Stefano Ermon

Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities across a wide range of vision language tasks. However, when applied to large scale image classification, their performance degrades significantly as the label…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Zhipeng Ye, Jiaqi Huang, Feng Jiang, Qiufeng Wang, Yikang Duan, Dawei Wang, Xihang Zhou, Qian Qiao

Recently, prompt tuning methods for pre-trained models have demonstrated promising performance in Class Incremental Learning (CIL). These methods typically involve learning task-specific prompts and predicting the task ID to select the…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Qiwei Li, Jiahuan Zhou

Recent advances in large language models (LLMs), such as OpenAI-o1 and DeepSeek-R1, have demonstrated the effectiveness of test-time scaling, where extended reasoning processes substantially enhance model performance. Despite this, current…

Computation and Language · Computer Science 2025-03-26 Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yunjie Ji, Yiping Peng, Han Zhao, Xiangang Li

Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the…

Machine Learning · Computer Science 2026-05-21 Adam Ousherovitch, Ambuj Tewari

Calibration can reduce overconfident predictions of deep neural networks, but can calibration also accelerate training? In this paper, we show that it can when used to prioritize some examples for performing subset selection. We study the…

Machine Learning · Computer Science 2022-11-17 Ganesh Tata, Gautham Krishna Gudur, Gopinath Chennupati, Mohammad Emtiyaz Khan

Inspired by the success of language models (LM), scaling up deep learning recommendation systems (DLRS) has become a recent trend in the community. All previous methods tend to scale up the model parameters during training time. However,…

Information Retrieval · Computer Science 2025-12-09 Fuyuan Lyu, Zhentai Chen, Jingyan Jiang, Lingjie Li, Xing Tang, Xiuqiang He, Xue Liu

Recently, scaling test-time compute on Large Language Models (LLM) has garnered wide attention. However, there has been limited investigation of how various reasoning prompting strategies perform as scaling. In this paper, we focus on a…

Artificial Intelligence · Computer Science 2025-08-18 Yexiang Liu, Zekun Li, Zhi Fang, Nan Xu, Ran He, Tieniu Tan

As predictive algorithms grow in popularity, using the same dataset to both train and test a new model has become routine across research, policy, and industry. Sample-splitting attains valid inference on model properties by using separate…

Econometrics · Economics 2025-11-27 Bruno Fava

Enabling LLMs to improve their outputs by using more test-time computation is a critical step towards building generally self-improving agents that can operate on open-ended natural language. In this paper, we study the scaling of…

Machine Learning · Computer Science 2024-08-07 Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar

Recent trends in test-time scaling for reasoning models (e.g., OpenAI o1, DeepSeek R1) have led to a popular belief that extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural…
