Related papers: Optimizing Language Models for Inference Time Obje…

Towards Theoretical Understanding of Transformer Test-Time Computing: Investigation on In-Context Linear Regression

Using more test-time computation during language model inference, such as generating more intermediate thoughts or sampling multiple candidate answers, has proven effective in significantly improving model performance. This paper takes an…

Machine Learning · Computer Science 2025-08-20 Xingwu Chen, Miao Lu, Beining Wu, Difan Zou

Changing Model Behavior at Test-Time Using Reinforcement Learning

Machine learning models are often used at test-time subject to constraints and trade-offs not present at training-time. For example, a computer vision model operating on an embedded device may need to perform real-time inference, or a…

Machine Learning · Statistics 2017-02-28 Augustus Odena, Dieterich Lawson, Christopher Olah

Bayesian Optimization for Selecting Efficient Machine Learning Models

The performance of many machine learning models depends on their hyper-parameter settings. Bayesian Optimization has become a successful tool for hyper-parameter optimization of machine learning algorithms, which aims to identify optimal…

Machine Learning · Computer Science 2020-08-04 Lidan Wang, Franck Dernoncourt, Trung Bui

Language Inference with Multi-head Automata through Reinforcement Learning

The purpose of this paper is to use reinforcement learning to model learning agents which can recognize formal languages. Agents are modeled as simple multi-head automaton, a new model of finite automaton that uses multiple heads, and six…

Machine Learning · Computer Science 2020-10-21 Alper Şekerci, Özlem Salehi

Test-Time Scaling of Reasoning Models for Machine Translation

Test-time scaling (TTS) has enhanced the performance of Reasoning Models (RMs) on various tasks such as math and coding, yet its efficacy in machine translation (MT) remains underexplored. This paper investigates whether increased…

Computation and Language · Computer Science 2026-01-13 Zihao Li, Shaoxiong Ji, Jörg Tiedemann

Training Language Models to Reason Efficiently

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly…

Machine Learning · Computer Science 2025-11-05 Daman Arora, Andrea Zanette

Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment

Inference-time computation offers a powerful axis for scaling the performance of language models. However, naively increasing computation in techniques like Best-of-N sampling can lead to performance degradation due to reward hacking.…

Artificial Intelligence · Computer Science 2025-04-09 Audrey Huang, Adam Block, Qinghua Liu, Nan Jiang, Akshay Krishnamurthy, Dylan J. Foster

Offline Learning and Forgetting for Reasoning with Large Language Models

Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However, this approach significantly increases…

Machine Learning · Computer Science 2025-10-29 Tianwei Ni, Allen Nie, Sapana Chaudhary, Yao Liu, Huzefa Rangwala, Rasool Fakoor

Inference-Time Reward Hacking in Large Language Models

A common paradigm to improve the performance of large language models is optimizing for a reward model. Reward models assign a numerical score to an LLM's output that indicates, for example, how likely it is to align with user preferences…

Machine Learning · Computer Science 2025-11-06 Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling, Himabindu Lakkaraju, Flavio du Pin Calmon

Robustifying Language Models with Test-Time Adaptation

Large-scale language models achieved state-of-the-art performance over a number of language tasks. However, they fail on adversarial language examples, which are sentences optimized to fool the language models but with similar semantic…

Computation and Language · Computer Science 2023-10-31 Noah Thomas McDermott, Junfeng Yang, Chengzhi Mao

Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization

Language models are now prevalent in software engineering with many developers using them to automate tasks and accelerate their development. While language models have been tremendous at accomplishing complex software engineering tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Daniel Nichols, Konstantinos Parasyris, Charles Jekel, Abhinav Bhatele, Harshitha Menon

Trading Inference-Time Compute for Adversarial Robustness

We conduct experiments on the impact of increasing inference-time compute in reasoning models (specifically OpenAI o1-preview and o1-mini) on their robustness to adversarial attacks. We find that across a variety of attacks, increased…

Machine Learning · Computer Science 2025-02-03 Wojciech Zaremba, Evgenia Nitishinskaya, Boaz Barak, Stephanie Lin, Sam Toyer, Yaodong Yu, Rachel Dias, Eric Wallace, Kai Xiao, Johannes Heidecke, Amelia Glaese

Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning

Sentence compression reduces the length of text by removing non-essential content while preserving important facts and grammaticality. Unsupervised objective driven methods for sentence compression can be used to create customized models…

Computation and Language · Computer Science 2022-05-18 Demian Gholipour Ghalandari, Chris Hokamp, Georgiana Ifrim

Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

Recent years have seen significant advancements in foundation models through generative pre-training, yet algorithmic innovation in this space has largely stagnated around autoregressive models for discrete signals and diffusion models for…

Machine Learning · Computer Science 2025-03-12 Jiaming Song, Linqi Zhou

Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks

The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies.…

Machine Learning · Computer Science 2024-05-21 Taiyuan Mei, Yun Zi, Xiaohan Cheng, Zijun Gao, Qi Wang, Haowei Yang

Compute Aligned Training: Optimizing for Test Time Inference

Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the…

Machine Learning · Computer Science 2026-05-21 Adam Ousherovitch, Ambuj Tewari

Inference Time Alignment with Reward-Guided Tree Search

Inference-time computation methods enhance the performance of Large Language Models (LLMs) by leveraging additional computational resources to achieve superior results. Common techniques, such as Best-of-N sampling, Majority Voting, and…

Computation and Language · Computer Science 2024-11-27 Chia-Yu Hung, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints

Reinforcement learning can greatly benefit from the use of options as a way of encoding recurring behaviours and to foster exploration. An important open problem is how can an agent autonomously learn useful options when solving particular…

Machine Learning · Computer Science 2020-01-07 Manuel Del Verme, Bruno Castro da Silva, Gianluca Baldassarre

An exploration for higher efficiency in multi objective optimisation with reinforcement learning

Efficiency in optimisation and search processes persists to be one of the challenges, which affects the performance and use of optimisation algorithms. Utilising a pool of operators instead of a single operator to handle move operations…

Artificial Intelligence · Computer Science 2025-12-12 Mehmet Emin Aydin

Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models

Parameter-shared pre-trained language models (PLMs) have emerged as a successful approach in resource-constrained environments, enabling substantial reductions in model storage and memory costs without significant performance compromise.…

Computation and Language · Computer Science 2023-10-20 Weize Chen, Xiaoyue Xu, Xu Han, Yankai Lin, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou