Related papers: BOSE: A Systematic Evaluation Method Optimized for…

MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics

Reliable evaluation is fundamental to the progress of Large Language Models (LLMs), yet the evaluation process during pre-training is plagued by significant instability that obscures true learning dynamics. In this work, we systematically…

Computation and Language · Computer Science 2026-03-17 Jiapeng Wang, Changxin Tian, Kunlong Chen, Ziqi Liu, Jiaxin Mao, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou

An Information-Theoretic Framework for Unifying Active Learning Problems

This paper presents an information-theoretic framework for unifying active learning problems: level set estimation (LSE), Bayesian optimization (BO), and their generalized variant. We first introduce a novel active learning criterion that…

Machine Learning · Computer Science 2020-12-22 Quoc Phong Nguyen, Bryan Kian Hsiang Low, Patrick Jaillet

COSEE: Consistency-Oriented Signal-Based Early Exiting via Calibrated Sample Weighting Mechanism

Early exiting is an effective paradigm for improving the inference efficiency of pre-trained language models (PLMs) by dynamically adjusting the number of executed layers for each sample. However, in most existing works, easy and hard…

Machine Learning · Computer Science 2024-12-19 Jianing He, Qi Zhang, Hongyun Zhang, Xuanjing Huang, Usman Naseem, Duoqian Miao

Confidence-Orchestrated Self-Evolution against Uncertain LLM Feedback

Self-evolving large language models (LLMs) learn by generating their own training tasks and solutions, reducing reliance on human-curated supervision. However, in many reasoning domains, the model must also validate generated tasks and…

Artificial Intelligence · Computer Science 2026-05-28 Bowen Wei, Nan Wang, Yuqing Zhou, Jinhao Pan, Ziwei Zhu

Learning to Boost the Performance of Stable Nonlinear Systems

The growing scale and complexity of safety-critical control systems underscore the need to evolve current control architectures aiming for the unparalleled performances achievable through state-of-the-art optimization and machine learning…

Systems and Control · Electrical Eng. & Systems 2024-09-30 Luca Furieri, Clara Lucía Galimberti, Giancarlo Ferrari-Trecate

A Consistency-Centric Approach to Set-Based Optimization with Multiple Models of Unranked Fidelity

In complex real-world settings, optimization is challenged by the presence of diverse models of differing fidelity. In many optimization problems, a single model is treated as the most accurate representation of the underlying system, while…

Machine Learning · Statistics 2026-05-07 Danielle F. Morey, Giulia Pedrielli, Cherry Y. Wakayama, Zelda B. Zabinsky

WIMLE: Uncertainty-Aware World Models with IMLE for Sample-Efficient Continuous Control

Model-based reinforcement learning promises strong sample efficiency but often underperforms in practice due to compounding model error, unimodal world models that average over multi-modal dynamics, and overconfident predictions that bias…

Machine Learning · Computer Science 2026-04-07 Mehran Aghabozorgi, Alireza Moazeni, Yanshu Zhang, Ke Li

ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning

Instruction tuning has underscored the significant potential of large language models (LLMs) in producing more human controllable and effective outputs in various domains. In this work, we focus on the data selection problem for…

Machine Learning · Computer Science 2025-09-01 Yang Wu, Huayi Zhang, Yizheng Jiao, Lin Ma, Xiaozhong Liu, Jinhong Yu, Dongyu Zhang, Dezhi Yu, Wei Xu

SCOPE: Stochastic and Counterbiased Option Placement for Evaluating Large Language Models

Large Language Models (LLMs) can achieve inflated scores on multiple-choice tasks by exploiting inherent biases in option positions or labels, rather than demonstrating genuine understanding. This study introduces SCOPE, an evaluation…

Computation and Language · Computer Science 2025-08-05 Wonjun Jeong, Dongseok Kim, Taegkeun Whangbo

BoRP: Bootstrapped Regression Probing for Scalable and Human-Aligned LLM Evaluation

Accurate evaluation of user satisfaction is critical for iterative development of conversational AI. However, for open-ended assistants, traditional A/B testing lacks reliable metrics: explicit feedback is sparse, while implicit metrics are…

Computation and Language · Computer Science 2026-01-27 Peng Sun, Xiangyu Zhang, Duan Wu

Classified Regression for Bayesian Optimization: Robot Learning with Unknown Penalties

Learning robot controllers by minimizing a black-box objective cost using Bayesian optimization (BO) can be time-consuming and challenging. It is very often the case that some roll-outs result in failure behaviors, causing premature…

Machine Learning · Computer Science 2020-11-11 Alonso Marco, Dominik Baumann, Philipp Hennig, Sebastian Trimpe

Progress or Regress? Self-Improvement Reversal in Post-training

Self-improvement through post-training methods such as iterative preference learning has been acclaimed for enhancing the problem-solving capabilities (e.g., mathematical reasoning) of Large Language Models (LLMs) without human…

Computation and Language · Computer Science 2024-07-09 Ting Wu, Xuefeng Li, Pengfei Liu

Models Know Models Best: Evaluation via Model-Preferred Formats

Performance of Large Language Models (LLMs) on multiple-choice tasks differs markedly between symbol-based and cloze-style evaluation formats. The observed discrepancies are systematically attributable to task characteristics: natural…

Computation and Language · Computer Science 2026-02-02 Joonhak Lee, Sungmok Jung, Jongyeon Park, Jaejin Lee

SCORE: Systematic COnsistency and Robustness Evaluation for Large Language Models

Typical evaluations of Large Language Models (LLMs) report a single metric per dataset, often representing the model's best-case performance under carefully selected settings. Unfortunately, this approach overlooks model robustness and…

Computation and Language · Computer Science 2025-03-04 Grigor Nalbandyan, Rima Shahbazyan, Evelina Bakhturina

Bayes Optimal Informer Sets for Early-Stage Drug Discovery

An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer based ranking (IBR) methods address the prioritization…

Methodology · Statistics 2023-06-26 Peng Yu, Spencer S. Ericksen, Anthony Gitter, Michael A. Newton

Investigating Ensemble Methods for Model Robustness Improvement of Text Classifiers

Large pre-trained language models have shown remarkable performance over the past few years. These models, however, sometimes learn superficial features from the dataset and cannot generalize to the distributions that are dissimilar to the…

Computation and Language · Computer Science 2022-10-31 Jieyu Zhao, Xuezhi Wang, Yao Qin, Jilin Chen, Kai-Wei Chang

Learning stabilising policies for constrained nonlinear systems

This work proposes a two-layered control scheme for constrained nonlinear systems represented by a class of recurrent neural networks and affected by additive disturbances. In particular, a base controller ensures global or regional…

Systems and Control · Electrical Eng. & Systems 2026-03-27 Daniele Ravasio, Danilo Saccani, Marcello Farina, Giancarlo Ferrari-Trecate

Consistent Prompting for Rehearsal-Free Continual Learning

Continual learning empowers models to adapt autonomously to the ever-changing environment or data streams without forgetting old knowledge. Prompt-based approaches are built on frozen pre-trained models to learn the task-specific prompts…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Zhanxin Gao, Jun Cen, Xiaobin Chang

Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency

Noise Contrastive Estimation (NCE) is a powerful parameter estimation method for log-linear models, which avoids calculation of the partition function or its derivatives at each training step, a computationally demanding step in many cases.…

Computation and Language · Computer Science 2018-09-07 Zhuang Ma, Michael Collins

Model-Based Reinforcement Learning via Meta-Policy Optimization

Model-based reinforcement learning approaches carry the promise of being data efficient. However, due to challenges in learning dynamics models that sufficiently match the real-world dynamics, they struggle to achieve the same asymptotic…

Machine Learning · Computer Science 2018-09-17 Ignasi Clavera, Jonas Rothfuss, John Schulman, Yasuhiro Fujita, Tamim Asfour, Pieter Abbeel