Machine Learning - Scifaro

Dual-Directed Algorithm Design for Efficient Pure Exploration

While experimental design often focuses on selecting the single best alternative from a finite set (e.g., in ranking and selection or best-arm identification), many pure-exploration problems pursue richer goals. Given a specific goal,…

Machine Learning · Statistics 2025-05-28 Chao Qin, Wei You

Learning with Selectively Labeled Data from Multiple Decision-makers

We study the problem of classification with selectively labeled data, whose distribution may differ from the full population due to historical decision-making. We exploit the fact that in many applications historical decisions were made by…

Machine Learning · Statistics 2025-05-28 Jian Chen, Zhehao Li, Xiaojie Mao

No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference

Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI,…

Machine Learning · Statistics 2025-05-27 Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst

Weighted Leave-One-Out Cross Validation

We present a weighted version of Leave-One-Out (LOO) cross-validation for estimating the Integrated Squared Error (ISE) when approximating an unknown function by a predictor that depends linearly on evaluations of the function over a finite…

Machine Learning · Statistics 2025-05-27 Luc Pronzato, Maria-João Rendas

PIGPVAE: Physics-Informed Gaussian Process Variational Autoencoders

Recent advances in generative AI offer promising solutions for synthetic data generation but often rely on large datasets for effective training. To address this limitation, we propose a novel generative model that learns from limited data…

Machine Learning · Statistics 2025-05-27 Michail Spitieris, Massimiliano Ruocco, Abdulmajid Murad, Alessandro Nocente

Statistical inference for Linear Stochastic Approximation with Markovian Noise

In this paper we derive non-asymptotic Berry-Esseen bounds for Polyak-Ruppert averaged iterates of the Linear Stochastic Approximation (LSA) algorithm driven by the Markovian noise. Our analysis yields $\mathcal{O}(n^{-1/4})$ convergence…

Machine Learning · Statistics 2025-05-27 Sergey Samsonov, Marina Sheshukova, Eric Moulines, Alexey Naumov

Optimal Conformal Prediction under Epistemic Uncertainty

Conformal prediction (CP) is a popular frequentist framework for representing uncertainty by providing prediction sets that guarantee coverage of the true label with a user-adjustable probability. In most applications, CP operates on…

Machine Learning · Statistics 2025-05-27 Alireza Javanmardi, Soroush H. Zargarbashi, Santo M. A. R. Thies, Willem Waegeman, Aleksandar Bojchevski, Eyke Hüllermeier

On the Role of Label Noise in the Feature Learning Process

Deep learning with noisy labels presents significant challenges. In this work, we theoretically characterize the role of label noise from a feature learning perspective. Specifically, we consider a signal-noise data distribution, where each…

Machine Learning · Statistics 2025-05-27 Andi Han, Wei Huang, Zhanpeng Zhou, Gang Niu, Wuyang Chen, Junchi Yan, Akiko Takeda, Taiji Suzuki

Marginal Fairness: Fair Decision-Making under Risk Measures

This paper introduces marginal fairness, a new individual fairness notion for equitable decision-making in the presence of protected attributes such as gender, race, and religion. This criterion ensures that decisions based on generalized…

Machine Learning · Statistics 2025-05-27 Fei Huang, Silvana M. Pesenti

LocalKMeans: Convergence of Lloyd's Algorithm with Distributed Local Iterations

In this paper, we analyze the classical $K$-means alternating-minimization algorithm, also known as Lloyd's algorithm (Lloyd, 1956), for a mixture of Gaussians in a data-distributed setting that incorporates local iteration steps. Assuming…

Machine Learning · Statistics 2025-05-27 Harsh Vardhan, Heng Zhu, Avishek Ghosh, Arya Mazumdar

On the Mechanisms of Weak-to-Strong Generalization: A Theoretical Perspective

Weak-to-strong generalization, where a student model trained on imperfect labels generated by a weaker teacher nonetheless surpasses that teacher, has been widely observed but the mechanisms that enable it have remained poorly understood.…

Machine Learning · Statistics 2025-05-27 Behrad Moniri, Hamed Hassani

Online Statistical Inference of Constrained Stochastic Optimization via Random Scaling

Constrained stochastic nonlinear optimization problems have attracted significant attention for their ability to model complex real-world scenarios in physics, economics, and biology. As datasets continue to grow, online inference methods…

Machine Learning · Statistics 2025-05-27 Xinchen Du, Wanrong Zhu, Wei Biao Wu, Sen Na

Adaptive Sensor Steering Strategy Using Deep Reinforcement Learning for Dynamic Data Acquisition in Digital Twins

This paper introduces a sensor steering methodology based on deep reinforcement learning to enhance the predictive accuracy and decision support capabilities of digital twins by optimising the data acquisition process. Traditional sensor…

Machine Learning · Statistics 2025-05-27 Collins O. Ogbodo, Timothy J. Rogers, Mattia Dal Borgo, David J. Wagg

Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)

In physics, complex systems are often simplified into minimal, solvable models that retain only the core principles. In machine learning, layerwise linear models (e.g., linear neural networks) act as simplified representations of neural…

Machine Learning · Statistics 2025-05-27 Yoonsoo Nam, Seok Hyeong Lee, Clementine C J Domine, Yeachan Park, Charles London, Wonyl Choi, Niclas Goring, Seungjai Lee

Statistical Collusion by Collectives on Learning Platforms

As platforms increasingly rely on learning algorithms, collectives may form and seek ways to influence these platforms to align with their own interests. This can be achieved by coordinated submission of altered data. To evaluate the…

Machine Learning · Statistics 2025-05-27 Etienne Gauthier, Francis Bach, Michael I. Jordan

Prediction-Powered E-Values

Quality statistical inference requires a sufficient amount of data, which can be missing or hard to obtain. To this end, prediction-powered inference has risen as a promising methodology, but existing approaches are largely limited to…

Machine Learning · Statistics 2025-05-27 Daniel Csillag, Claudio José Struchiner, Guilherme Tegoni Goedert

Change Point Detection in the Frequency Domain with Statistical Reliability

Effective condition monitoring in complex systems requires identifying change points (CPs) in the frequency domain, as the structural changes often arise across multiple frequencies. This paper extends recent advancements in statistically…

Machine Learning · Statistics 2025-05-27 Akifumi Yamada, Tomohiro Shiraishi, Shuichi Nishino, Teruyuki Katsuoka, Kouichi Taji, Ichiro Takeuchi

Optimal Transport Barycenter via Nonconvex-Concave Minimax Optimization

The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter…

Machine Learning · Statistics 2025-05-27 Kaheon Kim, Rentian Yao, Changbo Zhu, Xiaohui Chen

Variance-Reduced Cascade Q-learning: Algorithms and Sample Complexity

We study the problem of estimating the optimal Q-function of $\gamma$-discounted Markov decision processes (MDPs) under the synchronous setting, where independent samples for all state-action pairs are drawn from a generative model at each…

Machine Learning · Statistics 2025-05-27 Mohammad Boveiri, Peyman Mohajerin Esfahani

Operator-Informed Score Matching for Markov Diffusion Models

Diffusion models are typically trained using score matching, a learning objective agnostic to the underlying noising process that guides the model. This paper argues that Markov noising processes enjoy an advantage over alternatives, as the…

Machine Learning · Statistics 2025-05-27 Zheyang Shen, Huihui Wang, Marina Riabiz, Chris J. Oates