Machine Learning - Scifaro

High-Order Error Bounds for Markovian LSA with Richardson-Romberg Extrapolation

In this paper, we study the bias and high-order error bounds of the Linear Stochastic Approximation (LSA) algorithm with Polyak-Ruppert (PR) averaging under Markovian noise. We focus on the version of the algorithm with constant step size…

Machine Learning · Statistics 2025-08-08 Ilya Levin, Alexey Naumov, Sergey Samsonov

High-Dimensional Differentially Private Quantile Regression: Distributed Estimation and Statistical Inference

With the development of big data and machine learning, privacy concerns have become increasingly critical, especially when handling heterogeneous datasets containing sensitive personal information. Differential privacy provides a rigorous…

Machine Learning · Statistics 2025-08-08 Ziliang Shen, Caixing Wang, Shaoli Wang, Yibo Yan

Differentially Private Model-X Knockoffs via Johnson-Lindenstrauss Transform

We introduce a novel privatization framework for high-dimensional controlled variable selection. Our framework enables rigorous False Discovery Rate (FDR) control under differential privacy constraints. While the Model-X knockoff procedure…

Machine Learning · Statistics 2025-08-08 Yuxuan Tao, Adel Javanmard

Efficient optimization of expensive black-box simulators via marginal means, with application to neutrino detector design

With advances in scientific computing, computer experiments are increasingly used for optimizing complex systems. However, for modern applications, e.g., the optimization of nuclear physics detectors, each experiment run can require…

Machine Learning · Statistics 2025-08-08 Hwanwoo Kim, Simon Mak, Ann-Kathrin Schuetz, Alan Poon

A Stein Gradient Descent Approach for Doubly Intractable Distributions

Bayesian inference for doubly intractable distributions is challenging because they include intractable terms, which are functions of parameters of interest. Although several alternatives have been developed for such models, they are…

Machine Learning · Statistics 2025-08-08 Heesang Lee, Songhee Kim, Bokgyeong Kang, Jaewoo Park

Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints

Orthogonality constraints naturally appear in many machine learning problems, from principal component analysis to robust neural network training. They are usually solved using Riemannian optimization algorithms, which minimize the…

Machine Learning · Statistics 2025-08-08 Pierre Ablin, Simon Vary, Bin Gao, P.-A. Absil

Metric Learning in an RKHS

Metric learning from a set of triplet comparisons in the form of "Do you think item h is more similar to item i or item j?", indicating similarity and differences between items, plays a key role in various applications including image…

Machine Learning · Statistics 2025-08-07 Gokcan Tatli, Yi Chen, Blake Mason, Robert Nowak, Ramya Korlakai Vinayak

Benchmarking Uncertainty and its Disentanglement in multi-label Chest X-Ray Classification

Reliable uncertainty quantification is crucial for trustworthy decision-making and the deployment of AI models in medical imaging. While prior work has explored the ability of neural networks to quantify predictive, epistemic, and aleatoric…

Machine Learning · Statistics 2025-08-07 Simon Baur, Wojciech Samek, Jackie Ma

Deep Neural Network-Driven Adaptive Filtering

This paper proposes a deep neural network (DNN)-driven framework to address the longstanding generalization challenge in adaptive filtering (AF). In contrast to traditional AF frameworks that emphasize explicit cost function design, the…

Machine Learning · Statistics 2025-08-07 Qizhen Wang, Gang Wang, Ying-Chang Liang

Negative binomial regression and inference using a pre-trained transformer

Negative binomial regression is essential for analyzing over-dispersed count data in comparative studies, but parameter estimation becomes computationally challenging in large screens requiring millions of comparisons. We investigate…

Machine Learning · Statistics 2025-08-07 Valentine Svensson

Reinforcement Learning in MDPs with Information-Ordered Policies

We propose an epoch-based reinforcement learning algorithm for infinite-horizon average-cost Markov decision processes (MDPs) that leverages a partial order over a policy class. In this structure, $\pi' \leq \pi$ if data collected under…

Machine Learning · Statistics 2025-08-07 Zhongjun Zhang, Shipra Agrawal, Ilan Lobel, Sean R. Sinclair, Christina Lee Yu

Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that…

Machine Learning · Statistics 2025-08-07 Verónica Álvarez, Santiago Mazuelas, Steven An, Sanjoy Dasgupta

On Measuring Intrinsic Causal Attributions in Deep Neural Networks

Quantifying the causal influence of input features within neural networks has become a topic of increasing interest. Existing approaches typically assess direct, indirect, and total causal effects. This work treats NNs as structural causal…

Machine Learning · Statistics 2025-08-07 Saptarshi Saha, Dhruv Vansraj Rathore, Soumadeep Saha, Utpal Garain, David Doermann

Deep Discrete Encoders: Identifiable Deep Generative Models for Rich Data with Discrete Latent Layers

In the era of generative AI, deep generative models (DGMs) with latent representations have gained tremendous popularity. Despite their impressive empirical performance, the statistical properties of these models remain underexplored. DGMs…

Machine Learning · Statistics 2025-08-07 Seunghyun Lee, Yuqi Gu

Exponentially Consistent Nonparametric Linkage-Based Clustering of Data Sequences

In this paper, we consider nonparametric clustering of $M$ independent and identically distributed (i.i.d.) data sequences generated from unknown distributions. The distributions of the $M$ data sequences belong to $K$ underlying…

Machine Learning · Statistics 2025-08-07 Bhupender Singh, Ananth Ram Rajagopalan, Srikrishna Bhashyam

Generating Accurate Synthetic Survival Data by Conditioning on Outcomes

Synthetically generated data can improve privacy, fairness, and data accessibility; however, it can be challenging in specialized scenarios such as survival analysis. One key challenge in this setting is censoring, i.e., the timing of an…

Machine Learning · Statistics 2025-08-07 Mohd Ashhad, Ricardo Henao

Optimal Learning via Moderate Deviations Theory

This paper proposes a statistically optimal approach for learning a function value using a confidence interval in a wide range of models, including general non-parametric estimation of an expected loss described as a stochastic programming…

Machine Learning · Statistics 2025-08-07 Arnab Ganguly, Tobias Sutter

A Dual Optimization View to Empirical Risk Minimization with f-Divergence Regularization

The dual formulation of empirical risk minimization with f-divergence regularization (ERM-fDR) is introduced. The solution of the dual optimization problem to the ERM-fDR is connected to the notion of normalization function introduced as an…

Machine Learning · Statistics 2025-08-06 Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza

funOCLUST: Clustering Functional Data with Outliers

Functional data present unique challenges for clustering due to their infinite-dimensional nature and potential sensitivity to outliers. An extension of the OCLUST algorithm to the functional setting is proposed to address these issues. The…

Machine Learning · Statistics 2025-08-06 Katharine M. Clark, Paul D. McNicholas

TaylorPODA: A Taylor Expansion-Based Method to Improve Post-Hoc Attributions for Opaque Models

Existing post-hoc model-agnostic methods generate external explanations for opaque models, primarily by locally attributing the model output to its input features. However, they often lack an explicit and systematic framework for…

Machine Learning · Statistics 2025-08-06 Yuchi Tang, Iñaki Esnaola, George Panoutsos