Machine Learning - Scifaro

Asymptotically Log-Optimal Bayes-Assisted Confidence Sequences for Bounded Means

Confidence sequences based on test martingales provide time-uniform uncertainty quantification for the mean of bounded IID observations without parametric distributional assumptions. Their practical efficiency, however, depends strongly on…

Machine Learning · Statistics 2026-05-12 Valentin Kilian, Stefano Cortinovis, François Caron
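For orientation only, the betting construction that test-martingale confidence sequences build on can be sketched in a few lines. This is not the paper's Bayes-assisted method. It is a minimal sketch assuming observations in [0, 1], a constant bet size `lam` (an illustrative choice) and a finite grid of candidate means. For each candidate m, the "hedged" wealth averages a bet that the mean lies above m and a bet that it lies below m. Because a constant bet is predictable, the wealth at the true mean is a nonnegative martingale starting at 1, so by Ville's inequality it exceeds 1/α at any time with probability at most α. The function names, grid and hull are simplifications introduced here.

```python
import math


def hedged_log_wealth(xs, m, lam=0.5):
    """Log of the hedged wealth 0.5*K_plus(m) + 0.5*K_minus(m) after observing xs.

    Assumes every x lies in [0, 1] and 0 <= lam < 1, so each factor stays positive.
    """
    log_plus = sum(math.log1p(lam * (x - m)) for x in xs)    # bet that the mean is above m
    log_minus = sum(math.log1p(-lam * (x - m)) for x in xs)  # bet that the mean is below m
    top = max(log_plus, log_minus)
    # log-sum-exp keeps the computation stable when the wealth becomes very large
    return top + math.log(0.5 * math.exp(log_plus - top) + 0.5 * math.exp(log_minus - top))


def betting_confidence_interval(xs, alpha=0.05, grid_size=1001, lam=0.5):
    """Hull of the grid means whose wealth stays below 1/alpha (None if every candidate is rejected).

    Reporting the hull gives a superset of the kept candidates, so coverage is not reduced.
    """
    threshold = math.log(1.0 / alpha)
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    kept = [m for m in grid if hedged_log_wealth(xs, m, lam) < threshold]
    return (min(kept), max(kept)) if kept else None
```

With a constant bet the interval stops shrinking once enough data arrive; practical constructions instead choose the bet size adaptively from past observations.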

Fourier Feature Methods for Nonlinear Causal Discovery: FFML Scoring, TRFF Scoring, and FFCI Testing in Mixed Data

Gaussian process (GP) marginal likelihood scores and kernel conditional independence tests are theoretically appealing for nonlinear causal discovery but computationally prohibitive at scale. We present three complementary RFF-based methods…

Machine Learning · Statistics 2026-05-12 Joseph D. Ramsey
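The building block behind RFF-based scores is the random Fourier feature map of Rahimi and Recht. For a Gaussian (RBF) kernel with lengthscale ℓ, draw frequencies w ~ N(0, ℓ⁻² I) and phases b ~ Uniform[0, 2π], and set z(x) = sqrt(2/D) cos(Wx + b), so that z(x)·z(y) approximates k(x, y). The following is a minimal pure-Python sketch assuming an RBF kernel. The function names are illustrative, and it is not the FFML, TRFF or FFCI implementation.

```python
import math
import random


def make_rff(dim, num_features, lengthscale=1.0, seed=0):
    """Return a feature map z with z(x) . z(y) ~= exp(-||x - y||^2 / (2 * lengthscale^2))."""
    rng = random.Random(seed)
    # Spectral density of the RBF kernel: frequencies ~ N(0, lengthscale^-2 I)
    freqs = [[rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return feature_map


def rbf(x, y, lengthscale=1.0):
    """Exact RBF kernel, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))
```

Once every point is mapped to D features, GP-style computations reduce to D×D linear algebra instead of n×n, which is the usual source of the speed-up for large n.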

Spherical Flows for Sampling Categorical Data

We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb…

Machine Learning · Statistics 2026-05-12 Jannis Chemseddine, Gregor Kornhardt, Gabriele Steidl
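Working on the sphere typically means replacing straight-line interpolation with great-circle geodesics. Below is a minimal sketch of geodesic interpolation (slerp) between two unit vectors, for example between a noise point and a normalized token embedding. The excerpt does not say which probability path the paper uses, so treat this as an assumed, generic ingredient rather than the authors' construction.

```python
import math


def normalize(v):
    """Project a nonzero vector onto the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def slerp(x0, x1, t):
    """Point at fraction t along the great-circle arc from unit vector x0 to unit vector x1."""
    cos_omega = max(-1.0, min(1.0, sum(a * b for a, b in zip(x0, x1))))
    omega = math.acos(cos_omega)               # angle between the endpoints
    if omega < 1e-12:                          # (nearly) identical points: nothing to interpolate
        return list(x0)
    if math.pi - omega < 1e-9:                 # antipodal points: the geodesic is not unique
        raise ValueError("slerp is undefined for antipodal points")
    s = math.sin(omega)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]
```

Unlike linear interpolation followed by renormalization, slerp moves at constant angular speed, which keeps the time parametrization uniform along the path.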

Multiscale Euclidean Network Trajectories: Second-Moment Geometry, Attribution, and Change Points

A central challenge in dynamic network analysis is to represent temporal evolution in a way that is both geometrically meaningful and statistically identifiable. One approach embeds a sequence of network snapshots as trajectories in a…

Machine Learning · Statistics 2026-05-12 Haruka Ezoe, Ryohei Hisano

Stochastic Schrödinger Diffusion Models for Pure-State Ensemble Generation

In quantum machine learning (QML), classical data are often encoded as quantum pure states and processed directly as quantum representations, motivating representation-level generative modeling that samples new quantum states from an…

Machine Learning · Statistics 2026-05-12 Jian Xu, Wei Chen, Shigui Li, Chao Li, Jingyuan Zheng, Delu Zeng, John Paisley, Qibin Zhao

Learning When to Trust LLM Priors: A Validated Framework for Semantic Prior Integration

Large language models (LLMs) encode rich semantic knowledge that can be useful for supervised learning, but their outputs are unreliable as statistical priors: they may be noisy, misspecified, or hallucinated. Existing LLM-informed learning…

Machine Learning · Statistics 2026-05-12 Erica Zhang, Naomi Sagan, Danny Tse, Fangzhao Zhang, Mert Pilanci, Jose Blanchet

Efficient Evaluation of LLM Performance with Statistical Guarantees

Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model…

Machine Learning · Statistics 2026-05-12 Skyler Wu, Yash Nair, Emmanuel J. Candès
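As a baseline for the finite-population view, consider a model's average score over N benchmark items, estimated from n items sampled uniformly without replacement. The textbook normal-approximation interval for this setting uses the finite-population correction (1 - n/N), which shrinks the variance as the sample covers more of the suite. This sketch is that simple-random-sampling baseline under the normal-approximation assumption, not the paper's procedure. The function name is illustrative.

```python
import math
import statistics


def srswor_mean_ci(scores, population_size, z=1.96):
    """Normal-approximation CI for a population mean from a simple random sample without replacement."""
    n = len(scores)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= n <= population_size")
    mean = statistics.fmean(scores)
    sample_var = statistics.variance(scores)      # unbiased, n - 1 denominator
    fpc = 1.0 - n / population_size               # finite-population correction
    half_width = z * math.sqrt(fpc * sample_var / n)
    return mean - half_width, mean + half_width
```

When n = N the correction is zero and the interval collapses to the exact population mean, since nothing is left to estimate.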

Optimal Attention Temperature Improves the Robustness of In-Context Learning under Distribution Shift in High Dimensions

Pretrained Transformers can perform in-context learning (ICL) from a few demonstrations, but this ability can fail sharply when the test distribution differs from pretraining, a common deployment setting. We study attention temperature as a…

Machine Learning · Statistics 2026-05-12 Samet Demir, Zafer Dogan
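For readers unfamiliar with the knob being tuned: in softmax attention, a temperature τ rescales the query-key logits before the softmax, giving weights softmax(q·k_i / (τ·√d)). A larger τ spreads weight across more demonstrations, and a smaller τ concentrates it on the best-matching key. The sketch below applies that scaling for a single query. The excerpt does not state exactly where the paper places the temperature, so this placement is an assumption, and the function names are illustrative.

```python
import math


def attention_weights(query, keys, temperature=1.0):
    """Softmax attention weights of one query over a list of keys, with logits q.k / (tau * sqrt(d))."""
    d = len(query)
    logits = [sum(q * k for q, k in zip(query, key)) / (temperature * math.sqrt(d)) for key in keys]
    top = max(logits)                                  # subtract the max for numerical stability
    exps = [math.exp(logit - top) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values, temperature=1.0):
    """Weighted average of value vectors under the tempered attention weights."""
    weights = attention_weights(query, keys, temperature)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
```

As τ → 0 the output approaches the value of the single best-matching key, and as τ → ∞ it approaches the plain average of all values.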

An Interdisciplinary and Cross-Task Review on Missing Data Imputation

Missing data is a fundamental challenge in data science, significantly hindering analysis and decision-making across a wide range of disciplines, including healthcare, bioinformatics, social science, e-commerce, and industrial monitoring.…

Machine Learning · Statistics 2026-05-12 Jicong Fan

Debiased Front-Door Learners for Heterogeneous Effects

In observational settings where treatment and outcome share unmeasured confounders but an observed mediator remains unconfounded, the front-door (FD) adjustment identifies causal effects through the mediator. We study the heterogeneous…

Machine Learning · Statistics 2026-05-12 Yonghan Jung
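The front-door adjustment (Pearl) identifies the interventional distribution through the mediator M: $P(y \mid do(x)) = \sum_m P(m \mid x) \sum_{x'} P(y \mid x', m) P(x')$. The sketch below evaluates that formula on a discrete joint table over (X, M, Y). The dictionary layout and function name are illustrative assumptions. It covers only the average estimand, not the heterogeneous, debiased learners the paper studies.

```python
def front_door(joint, x, y):
    """P(Y=y | do(X=x)) from a joint table {(x, m, y): prob} via the front-door formula.

    Assumes positivity: P(X=x', M=m) > 0 for every treatment value x' and mediator value m.
    """
    p_x, p_xm = {}, {}
    for (xv, mv, _), p in joint.items():
        p_x[xv] = p_x.get(xv, 0.0) + p                  # marginal P(x)
        p_xm[(xv, mv)] = p_xm.get((xv, mv), 0.0) + p    # marginal P(x, m)
    mediators = {mv for (_, mv, _) in joint}
    total = 0.0
    for m in mediators:
        p_m_given_x = p_xm[(x, m)] / p_x[x]
        # Inner sum over x' of P(y | x', m) P(x'): back-door adjustment for the M -> Y effect
        inner = sum(joint.get((xp, m, y), 0.0) / p_xm[(xp, m)] * p_x[xp] for xp in p_x)
        total += p_m_given_x * inner
    return total
```

A quick check is to generate the table from a model with a hidden confounder U of X and Y: the formula recovers the true interventional probability even though U never appears in the table.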

Estimating Heterogeneous Causal Effect on Networks via Orthogonal Learning

Estimating causal effects on networks is challenging because treatments may affect both treated units and their neighbors, while network homophily induces dependence and confounding. These challenges are amplified when causal effects are…

Machine Learning · Statistics 2026-05-12 Yuanchen Wu, Yubai Yuan

Sliding Window Informative Canonical Correlation Analysis

Canonical correlation analysis (CCA) is a technique for finding correlated sets of features between two datasets. In this paper, we propose a novel extension of CCA to the online, streaming data setting: Sliding Window Informative Canonical…

Machine Learning · Statistics 2026-05-12 Arvind Prasadan

Active Learning for Manifold Gaussian Process Regression

This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a…

Machine Learning · Statistics 2026-05-12 Yuanxing Cheng, Lulu Kang, Yiwei Wang, Chun Liu

Liouville PDE-based sliced-Wasserstein flow

The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is transformed into a Liouville partial differential equation (PDE)-based formalism. First, the stochastic diffusive term from the Fokker-Planck…

Machine Learning · Statistics 2026-05-12 Jayshawn Cooper, Pilhwa Lee
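The quantity underlying SWF is the sliced Wasserstein distance, computed in three steps:

1. Project both samples onto a random unit direction.
2. Compute the one-dimensional Wasserstein-2 distance between the projections. For equal-size samples this reduces to matching sorted values.
3. Average over many directions.

Below is a minimal Monte Carlo sketch, assuming two point clouds of the same size. It estimates the distance only and does not implement the flow or its Liouville PDE formulation. Function names are illustrative.

```python
import math
import random


def random_direction(dim, rng):
    """Uniform random unit vector (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_w2_squared(xs, ys, num_projections=200, seed=0):
    """Monte Carlo estimate of SW_2^2 between two equal-size point clouds."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally sized samples")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(num_projections):
        theta = random_direction(dim, rng)
        px = sorted(sum(a * t for a, t in zip(x, theta)) for x in xs)
        py = sorted(sum(a * t for a, t in zip(y, theta)) for y in ys)
        # 1D W2^2 between equal-size empirical measures: mean squared gap of sorted values
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / num_projections
```

For a cloud and its translate by a vector s in d dimensions, every projection differs by the constant s·θ, so the estimate converges to ‖s‖²/d.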

Post-detection inference for sequential changepoint localization

This paper addresses a fundamental but largely unexplored challenge in sequential changepoint analysis: conducting inference following a detected change. We develop a very general framework to construct confidence sets for the unknown…

Machine Learning · Statistics 2026-05-12 Aytijhya Saha, Aaditya Ramdas

Differentially Private Hyperparameter Tuning using Local Bayesian Optimization

Hyperparameter tuning is a key component of machine learning procedures, but when validation data contain sensitive user information, search mechanisms can leak private information through the selected configuration. Existing differentially…

Machine Learning · Statistics 2026-05-12 Getoar Sopa, Juraj Marusic, Marco Avella Medina, John P. Cunningham

Equivariant score-based generative models provably learn distributions with symmetries efficiently

Symmetry is ubiquitous in many real-world phenomena and tasks, such as physics, images, and molecular simulations. Empirical studies have demonstrated that incorporating symmetries into generative models can provide better generalization…

Machine Learning · Statistics 2026-05-12 Ziyu Chen, Markos A. Katsoulakis, Benjamin J. Zhang

Hamiltonian Monte Carlo with Asymmetrical Momentum Distributions

Existing rigorous convergence guarantees for the Hamiltonian Monte Carlo (HMC) algorithm use Gaussian auxiliary momentum variables, which are crucially symmetrically distributed. We present a novel convergence analysis for HMC utilizing new…

Machine Learning · Statistics 2026-05-12 Soumyadip Ghosh, Yingdong Lu, Tomasz Nowicki
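For contrast with the asymmetric momenta in the title, the standard HMC transition with Gaussian momentum p ~ N(0, 1) works as follows. Sample p, run L leapfrog steps of size ε on H(q, p) = U(q) + p²/2, then accept the endpoint with probability min(1, exp(H_old - H_new)). Below is a minimal one-dimensional sketch of that symmetric baseline only, not the authors' scheme. The step size, path length and function names are illustrative assumptions.

```python
import math
import random


def leapfrog(q, p, grad_u, step, n_steps):
    """Leapfrog integration of Hamilton's equations for H(q, p) = U(q) + p^2 / 2."""
    p -= 0.5 * step * grad_u(q)                # initial half step in momentum
    for i in range(n_steps):
        q += step * p                          # full step in position
        if i != n_steps - 1:
            p -= step * grad_u(q)              # full step in momentum, except after the last drift
    p -= 0.5 * step * grad_u(q)                # final half step in momentum
    return q, p


def hmc(u, grad_u, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Gaussian-momentum HMC targeting the density proportional to exp(-U(q))."""
    rng = random.Random(seed)
    q, samples = q0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)               # symmetric (Gaussian) auxiliary momentum
        q_new, p_new = leapfrog(q, p0, grad_u, step, n_steps)
        h_old = u(q) + 0.5 * p0 * p0
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the leapfrog discretization error
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples
```

The sketch omits the usual momentum negation at the end of the trajectory because the Gaussian kinetic energy is symmetric in p. An asymmetric momentum distribution removes exactly that shortcut.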

Particle-based Energetic Variational Inference

We introduce a new variational inference (VI) framework, called energetic variational inference (EVI). It minimizes the VI objective function based on a prescribed energy-dissipation law. Using the EVI framework, we can derive many existing…

Machine Learning · Statistics 2026-05-12 Yiwei Wang, Jiuhai Chen, Chun Liu, Lulu Kang

A Note on Non-Negative $L_1$-Approximating Polynomials

$L_1$-Approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of non-negative…

Machine Learning · Statistics 2026-05-11 Jane H. Lee, Anay Mehrotra, Manolis Zampetakis