Machine Learning - Scifaro

Structure-agnostic Optimality of Doubly Robust Learning for Treatment Effect Estimation

Average treatment effect estimation is the most central problem in causal inference with application to numerous disciplines. While many estimation strategies have been proposed in the literature, the statistical optimality of these methods…

Machine Learning · Statistics 2025-06-10 Jikai Jin, Vasilis Syrgkanis

On Hypothesis Transfer Learning of Functional Linear Models

We study transfer learning (TL) for functional linear regression (FLR) under the Reproducing Kernel Hilbert Space (RKHS) framework, observing that the TL techniques in existing high-dimensional linear regression are not compatible…

Machine Learning · Statistics 2025-06-10 Haotian Lin, Matthew Reimherr

Counterfactual inference in sequential experiments

We consider after-study statistical inference for sequentially designed experiments wherein multiple units are assigned treatments for multiple time points using treatment policies that adapt over time. Our goal is to provide inference…

Machine Learning · Statistics 2025-06-10 Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants

This paper investigates causal effect identification in latent variable Linear Non-Gaussian Acyclic Models (lvLiNGAM) using higher-order cumulants, addressing two prominent setups that are challenging in the presence of latent confounding:…

Machine Learning · Statistics 2025-06-09 Daniele Tramontano, Yaroslav Kivva, Saber Salehkaleybar, Mathias Drton, Negar Kiyavash

Conditioning Diffusions Using Malliavin Calculus

In generative modelling and stochastic optimal control, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using…

Machine Learning · Statistics 2025-06-09 Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo-Enrich, George Deligiannidis, Nikolas Nüsken

Estimating stationary mass, frequency by frequency

Suppose we observe a trajectory of length $n$ from an exponentially $\alpha$-mixing stochastic process over a finite but potentially large state space. We consider the problem of estimating the probability mass placed by the stationary…

Machine Learning · Statistics 2025-06-09 Milind Nakul, Vidya Muthukumar, Ashwin Pananjady

Ab Initio Nonparametric Variable Selection for Scalable Symbolic Regression with Large $p$

Symbolic regression (SR) is a powerful technique for discovering symbolic expressions that characterize nonlinear relationships in data, gaining increasing attention for its interpretability, compactness, and robustness. However, existing…

Machine Learning · Statistics 2025-06-09 Shengbin Ye, Meng Li

Distributional Matrix Completion via Nearest Neighbors in the Wasserstein Space

We study the problem of distributional matrix completion: Given a sparsely observed matrix of empirical distributions, we seek to impute the true distributions associated with both observed and unobserved matrix entries. This is a…

Machine Learning · Statistics 2025-06-09 Jacob Feitelberg, Kyuseong Choi, Anish Agarwal, Raaz Dwivedi

Deconfounding Multi-Cause Latent Confounders: A Factor-Model Approach to Climate Model Bias Correction

Global Climate Models (GCMs) are crucial for predicting future climate changes by simulating the Earth systems. However, GCM outputs exhibit systematic biases due to model uncertainties, parameterization simplifications, and inadequate…

Machine Learning · Statistics 2025-06-09 Wentao Gao, Jiuyong Li, Debo Cheng, Lin Liu, Jixue Liu, Thuc Duy Le, Xiaojing Du, Xiongren Chen, Yanchang Zhao, Yun Chen

BOLD: Boolean Logic Deep Learning

Deep learning is computationally intensive, with significant efforts focused on reducing arithmetic complexity, particularly regarding energy consumption dominated by data movement. While existing literature emphasizes inference, training…

Machine Learning · Statistics 2025-06-09 Van Minh Nguyen, Cristian Ocampo, Aymen Askri, Louis Leconte, Ba-Hien Tran

Longitudinal Targeted Minimum Loss-based Estimation with Temporal-Difference Heterogeneous Transformer

We propose Deep Longitudinal Targeted Minimum Loss-based Estimation (Deep LTMLE), a novel approach to estimate the counterfactual mean of outcome under dynamic treatment policies in longitudinal problem settings. Our approach utilizes a…

Machine Learning · Statistics 2025-06-09 Toru Shirakawa, Yi Li, Yulun Wu, Sky Qiu, Yuxuan Li, Mingduo Zhao, Hiroyasu Iso, Mark van der Laan

Infinite-Dimensional Diffusion Models

Diffusion models have had a profound impact on many application areas, including those where data are intrinsically infinite-dimensional, such as images or time series. The standard approach is first to discretize and then to apply…

Machine Learning · Statistics 2025-06-09 Jakiw Pidstrigach, Youssef Marzouk, Sebastian Reich, Sven Wang

Nuclear penalized multinomial regression with an application to predicting at bat outcomes in baseball

We propose the nuclear norm penalty as an alternative to the ridge penalty for regularized multinomial regression. This convex relaxation of reduced-rank multinomial regression has the advantage of leveraging underlying structure among the…

Machine Learning · Statistics 2025-06-09 Scott Powers, Trevor Hastie, Robert Tibshirani

Admissibility of Completely Randomized Trials: A Large-Deviation Approach

When an experimenter has the option of running an adaptive trial, is it admissible to ignore this option and run a non-adaptive trial instead? We provide a negative answer to this question in the best-arm identification problem, where the…

Machine Learning · Statistics 2025-06-06 Guido Imbens, Chao Qin, Stefan Wager

Nonlinear Causal Discovery for Grouped Data

Inferring cause-effect relationships from observational data has gained significant attention in recent years, but most methods are limited to scalar random variables. In many important domains, including neuroscience, psychology, social…

Machine Learning · Statistics 2025-06-06 Konstantin Göbler, Tobias Windisch, Mathias Drton

Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models

Estimating causal effects of joint interventions on multiple variables is crucial in many domains, but obtaining data from such simultaneous interventions can be challenging. Our study explores how to learn joint interventional effects…

Machine Learning · Statistics 2025-06-06 Armin Kekić, Sergio Hernan Garrido Mejia, Bernhard Schölkopf

Distributional encoding for Gaussian process regression with qualitative inputs

Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a highly prevailing…

Machine Learning · Statistics 2025-06-06 Sébastien Da Veiga

On the Wasserstein Geodesic Principal Component Analysis of probability measures

This paper focuses on Geodesic Principal Component Analysis (GPCA) on a collection of probability distributions using the Otto-Wasserstein geometry. The goal is to identify geodesic curves in the space of probability measures that best…

Machine Learning · Statistics 2025-06-06 Nina Vesseron, Elsa Cazelles, Alice Le Brigant, Thierry Klein

Higher-Order Group Synchronization

Group synchronization is the problem of determining reliable global estimates from noisy local measurements on networks. The typical task for group synchronization is to assign elements of a group to the nodes of a graph in a way that…

Machine Learning · Statistics 2025-06-06 Adriana L. Duncan, Joe Kileel

Combinatorial Reinforcement Learning with Preference Feedback

In this paper, we consider combinatorial reinforcement learning with preference feedback, where a learning agent sequentially offers an action--an assortment of multiple items--to a user, whose preference feedback follows a multinomial…

Machine Learning · Statistics 2025-06-06 Joongkyu Lee, Min-hwan Oh