Machine Learning - Scifaro

Generalized probabilistic canonical correlation analysis for multi-modal data integration with full or partial observations

Background: The integration and analysis of multi-modal data are increasingly essential across various domains including bioinformatics. As the volume and complexity of such data grow, there is a pressing need for computational models that…

Machine Learning · Statistics 2025-04-17 Tianjian Yang, Wei Vivian Li

Normalizing Flow Regression for Bayesian Inference with Offline Likelihood Evaluations

Bayesian inference with computationally expensive likelihood evaluations remains a significant challenge in many scientific domains. We propose normalizing flow regression (NFR), a novel offline inference method for approximating posterior…

Machine Learning · Statistics 2025-04-17 Chengkun Li, Bobby Huggins, Petrus Mikkola, Luigi Acerbi

Forest Proximities for Time Series

RF-GAP has recently been introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient time series classification model. We…

Machine Learning · Statistics 2025-04-17 Ben Shaw, Jake Rhodes, Soukaina Filali Boubrahimi, Kevin R. Moon

Measuring training variability from stochastic optimization using robust nonparametric testing

Deep neural network training often involves stochastic optimization, meaning each run will produce a different model. This implies that hyperparameters of the training process, such as the random seed itself, can potentially have…

Machine Learning · Statistics 2025-04-17 Sinjini Banerjee, Tim Marrinan, Reilly Cannon, Tony Chiang, Anand D. Sarwate

On the Robustness of Cross-Concentrated Sampling for Matrix Completion

Matrix completion is one of the crucial tools in modern data science research. Recently, a novel sampling model for matrix completion coined cross-concentrated sampling (CCS) has caught much attention. However, the robustness of the CCS…

Machine Learning · Statistics 2025-04-17 HanQin Cai, Longxiu Huang, Chandra Kundu, Bowen Su

Beyond Worst-Case Online Classification: VC-Based Regret Bounds for Relaxed Benchmarks

We revisit online binary classification by shifting the focus from competing with the best-in-class binary loss to competing against relaxed benchmarks that capture smoothed notions of optimality. Instead of measuring regret relative to the…

Machine Learning · Statistics 2025-04-16 Omar Montasser, Abhishek Shetty, Nikita Zhivotovskiy

AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse

Diffusion models have demonstrated remarkable success in generative tasks, yet their iterative denoising process results in slow inference, limiting their practicality. While existing acceleration methods exploit the well-known U-shaped…

Machine Learning · Statistics 2025-04-16 Zichao Yu, Zhen Zou, Guojiang Shao, Chengwei Zhang, Shengze Xu, Jie Huang, Feng Zhao, Xiaodong Cun, Wenyi Zhang

A Piecewise Lyapunov Analysis of Sub-quadratic SGD: Applications to Robust and Quantile Regression

Motivated by robust and quantile regression problems, we investigate the stochastic gradient descent (SGD) algorithm for minimizing an objective function $f$ that is locally strongly convex with a sub-quadratic tail. This setting covers…

Machine Learning · Statistics 2025-04-16 Yixuan Zhang, Dongyan Huo, Yudong Chen, Qiaomin Xie

Posterior and variational inference for deep neural networks with heavy-tailed weights

We consider deep neural networks in a Bayesian framework with a prior distribution sampling the network weights at random. Following a recent idea of Agapiou and Castillo (2023), who show that heavy-tailed prior distributions achieve…

Machine Learning · Statistics 2025-04-16 Ismaël Castillo, Paul Egels

Identifiable Deep Generative Models via Sparse Decoding

We develop the sparse VAE for unsupervised representation learning on high-dimensional data. The sparse VAE learns a set of latent factors (representations) which summarize the associations in the observed data features. The underlying…

Machine Learning · Statistics 2025-04-16 Gemma E. Moran, Dhanya Sridhar, Yixin Wang, David M. Blei

Optimal sparse phase retrieval via a quasi-Bayesian approach

This paper addresses the problem of sparse phase retrieval, a fundamental inverse problem in applied mathematics, physics, and engineering, where a signal needs to be reconstructed using only the magnitude of its transformation while phase…

Machine Learning · Statistics 2025-04-15 The Tien Mai

Dose-finding design based on level set estimation in phase I cancer clinical trials

The primary objective of phase I cancer clinical trials is to evaluate the safety of a new experimental treatment and to find the maximum tolerated dose (MTD). We show that the MTD estimation problem can be regarded as a level set…

Machine Learning · Statistics 2025-04-15 Keiichiro Seno, Kota Matsui, Shogo Iwazaki, Yu Inatsu, Shion Takeno, Shigeyuki Matsui

An Incremental Non-Linear Manifold Approximation Method

Analyzing high-dimensional data presents challenges due to the "curse of dimensionality", making computations intensive. Dimension reduction techniques, categorized as linear or non-linear, simplify such data. Non-linear methods are…

Machine Learning · Statistics 2025-04-15 Praveen T. W. Hettige, Benjamin W. Ong

Improving the evaluation of samplers on multi-modal targets

Addressing multi-modality constitutes one of the major challenges of sampling. In this reflection paper, we advocate for a more systematic evaluation of samplers with respect to two sources of difficulty: mode separation and dimension. For…

Machine Learning · Statistics 2025-04-15 Louis Grenioux, Maxence Noble, Marylou Gabrié

Double Machine Learning for Causal Inference under Shared-State Interference

Researchers and practitioners often wish to measure treatment effects in settings where units interact via markets and recommendation systems. In these settings, units are affected by certain shared states, like prices, algorithmic…

Machine Learning · Statistics 2025-04-15 Chris Hays, Manish Raghavan

Deep Transfer $Q$-Learning for Offline Non-Stationary Reinforcement Learning

In dynamic decision-making scenarios across business and healthcare, leveraging sample trajectories from diverse populations can significantly enhance reinforcement learning (RL) performance for specific target populations, especially when…

Machine Learning · Statistics 2025-04-15 Jinhang Chai, Elynn Chen, Jianqing Fan

Causal machine learning for heterogeneous treatment effects in the presence of missing outcome data

When estimating heterogeneous treatment effects, missing outcome data can complicate treatment effect estimation, causing certain subgroups of the population to be poorly represented. In this work, we discuss this commonly overlooked…

Machine Learning · Statistics 2025-04-15 Matthew Pryce, Karla Diaz-Ordaz, Ruth H. Keogh, Stijn Vansteelandt

Towards safe Bayesian optimization with Wiener kernel regression

Bayesian Optimization (BO) is a data-driven strategy for minimizing/maximizing black-box functions based on probabilistic surrogate models. In the presence of safety constraints, the performance of BO crucially relies on tight probabilistic…

Machine Learning · Statistics 2025-04-15 Oleksii Molodchyk, Johannes Teutsch, Timm Faulwasser

Learned Reference-based Diffusion Sampling for multi-modal distributions

Over the past few years, several approaches utilizing score-based diffusion have been proposed to sample from probability distributions, that is without having access to exact samples and relying solely on evaluations of unnormalized…

Machine Learning · Statistics 2025-04-15 Maxence Noble, Louis Grenioux, Marylou Gabrié, Alain Oliviero Durmus

Robust Barycenter Estimation using Semi-Unbalanced Neural Optimal Transport

Aggregating data from multiple sources can be formalized as an Optimal Transport (OT) barycenter problem, which seeks to compute the average of probability distributions with respect to OT discrepancies. However, in real-world scenarios,…

Machine Learning · Statistics 2025-04-15 Milena Gazdieva, Jaemoo Choi, Alexander Kolesov, Jaewoong Choi, Petr Mokrov, Alexander Korotin