Machine Learning - Scifaro

Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms

This paper investigates the partial linear model by Least Absolute Deviation (LAD) regression. We parameterize the nonparametric term using Deep Neural Networks (DNNs) and formulate a penalized LAD problem for estimation. Specifically, our…

Machine Learning · Statistics 2025-11-27 Lechen Feng, Haoran Li, Lucky Li, Xingqiu Zhao

Demystifying Spectral Feature Learning for Instrumental Variable Regression

We address the problem of causal effect estimation in the presence of hidden confounders, using nonparametric instrumental variable (IV) regression. A leading strategy employs spectral features - that is, learned features spanning the top…

Machine Learning · Statistics 2025-11-27 Dimitri Meunier, Antoine Moulin, Jakub Wornbard, Vladimir R. Kostic, Arthur Gretton

Momentum Multi-Marginal Schrödinger Bridge Matching

Understanding complex systems by inferring trajectories from sparse sample snapshots is a fundamental challenge in a wide range of domains, e.g., single-cell biology, meteorology, and economics. Despite advancements in Bridge and Flow…

Machine Learning · Statistics 2025-11-27 Panagiotis Theodoropoulos, Augustinos D. Saravanos, Evangelos A. Theodorou, Guan-Horng Liu

Factor-Assisted Federated Learning for Personalized Optimization with Heterogeneous Data

Federated learning is an emerging distributed machine learning framework aiming at protecting data privacy. Data heterogeneity is one of the core challenges in federated learning, which could severely degrade the convergence rate and…

Machine Learning · Statistics 2025-11-27 Feifei Wang, Huiyun Tang, Yang Li

A Fully Probabilistic Tensor Network for Regularized Volterra System Identification

Modeling nonlinear systems with Volterra series is challenging because the number of kernel coefficients grows exponentially with the model order. This work introduces Bayesian Tensor Network Volterra kernel machines (BTN-V), extending the…

Machine Learning · Statistics 2025-11-26 Afra Kilic, Kim Batselier

Clustering Approaches for Mixed-Type Data: A Comparative Study

Clustering is widely used in unsupervised learning to find homogeneous groups of observations within a dataset. However, clustering mixed-type data remains a challenge, as few existing approaches are suited for this task. This study…

Machine Learning · Statistics 2025-11-26 Badih Ghattas, Alvaro Sanchez San-Benito

Differential privacy with dependent data

Dependent data underlies many statistical studies in the social and health sciences, which often involve sensitive or private information. Differential privacy (DP) and in particular user-level DP provide a natural formalization of…

Machine Learning · Statistics 2025-11-26 Valentin Roth, Marco Avella-Medina

Learning to Validate Generative Models: a Goodness-of-Fit Approach

Generative models are increasingly central to scientific workflows, yet their systematic use and interpretation require a proper understanding of their limitations through rigorous validation. Classic approaches struggle with scalability,…

Machine Learning · Statistics 2025-11-26 Pietro Cappelli, Gaia Grosso, Marco Letizia, Humberto Reyes-González, Marco Zanetti

Spectral Thresholds for Identifiability and Stability: Finite-Sample Phase Transitions in High-Dimensional Learning

In high-dimensional learning, models remain stable until they collapse abruptly once the sample size falls below a critical level. This instability is not algorithm-specific but a geometric mechanism: when the weakest Fisher eigendirection…

Machine Learning · Statistics 2025-11-26 William Hao-Cheng Huang

Adjoint Schrödinger Bridge Sampler

Computational methods for learning to sample from the Boltzmann distribution -- where the target distribution is known only up to an unnormalized energy function -- have advanced significantly recently. Due to the lack of explicit target…

Machine Learning · Statistics 2025-11-26 Guan-Horng Liu, Jaemoo Choi, Yongxin Chen, Benjamin Kurt Miller, Ricky T. Q. Chen

An Asymptotic Equation Linking WAIC and WBIC in Singular Models

In statistical learning, models are classified as regular or singular depending on whether the mapping from parameters to probability distributions is injective. Most models with hierarchical structures or latent variables are singular, for…

Machine Learning · Statistics 2025-11-26 Naoki Hayashi, Takuro Kutsuna, Sawa Takamuku

Missing Data Imputation by Reducing Mutual Information with Rectified Flows

This paper introduces a novel iterative method for missing data imputation that sequentially reduces the mutual information between data and the corresponding missingness mask. Inspired by GAN-based approaches that train generators to…

Machine Learning · Statistics 2025-11-26 Jiahao Yu, Qizhen Ying, Leyang Wang, Ziyue Jiang, Song Liu

Sparse Techniques for Regression in Deep Gaussian Processes

Gaussian processes (GPs) have gained popularity as flexible machine learning models for regression and function approximation with an in-built method for uncertainty quantification. However, GPs suffer when the amount of training data is…

Machine Learning · Statistics 2025-11-26 Jonas Latz, Aretha L. Teckentrup, Simon Urbainczyk

Hyperparameter Optimization in Machine Learning

Hyperparameters are configuration variables controlling the behavior of machine learning algorithms. They are ubiquitous in machine learning and artificial intelligence and the choice of their values determines the effectiveness of systems…

Machine Learning · Statistics 2025-11-26 Luca Franceschi, Michele Donini, Valerio Perrone, Aaron Klein, Cédric Archambeau, Matthias Seeger, Massimiliano Pontil, Paolo Frasconi

Nonparametric Instrumental Variable Regression with Observed Covariates

We study the problem of nonparametric instrumental variable regression with observed covariates, which we refer to as NPIV-O. Compared with standard nonparametric instrumental variable regression (NPIV), the additional observed covariates…

Machine Learning · Statistics 2025-11-25 Zikai Shen, Zonghao Chen, Dimitri Meunier, Ingo Steinwart, Arthur Gretton, Zhu Li

The Unified Non-Convex Framework for Robust Causal Inference: Overcoming the Gaussian Barrier and Optimization Fragility

This document proposes a Unified Robust Framework that re-engineers the estimation of the Average Treatment Effect on the Overlap (ATO). It synthesizes gamma-Divergence for outlier robustness, Graduated Non-Convexity (GNC) for global…

Machine Learning · Statistics 2025-11-25 Eichi Uehara

A Robust State Filter Against Unmodeled Process And Measurement Noise

This paper introduces a novel Kalman filter framework designed to achieve robust state estimation under both process and measurement noise. Inspired by the Weighted Observation Likelihood Filter (WoLF), which provides robustness against…

Machine Learning · Statistics 2025-11-25 Weitao Liu

Classification EM-PCA for clustering and embedding

The mixture model is undoubtedly one of the greatest contributions to clustering. For continuous data, Gaussian models are often used and the Expectation-Maximization (EM) algorithm is particularly suitable for estimating parameters from…

Machine Learning · Statistics 2025-11-25 Zineddine Tighidet, Lazhar Labiod, Mohamed Nadif

Fairness Meets Privacy: Integrating Differential Privacy and Demographic Parity in Multi-class Classification

The increasing use of machine learning in sensitive applications demands algorithms that simultaneously preserve data privacy and ensure fairness across potentially sensitive sub-populations. While privacy and fairness have each been…

Machine Learning · Statistics 2025-11-25 Lilian Say, Christophe Denis, Rafael Pinot

Uncertainty of Network Topology with Applications to Out-of-Distribution Detection

Persistent homology (PH) is a crucial concept in computational topology, providing a multiscale topological description of a space. It is particularly significant in topological data analysis, which aims to make statistical inference from a…

Machine Learning · Statistics 2025-11-25 Sing-Yuan Yeh, Chun-Hao Yang