Machine Learning - Scifaro

Error Bounds and Optimal Schedules for Masked Diffusions with Factorized Approximations

Recently proposed generative models for discrete data, such as Masked Diffusion Models (MDMs), exploit conditional independence approximations to reduce the computational cost of popular Auto-Regressive Models (ARMs), at the price of some…

Machine Learning · Statistics 2025-12-18 Hugo Lavenant, Giacomo Zanella

Identifiable Autoregressive Variational Autoencoders for Nonlinear and Nonstationary Spatio-Temporal Blind Source Separation

The modeling and prediction of multivariate spatio-temporal data involve numerous challenges. Dimension reduction methods can significantly simplify this process, provided that they account for the complex dependencies between variables and…

Machine Learning · Statistics 2025-12-18 Mika Sipilä, Klaus Nordhausen, Sara Taskinen

Conformalized Decision Risk Assessment

In many operational settings, decision-makers must commit to actions before uncertainty resolves, but existing optimization tools rarely quantify how consistently a chosen decision remains optimal across plausible scenarios. This paper…

Machine Learning · Statistics 2025-12-18 Wenbin Zhou, Agni Orfanoudaki, Shixiang Zhu

All Models Are Miscalibrated, But Some Less So: Comparing Calibration with Conditional Mean Operators

When working in a high-risk setting, having well calibrated probabilistic predictive models is a crucial requirement. However, estimators for calibration error are not always able to correctly distinguish which model is better calibrated.…

Machine Learning · Statistics 2025-12-18 Peter Moskvichev, Dino Sejdinovic
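The entry above notes that calibration-error estimators do not always identify the better-calibrated model. For context, here is a minimal sketch of one widely used estimator, the equal-width binned expected calibration error (ECE) for a binary classifier. The function name `binned_ece` and the default `n_bins=10` are illustrative assumptions. This is not the conditional-mean-operator approach the paper proposes.

```python
def binned_ece(probs, labels, n_bins=10):
    """Binned expected calibration error for a binary classifier.

    probs:  predicted probabilities P(y = 1), each in [0, 1].
    labels: observed outcomes, each 0 or 1.
    Each equal-width bin contributes |mean confidence - observed frequency|,
    weighted by the fraction of samples that fall into it.
    """
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(mean_conf - freq)
    return ece
```

For example, predicting 0.5 on outcomes `[0, 1, 0, 1]` gives an ECE of 0. Predicting 0.9 on two negatives gives 0.9. The value also depends on the binning choice, which is one reason such plug-in estimates can rank models inconsistently.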

Natural Variational Annealing for Multimodal Optimization

We introduce a new multimodal optimization approach called Natural Variational Annealing (NVA) that combines the strengths of three foundational concepts to simultaneously search for multiple global and local modes of black-box nonconvex…

Machine Learning · Statistics 2025-12-18 Tâm LeMinh, Julyan Arbel, Thomas Möllenhoff, Mohammad Emtiyaz Khan, Florence Forbes

Robust Tensor Principal Component Analysis: Exact Recovery via Deterministic Model

Tensors, also known as multi-dimensional arrays, arise in many applications in signal processing, manufacturing processes, and healthcare, among others. As one of the most popular methods in tensor literature, Robust tensor principal component…

Machine Learning · Statistics 2025-12-18 Bo Shen, Yutong Zhang, Zhenyu Kong

LLmFPCA-detect: LLM-powered Multivariate Functional PCA for Anomaly Detection in Sparse Longitudinal Texts

Sparse longitudinal (SL) textual data arises when individuals generate text repeatedly over time (e.g., customer reviews, occasional social media posts, electronic medical records across visits), but the frequency and timing of observations…

Machine Learning · Statistics 2025-12-17 Prasanjit Dubey, Aritra Guha, Zhengyi Zhou, Qiong Wu, Xiaoming Huo, Paromita Dubey

From STLS to Projection-based Dictionary Selection in Sparse Regression for System Identification

In this work, we revisit dictionary-based sparse regression, in particular, Sequential Threshold Least Squares (STLS), and propose a score-guided library selection to provide practical guidance for data-driven modeling, with emphasis on…

Machine Learning · Statistics 2025-12-17 Hangjun Cho, Fabio V. G. Amaral, Andrei A. Klishin, Cassio M. Oishi, Steven L. Brunton
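The entry above takes Sequential Threshold Least Squares (STLS) as its starting point. For reference, here is a minimal sketch of the standard STLS loop used in dictionary-based sparse regression. It assumes a precomputed library matrix `theta` (one column per candidate term) and target derivatives `dxdt`. The function name `stls` and the `threshold`/`max_iter` parameters are illustrative. This sketch does not implement the projection-based, score-guided library selection the paper proposes.

```python
import numpy as np


def stls(theta, dxdt, threshold=0.1, max_iter=10):
    """Sequential Threshold Least Squares (sketch).

    theta: (n_samples, n_terms) dictionary/library matrix.
    dxdt:  (n_samples, n_targets) targets, e.g. estimated time derivatives.
    Returns xi: (n_terms, n_targets) sparse coefficient matrix.
    """
    # Start from the dense least-squares solution.
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold  # terms to prune this round
        xi[small] = 0.0
        for k in range(dxdt.shape[1]):
            active = ~small[:, k]
            if not active.any():
                continue  # every term pruned for this target
            # Refit target k using only the surviving library terms.
            xi[active, k] = np.linalg.lstsq(theta[:, active], dxdt[:, k], rcond=None)[0]
    return xi


if __name__ == "__main__":
    # Noise-free toy system dx/dt = -2 x + 0.5 x^3 with library [1, x, x^2, x^3].
    x = np.linspace(-2, 2, 200)
    theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
    dxdt = (-2.0 * x + 0.5 * x**3).reshape(-1, 1)
    print(stls(theta, dxdt, threshold=0.1).ravel())
```

The single `threshold` is the key design knob. Too small a threshold keeps spurious terms, and too large a threshold prunes real ones. Choosing the library itself is the question the entry above addresses.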

Continual Learning at the Edge: An Agnostic IIoT Architecture

The exponential growth of Internet-connected devices has presented challenges to traditional centralized computing systems due to latency and bandwidth limitations. Edge computing has evolved to address these difficulties by bringing…

Machine Learning · Statistics 2025-12-17 Pablo García-Santaclara, Bruno Fernández-Castro, Rebeca P. Díaz-Redondo, Carlos Calvo-Moa, Henar Mariño-Bodelón

Weighted Conformal Prediction Provides Adaptive and Valid Mask-Conditional Coverage for General Missing Data Mechanisms

Conformal prediction (CP) offers a principled framework for uncertainty quantification, but it fails to guarantee coverage when faced with missing covariates. In addressing the heterogeneity induced by various missing patterns,…

Machine Learning · Statistics 2025-12-17 Jiarong Fan, Juhyun Park, Thi Phuong Thuy Vo, Nicolas Brunel
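For context on the entry above, here is a minimal sketch of the generic weighted split-conformal quantile. It assumes absolute-residual scores computed on a held-out calibration set and user-supplied nonnegative weights. The truncated abstract does not say how the paper constructs its mask-conditional weights, so `weights` and `test_weight` here are placeholders. The function names are also illustrative.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted score distribution
    sum_i p_i * delta(scores[i]) + p_test * delta(+inf),
    with p_i = weights[i] / total and p_test = test_weight / total.
    """
    total = sum(weights) + test_weight
    target = (1 - alpha) * total  # compare unnormalized mass to limit rounding drift
    cum = 0.0
    for s, w in sorted(zip(scores, weights)):
        cum += w
        if cum >= target:
            return s
    return math.inf  # remaining mass sits on +inf, so the interval is unbounded


def conformal_interval(y_hat, scores, weights, test_weight, alpha=0.1):
    """Symmetric prediction interval for absolute-residual scores |y - y_hat|."""
    q = weighted_conformal_quantile(scores, weights, test_weight, alpha)
    return y_hat - q, y_hat + q
```

With equal unit weights, this reduces to ordinary split conformal prediction. The quantile is then the ceil((n + 1)(1 - alpha))-th smallest calibration score. For example, 19 scores and alpha = 0.1 select the 18th smallest. A large `test_weight` can push the quantile to infinity, which is how weighted conformal methods preserve validity when the calibration data carry little information about the test point.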

On the Hardness of Conditional Independence Testing In Practice

Tests of conditional independence (CI) underpin a number of important problems in machine learning and statistics, from causal discovery to evaluation of predictor fairness and out-of-distribution robustness. Shah and Peters (2020) showed…

Machine Learning · Statistics 2025-12-17 Zheng He, Roman Pogodin, Yazhe Li, Namrata Deka, Arthur Gretton, Danica J. Sutherland

Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics

Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require…

Machine Learning · Statistics 2025-12-17 Aaron Wei, Milad Jalali, Danica J. Sutherland
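As background for the entry above, here is a minimal sketch of the standard unbiased (U-statistic) estimate of squared MMD. Its formula already accepts two samples of different sizes m and n. The Gaussian kernel, the `bandwidth` parameter, and the function names are illustrative choices. This is not the generalized U-statistic construction the paper proposes, and it omits the kernel selection and test-threshold steps a full two-sample test needs.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for equal-length vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2 * bandwidth**2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples xs (size m) and ys (size n).

    MMD^2_u = 1/(m(m-1)) sum_{i != j} k(x_i, x_j)
            + 1/(n(n-1)) sum_{i != j} k(y_i, y_j)
            - 2/(m n)    sum_{i, j}   k(x_i, y_j)
    The estimate can be slightly negative when the distributions are close.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("need at least two samples from each distribution")
    kxx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
              for i in range(m) for j in range(m) if i != j)
    kyy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
              for i in range(n) for j in range(n) if i != j)
    kxy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2 * kxy / (m * n)
```

This runs in O((m + n)^2) kernel evaluations, which is fine for a sketch but is usually vectorized in practice.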

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments…

Machine Learning · Statistics 2025-12-17 Theodore Jerome Tinker, Kenji Doya, Jun Tani

Misspecification-robust amortised simulation-based inference using variational methods

Recent advances in neural density estimation have enabled powerful simulation-based inference (SBI) methods that can flexibly approximate Bayesian inference for intractable stochastic models. Although these methods have demonstrated…

Machine Learning · Statistics 2025-12-17 Matthew O'Callaghan, Kaisey S. Mandel, Gerry Gilmore

Near-Optimal Algorithms for Omniprediction

Omnipredictors are simple prediction functions that encode loss-minimizing predictions with respect to a hypothesis class $H$, simultaneously for every loss function within a class of losses $L$. In this work, we give near-optimal learning…

Machine Learning · Statistics 2025-12-17 Princewill Okoroafor, Robert Kleinberg, Michael P. Kim

Automated Model Selection for Generalized Linear Models

In this paper, we show how mixed-integer conic optimization can be used to combine feature subset selection with holistic generalized linear models to fully automate the model selection process. Concretely, we directly optimize for the…

Machine Learning · Statistics 2025-12-17 Benjamin Schwendinger, Florian Schwendinger, Laura Vana-Gür

General Formulation and PCL-Analysis for Restless Bandits with Limited Observability

In this paper, we consider a general observation model for restless multi-armed bandit problems. The operation of the player is based on the past observation history that is limited (partial) and error-prone due to resource constraints or…

Machine Learning · Statistics 2025-12-17 Keqin Liu, Qizhen Jia

Holistic Generalized Linear Models

Holistic linear regression extends the classical best subset selection problem by adding additional constraints designed to improve the model quality. These constraints include sparsity-inducing constraints, sign-coherence constraints and…

Machine Learning · Statistics 2025-12-17 Benjamin Schwendinger, Florian Schwendinger, Laura Vana

A Nonparametric Statistics Approach to Feature Selection in Deep Neural Networks with Theoretical Guarantees

This paper tackles the problem of feature selection in a highly challenging setting: $\mathbb{E}(y | \boldsymbol{x}) = G(\boldsymbol{x}_{\mathcal{S}_0})$, where $\mathcal{S}_0$ is the set of relevant features and $G$ is an unknown,…

Machine Learning · Statistics 2025-12-16 Junye Du, Zhenghao Li, Zhutong Gu, Long Feng

General OOD Detection via Model-aware and Subspace-aware Variable Priority

Out-of-distribution (OOD) detection is essential for determining when a supervised model encounters inputs that differ meaningfully from its training distribution. While widely studied in classification, OOD detection for regression and…

Machine Learning · Statistics 2025-12-16 Min Lu, Hemant Ishwaran