Machine Learning - Scifaro

Beyond Uncertainty Sets: Leveraging Optimal Transport to Extend Conformal Predictive Distribution to Multivariate Settings

Conformal prediction (CP) constructs uncertainty sets for model outputs with finite-sample coverage guarantees. A candidate output is included in the prediction set if its non-conformity score is not considered extreme relative to the…

Machine Learning · Statistics 2025-11-20 Eugene Ndiaye

Latent space analysis and generalization to out-of-distribution data

Understanding the relationships between data points in the latent decision space derived by the deep learning system is critical to evaluating and interpreting the performance of the system on real world data. Detecting…

Machine Learning · Statistics 2025-11-20 Katie Rainey, Erin Hausmann, Donald Waagen, David Gray, Donald Hulsey

Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k-means and its wide family of variants are still widely used, all…

Machine Learning · Statistics 2025-11-20 Sourav De, Koustav Chowdhury, Bibhabasu Mandal, Sagar Ghosh, Swagatam Das, Debolina Paul, Saptarshi Chakraborty

The Effect of Optimal Self-Distillation in Noisy Gaussian Mixture Model

Self-distillation (SD), a technique where a model improves itself using its own predictions, has attracted attention as a simple yet powerful approach in machine learning. Despite its widespread use, the mechanisms underlying its…

Machine Learning · Statistics 2025-11-20 Kaito Takanami, Takashi Takahashi, Ayaka Sakata

LiLaN: A Linear Latent Network as the Solution Operator for Real-Time Solutions to Stiff Nonlinear Ordinary Differential Equations

Solving stiff ordinary differential equations (StODEs) requires sophisticated numerical solvers, which are often computationally expensive. In general, traditional explicit time integration schemes with restricted time step sizes are not…

Machine Learning · Statistics 2025-11-20 William Cole Nockolds, C. G. Krishnanunni, Tan Bui-Thanh, Xianxhu Tang

Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization

We establish the first global convergence result for neural networks in the two-stage least squares (2SLS) approach to nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field…

Machine Learning · Statistics 2025-11-19 Zonghao Chen, Atsushi Nitanda, Arthur Gretton, Taiji Suzuki

DeepBlip: Estimating Conditional Average Treatment Effects Over Time

Structural nested mean models (SNMMs) are a principled approach to estimating treatment effects over time. A particular strength of SNMMs is that they break the joint effect of treatment sequences over time into localized, time-specific "blip…

Machine Learning · Statistics 2025-11-19 Haorui Ma, Dennis Frauen, Stefan Feuerriegel

Skewness-Robust Causal Discovery in Location-Scale Noise Models

To distinguish Markov equivalent graphs in causal discovery, it is necessary to restrict the structural causal model. Crucially, we need to be able to distinguish cause $X$ from effect $Y$ in bivariate models, that is, distinguish the two…

Machine Learning · Statistics 2025-11-19 Daniel Klippert, Alexander Marx

Causal Discovery on Higher-Order Interactions

Causal discovery combines data with knowledge provided by experts to learn the DAG representing the causal relationships between a given set of variables. When data are scarce, bagging is used to measure our confidence in an average DAG…

Machine Learning · Statistics 2025-11-19 Alessio Zanga, Marco Scutari, Fabio Stella

SCOPE: Spectral Concentration by Distributionally Robust Joint Covariance-Precision Estimation

We propose a distributionally robust formulation for simultaneously estimating the covariance matrix and the precision matrix of a random vector. The proposed model minimizes the worst-case weighted sum of the Frobenius loss of the…

Machine Learning · Statistics 2025-11-19 Renjie Chen, Viet Anh Nguyen, Huifu Xu

Splat Regression Models

We introduce a highly expressive class of function approximators called Splat Regression Models. Model outputs are mixtures of heterogeneous and anisotropic bump functions, termed splats, each weighted by an output vector. The power of…

Machine Learning · Statistics 2025-11-19 Mara Daniels, Philippe Rigollet

Empirical Likelihood for Random Forests and Ensembles

We develop an empirical likelihood (EL) framework for random forests and related ensemble methods, providing a likelihood-based approach to quantify their statistical uncertainty. Exploiting the incomplete $U$-statistic structure inherent…

Machine Learning · Statistics 2025-11-19 Harold D. Chiang, Yukitoshi Matsushita, Taisuke Otsu

Uncertainty-Calibrated Prediction of Randomly-Timed Biomarker Trajectories with Conformal Bands

Despite recent progress in predicting biomarker trajectories from real clinical data, uncertainty in the predictions poses high-stakes risks (e.g., misdiagnosis) that limit their clinical deployment. To enable safe and reliable use of such…

Machine Learning · Statistics 2025-11-19 Vasiliki Tassopoulou, Charis Stamouli, Haochang Shou, George J. Pappas, Christos Davatzikos

Knowledge vs. Experience: Asymptotic Limits of Impatience in Edge Tenants

We study how two information feeds, a closed-form Markov estimator of residual sojourn and an online trained actor-critic, affect reneging and jockeying in a dual M/M/1 system. Analytically, for unequal service rates and total-time…

Machine Learning · Statistics 2025-11-19 Anthony Kiggundu, Bin Han, Hans D. Schotten

Continuum Dropout for Neural Differential Equations

Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in…

Machine Learning · Statistics 2025-11-19 Jonghun Lee, YongKyung Oh, Sungil Kim, Dong-Young Lim

Sharp detection of low-dimensional structure in probability measures via dimensional logarithmic Sobolev inequalities

Identifying low-dimensional structure in high-dimensional probability measures is an essential pre-processing step for efficient sampling. We introduce a method for identifying and approximating a target measure $\pi$ as a perturbation of a…

Machine Learning · Statistics 2025-11-19 Matthew T. C. Li, Tiangang Cui, Fengyi Li, Youssef Marzouk, Olivier Zahm

PyDTS: A Python Package for Discrete-Time Survival Analysis with Competing Risks and Optional Penalization

Time-to-event (survival) analysis models the time until a pre-specified event occurs. When time is measured in discrete units or rounded into intervals, standard continuous-time models can yield biased estimators. In addition, the event of…

Machine Learning · Statistics 2025-11-19 Tomer Meir, Rom Gutman, Malka Gorfine

The Shape of Data: Topology Meets Analytics. A Practical Introduction to Topological Analytics and the Stability Index (TSI) in Business

Modern business and economic datasets often exhibit nonlinear, multi-scale structures that traditional linear tools under-represent. Topological Data Analysis (TDA) offers a geometric lens for uncovering robust patterns, such as connected…

Machine Learning · Statistics 2025-11-18 Ioannis Diamantis

Likelihood-guided Regularization in Attention Based Models

The transformer architecture has demonstrated strong performance in classification tasks involving structured and high-dimensional data. However, its success often hinges on large-scale training data and careful regularization to prevent…

Machine Learning · Statistics 2025-11-18 Mohamed Salem, Inyoung Kim

Reconstruction of Manifold Distances from Noisy Observations

We consider the problem of reconstructing the intrinsic geometry of a manifold from noisy pairwise distance observations. Specifically, let $M$ denote a $d$-dimensional manifold of diameter 1 and $\mu$ a probability measure on $M$ that is…

Machine Learning · Statistics 2025-11-18 Charles Fefferman, Jonathan Marty, Kevin Ren