Machine Learning - Scifaro

Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference

This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems, which can serve as the foundation of virtual assets in a digital twin. Data-driven ROMs are…

Machine Learning · Statistics 2026-01-05 Shane A. McQuarrie, Mengwu Guo, Anirban Chaudhuri

Survey of Data-driven Newsvendor: Unified Analysis and Spectrum of Achievable Regrets

In the Newsvendor problem, the goal is to guess the number that will be drawn from some distribution, with asymmetric consequences for guessing too high vs. too low. In the data-driven version, the distribution is unknown, and one must work…

Machine Learning · Statistics 2026-01-05 Zhuoxin Chen, Will Ma

Probabilistic Reduced-Dimensional Vector Autoregressive Modeling with Oblique Projections

In this paper, we propose a probabilistic reduced-dimensional vector autoregressive (PredVAR) model to extract low-dimensional dynamics from high-dimensional noisy data. The model utilizes an oblique projection to partition the measurement…

Machine Learning · Statistics 2026-01-05 Yanfang Mo, S. Joe Qin

Sparse-Input Neural Network using Group Concave Regularization

Simultaneous feature selection and non-linear function estimation is challenging in modeling, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the…

Machine Learning · Statistics 2026-01-05 Bin Luo, Susan Halabi

MCD: Marginal Contrastive Discrimination for conditional density estimation

We consider the problem of conditional density estimation, which is a major topic of interest in the fields of statistical and machine learning. Our method, called Marginal Contrastive Discrimination, MCD, reformulates the conditional…

Machine Learning · Statistics 2026-01-05 Katia Meziani, Aminata Ndiaye, Benjamin Riu

Are First-Order Diffusion Samplers Really Slower? A Fast Forward-Value Approach

Higher-order ODE solvers have become a standard tool for accelerating diffusion probabilistic model (DPM) sampling, motivating the widespread view that first-order methods are inherently slower and that increasing discretization order is…

Machine Learning · Statistics 2026-01-01 Yuchen Jiao, Na Li, Changxiao Cai, Gen Li

MultiRisk: Multiple Risk Control via Iterative Score Thresholding

As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a lightweight mechanism for behavior control that…

Machine Learning · Statistics 2026-01-01 Sunay Joshi, Yan Sun, Hamed Hassani, Edgar Dobriban

Improving the stability of the covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling

Stochastic gradient Langevin dynamics and its variants approximate the likelihood of an entire dataset, via random (and typically much smaller) subsets, in the setting of Bayesian sampling. Due to the (often substantial) improvement of the…

Machine Learning · Statistics 2026-01-01 Jiani Wei, Xiaocheng Shang

Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators

In this paper, we construct a class of stochastic interpolation neural network operators (SINNOs) with random coefficients activated by sigmoidal functions. We establish their boundedness, interpolation accuracy, and approximation…

Machine Learning · Statistics 2026-01-01 Sachin Saini, Uaday Singh

Energy-Tweedie: Score meets Score, Energy meets Energy

Denoising and score estimation have long been known to be linked via the classical Tweedie's formula. In this work, we first extend the latter to a wider range of distributions often called "energy models" and denoted elliptical…

Machine Learning · Statistics 2026-01-01 Andrej Leban

Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws

We study the optimization and sample complexity of gradient-based training of a two-layer neural network with quadratic activation function in the high-dimensional regime, where the data is generated as $f_*(\boldsymbol{x}) \propto…

Machine Learning · Statistics 2026-01-01 Gérard Ben Arous, Murat A. Erdogdu, Nuri Mert Vural, Denny Wu

Robust Distributed Estimation: Extending Gossip Algorithms to Ranking and Trimmed Means

This paper addresses the problem of robust estimation in gossip algorithms over arbitrary communication graphs. Gossip algorithms are fully decentralized, relying only on local neighbor-to-neighbor communication, making them well-suited for…

Machine Learning · Statistics 2026-01-01 Anna Van Elst, Igor Colin, Stephan Clémençon

Discovery and inference beyond linearity by integrating Bayesian regression, tree ensembles and Shapley values

Machine Learning (ML) is gaining popularity for hypothesis-free discovery of risk and protective factors in healthcare studies. ML is strong at discovering nonlinearities and interactions, but this power is compromised by a lack of reliable…

Machine Learning · Statistics 2026-01-01 Giorgio Spadaccini, Marjolein Fokkema, Mark A. van de Wiel

Generative Machine Learning for Multivariate Angular Simulation

With the recent development of new geometric and angular-radial frameworks for multivariate extremes, reliably simulating from angular variables in moderate-to-high dimensions is of increasing importance. Empirical approaches have the…

Machine Learning · Statistics 2026-01-01 Jakob Benjamin Wessel, Callum J. R. Murphy-Barltrop, Emma S. Simpson

Concentration Inequalities for Stochastic Optimization of Unbounded Objective Functions with Application to Denoising Score Matching

We derive novel concentration inequalities that bound the statistical error for a large class of stochastic optimization problems, focusing on the case of unbounded objective functions. Our derivations utilize the following key tools: 1) A…

Machine Learning · Statistics 2026-01-01 Jeremiah Birrell

coverforest: Conformal Predictions with Random Forest in Python

Conformal prediction provides a framework for uncertainty quantification, specifically in the forms of prediction intervals and sets with distribution-free guaranteed coverage. While recent cross-conformal techniques such as CV+ and…

Machine Learning · Statistics 2026-01-01 Panisara Meehinkong, Donlapark Ponnoprat

NeuroPMD: Neural Fields for Density Estimation on Product Manifolds

We propose a novel deep neural network methodology for density estimation on product Riemannian manifold domains. In our approach, the network directly parameterizes the unknown density function and is trained using a penalized maximum…

Machine Learning · Statistics 2026-01-01 William Consagra, Zhiling Gu, Zhengwu Zhang

Tight PAC-Bayesian Risk Certificates for Contrastive Learning

Contrastive representation learning is a modern paradigm for learning representations of unlabeled data via augmentations -- precisely, contrastive models learn to embed semantically similar pairs of samples (positive pairs) closer than…

Machine Learning · Statistics 2026-01-01 Anna Van Elst, Debarghya Ghoshdastidar

Symmetric Linear Bandits with Hidden Symmetry

High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be…

Machine Learning · Statistics 2026-01-01 Nam Phuong Tran, The Anh Ta, Debmalya Mandal, Long Tran-Thanh

Stochastic Gradient Descent for Nonparametric Additive Regression

This paper introduces an iterative algorithm for training nonparametric additive models that enjoys favorable memory storage and computational requirements. The algorithm can be viewed as the functional counterpart of stochastic gradient…

Machine Learning · Statistics 2026-01-01 Xin Chen, Jason M. Klusowski