Machine Learning - Scifaro

The Performance Of The Unadjusted Langevin Algorithm Without Smoothness Assumptions

In this article, we study the problem of sampling from distributions whose densities are not necessarily smooth or log-concave. We propose a simple Langevin-based algorithm that does not rely on popular but computationally challenging…

Machine Learning · Statistics 2025-12-02 Tim Johnston, Iosif Lytras, Nikolaos Makras, Sotirios Sabanis

Value-oriented forecast reconciliation for renewables in electricity markets

Forecast reconciliation is considered an effective method to achieve coherence (within a forecast hierarchy) and to improve forecast quality. However, the value of reconciled forecasts in downstream decision-making tasks has been mostly…

Machine Learning · Statistics 2025-12-02 Honglin Wen, Pierre Pinson

Heterogeneous transfer learning for high-dimensional regression with feature mismatch

We consider Heterogeneous Transfer Learning (HTL) from a source to a new target domain for high-dimensional regression with differing feature sets. Most homogeneous TL methods assume that target and source domains share the same feature…

Machine Learning · Statistics 2025-12-02 Jae Ho Chang, Massimiliano Russo, Subhadeep Paul

Stabilizing black-box model selection with the inflated argmax

Model selection is the process of choosing from a class of candidate models given data. For instance, methods such as the LASSO and sparse identification of nonlinear dynamics (SINDy) formulate model selection as finding a sparse solution…

Machine Learning · Statistics 2025-12-02 Melissa Adrian, Jake A. Soloff, Rebecca Willett

CAP: A General Algorithm for Online Selective Conformal Prediction with FCR Control

We study the problem of post-selection predictive inference in an online fashion. To avoid devoting resources to unimportant units, a preliminary selection of the current individual before reporting its prediction interval is common and…

Machine Learning · Statistics 2025-12-02 Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou

Stochastic Hessian Fittings with Lie Groups

This report investigates the fitting of the Hessian or its inverse for stochastic optimizations using a Hessian fitting criterion derived from the preconditioned stochastic gradient descent (PSGD) method. This criterion is closely related…

Machine Learning · Statistics 2025-12-02 Xi-Lin Li

Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests

Quantile Regression Forests (QRF) are widely used for non-parametric conditional quantile estimation, yet statistical inference for variable importance measures remains challenging due to the non-smoothness of the loss function and the…

Machine Learning · Statistics 2025-12-01 Tomoshige Nakamura, Hiroshi Shiraishi

A PLS-Integrated LASSO Method with Application in Index Tracking

In traditional multivariate data analysis, dimension reduction and regression have been treated as distinct endeavors. Established techniques such as principal component regression (PCR) and partial least squares (PLS) regression…

Machine Learning · Statistics 2025-12-01 Shiqin Tang, Yining Dong, S. Joe Qin

Data-driven informative priors for Bayesian inference with quasi-periodic data

Bayesian computational strategies for inference can be inefficient in approximating the posterior distribution in models that exhibit some form of periodicity. This is because the probability mass of the marginal posterior distribution of…

Machine Learning · Statistics 2025-12-01 Javier Lopez-Santiago, Luca Martino, Joaquin Miguez, Gonzalo Vazquez-Vilar

UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity

Selecting the best alternative from a finite set represents a broad class of pure exploration problems. Traditional approaches to pure exploration have predominantly relied on Gaussian or sub-Gaussian assumptions on the performance…

Machine Learning · Statistics 2025-12-01 Zaile Li, Weiwei Fan, L. Jeff Hong

Support Vector Machine Classifier with Rescaled Huberized Pinball Loss

Support vector machines are widely used in machine learning classification tasks, but traditional SVM models suffer from sensitivity to outliers and instability in resampling, which limits their performance in practical applications. To…

Machine Learning · Statistics 2025-12-01 Shibo Diao

On the Effect of Regularization on Nonparametric Mean-Variance Regression

Uncertainty quantification is vital for decision-making and risk assessment in machine learning. Mean-variance regression models, which predict both a mean and residual noise for each data point, provide a simple approach to uncertainty…

Machine Learning · Statistics 2025-12-01 Eliot Wong-Toi, Alex Boyd, Vincent Fortuin, Stephan Mandt

Algorithms and Scientific Software for Quasi-Monte Carlo, Fast Gaussian Process Regression, and Scientific Machine Learning

Most scientific domains elicit the development of efficient algorithms and accessible scientific software. This thesis unifies our developments in three broad domains: Quasi-Monte Carlo (QMC) methods for efficient high-dimensional…

Machine Learning · Statistics 2025-12-01 Aleksei G. Sorokin

Spatio-Temporal Hierarchical Causal Models

The abundance of fine-grained spatio-temporal data, such as traffic sensor networks, offers vast opportunities for scientific discovery. However, inferring causal relationships from such observational data remains challenging, particularly…

Machine Learning · Statistics 2025-12-01 Xintong Li, Haoran Zhang, Xiao Zhou

Property Elicitation on Imprecise Probabilities

Property elicitation studies which attributes of a probability distribution can be determined by minimizing a risk. We investigate a generalization of property elicitation to imprecise probabilities (IP). This investigation is motivated by…

Machine Learning · Statistics 2025-12-01 James Bailie, Rabanus Derr

Quantifying Statistical Significance of Deep Nearest Neighbor Anomaly Detection via Selective Inference

In real-world applications, anomaly detection (AD) often operates without access to anomalous data, necessitating semi-supervised methods that rely solely on normal data. Among these methods, deep k-nearest neighbor (deep kNN) AD stands out…

Machine Learning · Statistics 2025-12-01 Mizuki Niihori, Shuichi Nishino, Teruyuki Katsuoka, Tomohiro Shiraishi, Kouichi Taji, Ichiro Takeuchi

Split Conformal Prediction under Data Contamination

Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is popular as it comes with theoretical guarantees on…

Machine Learning · Statistics 2025-12-01 Jase Clarkson, Wenkai Xu, Mihai Cucuringu, Yvik Swan, Gesine Reinert

Gaussian Universality in Neural Network Dynamics with Generalized Structured Input Distributions

Analyzing neural network dynamics via stochastic gradient descent (SGD) is crucial to building theoretical foundations for deep learning. Previous work has analyzed structured inputs within the hidden manifold model, often under…

Machine Learning · Statistics 2025-12-01 Jaeyong Bae, Hawoong Jeong

On Evolution-Based Models for Experimentation Under Interference

Causal effect estimation in networked systems is central to data-driven decision making. In such settings, interventions on one unit can spill over to others, and in complex physical or social systems, the interaction pathways driving these…

Machine Learning · Statistics 2025-11-27 Sadegh Shirani, Mohsen Bayati

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities (II)

A fundamental theoretical question in network analysis is to determine under which conditions community recovery is possible in polynomial time in the Stochastic Block Model (SBM). When the number $K$ of communities remains smaller than…

Machine Learning · Statistics 2025-11-27 Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen