Machine Learning - Scifaro

Discovering group dynamics in coordinated time series via hierarchical recurrent switching-state models

We seek a computationally efficient model for a collection of time series arising from multiple interacting entities (a.k.a. "agents"). Recent models of temporal patterns across individuals fail to incorporate explicit system-level…

Machine Learning · Statistics 2025-08-06 Michael T. Wojnowicz, Kaitlin Gili, Preetish Rath, Eric Miller, Jeffrey Miller, Clifford Hancock, Meghan O'Donovan, Seth Elkin-Frankston, Tad T. Brunyé, Michael C. Hughes

Semi-analytic approximate stability selection for correlated data in generalized linear models

We consider the variable selection problem of generalized linear models (GLMs). Stability selection (SS) is a promising method proposed for solving this problem. Although SS provides practical variable selection criteria, it is…

Machine Learning · Statistics 2025-08-06 Takashi Takahashi, Yoshiyuki Kabashima

Comparing Generative Models with the New Physics Learning Machine

The rise of generative models for scientific research calls for the development of new methods to evaluate their fidelity. A natural framework for addressing this problem is two-sample hypothesis testing, namely the task of determining…

Machine Learning · Statistics 2025-08-05 Samuele Grossi, Marco Letizia, Riccardo Torre

Structure Maintained Representation Learning Neural Network for Causal Inference

Recent developments in causal inference have greatly shifted the interest from estimating the average treatment effect to the individual treatment effect. In this article, we improve the predictive accuracy of representation learning and…

Machine Learning · Statistics 2025-08-05 Yang Sun, Wenbin Lu, Yi-Hui Zhou

Fast Gaussian process inference by exact Matérn kernel decomposition

To speed up Gaussian process inference, a number of fast kernel matrix-vector multiplication (MVM) approximation algorithms have been proposed over the years. In this paper, we establish an exact fast kernel MVM algorithm based on exact…

Machine Learning · Statistics 2025-08-05 Nicolas Langrené, Xavier Warin, Pierre Gruet

Uncertainty Quantification for Large-Scale Deep Networks via Post-StoNet Modeling

Deep learning has revolutionized modern data science. However, how to accurately quantify the uncertainty of predictions from large-scale deep neural networks (DNNs) remains an unresolved issue. To address this issue, we introduce a novel…

Machine Learning · Statistics 2025-08-05 Yan Sun, Faming Liang

Inequalities for Optimization of Classification Algorithms: A Perspective Motivated by Diagnostic Testing

Motivated by canonical problems in medical diagnostics, we propose and study properties of an objective function that uniformly bounds uncertainties in quantities of interest extracted from classifiers and related data analysis tools. We…

Machine Learning · Statistics 2025-08-05 Paul N. Patrone, Anthony J. Kearsley

Resolving Memorization in Empirical Diffusion Model for Manifold Data in High-Dimensional Spaces

Diffusion models are popular tools for generating new data samples, using a forward process that adds noise to data and a reverse process to denoise and produce samples. However, when the data distribution consists of n points, empirical…

Machine Learning · Statistics 2025-08-05 Yang Lyu, Tan Minh Nguyen, Yuchun Qian, Xin T. Tong

Learning to Fuse Temporal Proximity Networks: A Case Study in Chimpanzee Social Interactions

How can we identify groups of primate individuals which could be conjectured to drive social structure? To address this question, one of us has collected a time series of data for social interactions between chimpanzees. Here we use a…

Machine Learning · Statistics 2025-08-05 Yixuan He, Aaron Sandel, David Wipf, Mihai Cucuringu, John Mitani, Gesine Reinert

Ensuring superior learning outcomes and data security for authorized learner

The learner's ability to generate a hypothesis that closely approximates the target function is crucial in machine learning. Achieving this requires sufficient data; however, unauthorized access by an eavesdropping learner can lead to…

Machine Learning · Statistics 2025-08-05 Jeongho Bang, Wooyeong Song, Kyujin Shin, Yong-Su Kim

From Point to probabilistic gradient boosting for claim frequency and severity prediction

Gradient boosting algorithms for decision trees are increasingly used in actuarial applications, as they show superior predictive performance over traditional generalised linear models. Many enhancements to the first gradient boosting machine…

Machine Learning · Statistics 2025-08-05 Dominik Chevalier, Marie-Pier Côté

Learning large softmax mixtures with warm start EM

Softmax mixture models (SMMs) are discrete $K$-mixtures introduced to model the probability of choosing an attribute $x_j \in \mathbb{R}^L$ from $p$ candidates, in heterogeneous populations. They have been known as mixed multinomial logits in the…

Machine Learning · Statistics 2025-08-05 Xin Bing, Florentina Bunea, Jonathan Niles-Weed, Marten Wegkamp

A Confidence Interval for the $\ell_2$ Expected Calibration Error

Recent advances in machine learning have significantly improved prediction accuracy in various applications. However, ensuring the calibration of probabilistic predictions remains a significant challenge. Despite efforts to enhance model…

Machine Learning · Statistics 2025-08-05 Yan Sun, Pratik Chaudhari, Ian J. Barnett, Edgar Dobriban

Enhancing OOD Detection Using Latent Diffusion

Out-of-distribution (OOD) detection is crucial for the reliable deployment of machine learning models in real-world scenarios, enabling the identification of unknown samples or objects. A prominent approach to enhance OOD detection…

Machine Learning · Statistics 2025-08-05 Heng Gao, Jun Li

Comparison of Affine and Rational Quadratic Spline Coupling and Autoregressive Flows through Robust Statistical Tests

Normalizing flows have emerged as a powerful brand of generative models, as they not only allow for efficient sampling of complicated target distributions but also deliver density estimation by construction. We propose here an in-depth…

Machine Learning · Statistics 2025-08-05 Andrea Coccaro, Marco Letizia, Humberto Reyes-Gonzalez, Riccardo Torre

Sinusoidal Approximation Theorem for Kolmogorov-Arnold Networks

The Kolmogorov-Arnold representation theorem states that any continuous multivariable function can be exactly represented as a finite superposition of continuous single variable functions. Subsequent simplifications of this representation…

Machine Learning · Statistics 2025-08-04 Sergei Gleyzer, Hanh Nguyen, Dinesh P. Ramakrishnan, Eric A. F. Reinhardt

Batched Nonparametric Bandits via k-Nearest Neighbor UCB

We study sequential decision-making in batched nonparametric contextual bandits, where actions are selected over a finite horizon divided into a small number of batches. Motivated by constraints in domains such as medicine and marketing --…

Machine Learning · Statistics 2025-08-04 Sakshi Arya

An EM Gradient Algorithm for Mixture Models with Components Derived from the Manly Transformation

Zhu and Melnykov (2018) develop a model to fit mixture models when the components are derived from the Manly transformation. Their EM algorithm utilizes Nelder-Mead optimization in the M-step to update the skew parameter,…

Machine Learning · Statistics 2025-08-04 Katharine M. Clark, Paul D. McNicholas

Pure interaction effects unseen by Random Forests

Random Forests are widely claimed to capture interactions well. However, some simple examples suggest that they perform poorly in the presence of certain pure interactions that the conventional CART criterion struggles to capture during…

Machine Learning · Statistics 2025-08-04 Ricardo Blum, Munir Hiabu, Enno Mammen, Joseph Theo Meyer

Bagged Regularized $k$-Distances for Anomaly Detection

We consider the paradigm of unsupervised anomaly detection, which involves the identification of anomalies within a dataset in the absence of labeled examples. Though distance-based methods are top-performing for unsupervised anomaly…

Machine Learning · Statistics 2025-08-04 Yuchao Cai, Hanfang Yang, Yuheng Ma, Hanyuan Hang