Machine Learning - Scifaro

Outlyingness Scores with Cluster Catch Digraphs

This paper introduces two novel outlyingness scores (OSs) based on Cluster Catch Digraphs (CCDs): the Outbound Outlyingness Score (OOS) and the Inbound Outlyingness Score (IOS). These scores enhance the interpretability of outlier detection…

Machine Learning · Statistics 2025-11-12 Rui Shi, Elvan Ceyhan, Nedret Billor

Automatic debiasing of neural networks via moment-constrained learning

Causal and nonparametric estimands in economics and biostatistics can often be viewed as the mean of a linear functional applied to an unknown outcome regression function. Naively learning the regression function and taking a sample mean of…

Machine Learning · Statistics 2025-11-12 Christian L. Hines, Oliver J. Hines

Large deviations for interacting particle dynamics for finding mixed equilibria in zero-sum games

Finding equilibrium points in continuous minmax games has become a key problem within machine learning, in part due to its connection to the training of generative adversarial networks and reinforcement learning. Because of existence and…

Machine Learning · Statistics 2025-11-12 Viktor Nilsson, Pierre Nyquist

Language Generation with Infinite Contamination

We study language generation in the limit, where an algorithm observes an adversarial enumeration of strings from an unknown target language $K$ and must eventually generate new, unseen strings from $K$. Kleinberg and Mullainathan [KM24]…

Machine Learning · Statistics 2025-11-11 Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou

Simulation-based Methods for Optimal Sampling Design in Systems Biology

In many areas of systems biology, including virology, pharmacokinetics, and population biology, dynamical systems are commonly used to describe biological processes. These systems can be characterized by estimating their parameters from…

Machine Learning · Statistics 2025-11-11 Tuan Minh Ha, Binh Thanh Nguyen, Lam Si Tung Ho

Adaptive Testing for Segmenting Watermarked Texts From Language Models

The rapid adoption of large language models (LLMs), such as GPT-4 and Claude 3.5, underscores the need to distinguish LLM-generated text from human-written content to mitigate the spread of misinformation and misuse in education. One…

Machine Learning · Statistics 2025-11-11 Xingchi Li, Xiaochi Liu, Guanxun Li

Bridging Theory and Practice: A Stochastic Learning-Optimization Model for Resilient Automotive Supply Chains

Supply chain disruptions and volatile demand pose significant challenges to the UK automotive industry, which relies heavily on Just-In-Time (JIT) manufacturing. While qualitative studies highlight the potential of integrating Artificial…

Machine Learning · Statistics 2025-11-11 Muhammad Shahnawaz, Adeel Safder

Non-Negative Stiefel Approximating Flow: Orthogonalish Matrix Optimization for Interpretable Embeddings

Interpretable representation learning is a central challenge in modern machine learning, particularly in high-dimensional settings such as neuroimaging, genomics, and text analysis. Current methods often struggle to balance the competing…

Machine Learning · Statistics 2025-11-11 Brian B. Avants, Nicholas J. Tustison, James R Stone

Fast Riemannian-manifold Hamiltonian Monte Carlo for hierarchical Gaussian-process models

Hierarchical Bayesian models based on Gaussian processes are considered useful for describing complex nonlinear statistical dependencies among variables in real-world data. However, effective Monte Carlo algorithms for inference with these…

Machine Learning · Statistics 2025-11-11 Takashi Hayakawa, Satoshi Asai

Functional Adjoint Sampler: Scalable Sampling on Infinite Dimensional Spaces

Learning-based methods for sampling from the Gibbs distribution in finite-dimensional spaces have progressed quickly, yet theory and algorithmic design for infinite-dimensional function spaces remain limited. This gap persists despite their…

Machine Learning · Statistics 2025-11-11 Byoungwoo Park, Juho Lee, Guan-Horng Liu

Sparsity via Hyperpriors: A Theoretical and Algorithmic Study under Empirical Bayes Framework

This paper presents a comprehensive analysis of hyperparameter estimation within the empirical Bayes framework (EBF) for sparse learning. By studying the influence of hyperpriors on the solution of EBF, we establish a theoretical connection…

Machine Learning · Statistics 2025-11-11 Zhitao Li, Yiqiu Dong, Xueying Zeng

Preference-Based Dynamic Ranking Structure Recognition

Preference-based data often appear complex and noisy but may conceal underlying homogeneous structures. This paper introduces a novel framework of ranking structure recognition for preference-based data. We first develop an approach to…

Machine Learning · Statistics 2025-11-11 Nan Lu, Jian Shi, Xin-Yu Tian

Stacking Variational Bayesian Monte Carlo

Approximate Bayesian inference for models with computationally expensive, black-box likelihoods poses a significant challenge, especially when the posterior distribution is complex. Many inference methods struggle to explore the parameter…

Machine Learning · Statistics 2025-11-11 Francesco Silvestrin, Chengkun Li, Luigi Acerbi

Topology-Aware Conformal Prediction for Stream Networks

Stream networks, a unique class of spatiotemporal graphs, exhibit complex directional flow constraints and evolving dependencies, making uncertainty quantification a critical yet challenging task. Traditional conformal prediction methods…

Machine Learning · Statistics 2025-11-11 Jifan Zhang, Fangxin Wang, Zihe Song, Philip S. Yu, Kaize Ding, Shixiang Zhu

Smooth Sailing: Lipschitz-Driven Uncertainty Quantification for Spatial Association

Estimating associations between spatial covariates and responses - rather than merely predicting responses - is central to environmental science, epidemiology, and economics. For instance, public health officials might be interested in…

Machine Learning · Statistics 2025-11-11 David R. Burt, Renato Berlinghieri, Stephen Bates, Tamara Broderick

Uncertainty Quantification with the Empirical Neural Tangent Kernel

While neural networks have demonstrated impressive performance across various tasks, accurately quantifying uncertainty in their predictions is essential to ensure their trustworthiness and enable widespread adoption in critical systems.…

Machine Learning · Statistics 2025-11-11 Joseph Wilson, Chris van der Heide, Liam Hodgkinson, Fred Roosta

Sparsifying Suprema of Gaussian Processes

We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\boldsymbol{X}_t := t \cdot \boldsymbol{g} \}_{t\in…

Machine Learning · Statistics 2025-11-11 Anindya De, Shivam Nadimpalli, Ryan O'Donnell, Rocco A. Servedio

Contextual Linear Optimization with Partial Feedback

Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients in the objective and thereby improve decision-making performance. A canonical example is the stochastic shortest path…

Machine Learning · Statistics 2025-11-11 Yichun Hu, Nathan Kallus, Xiaojie Mao, Yanchen Wu

Causal Dynamic Variational Autoencoder for Counterfactual Regression in Longitudinal Data

Accurately estimating treatment effects over time is crucial in fields such as precision medicine, epidemiology, economics, and marketing. Many current methods for estimating treatment effects over time assume that all confounders are…

Machine Learning · Statistics 2025-11-11 Mouad El Bouchattaoui, Myriam Tami, Benoit Lepetit, Paul-Henry Cournède

Prototype Selection Using Topological Data Analysis

Recently, there has been an explosion in statistical learning literature to represent data using topological principles to capture structure and relationships. We propose a topological data analysis (TDA)-based framework, named Topological…

Machine Learning · Statistics 2025-11-10 Jordan Eckert, Elvan Ceyhan, Henry Schenck