Machine Learning - Scifaro

Offline Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an offline setting with a fixed dataset, collected using a so-called logging policy. We develop…

Machine Learning · Statistics 2026-03-17 Imon Banerjee, Harsha Honnappa, Vinayak Rao

VecMol: Vector-Field Representations for 3D Molecule Generation

Generative modeling of three-dimensional (3D) molecules is a fundamental yet challenging problem in drug discovery and materials science. Existing approaches typically represent molecules as 3D graphs and co-generate discrete atom types…

Machine Learning · Statistics 2026-03-16 Yuchen Hua, Xingang Peng, Jianzhu Ma, Muhan Zhang

Batched Kernelized Bandits: Refinements and Extensions

In this paper, we consider the problem of black-box optimization with noisy feedback revealed in batches, where the unknown function to optimize has a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We refer to this as the…

Machine Learning · Statistics 2026-03-16 Chenkai Ma, Keqin Chen, Jonathan Scarlett

Variational Garrote for Sparse Inverse Problems

Sparse regularization plays a central role in solving inverse problems arising from incomplete or corrupted measurements. Different regularizers correspond to different prior assumptions about the structure of the unknown signal, and…

Machine Learning · Statistics 2026-03-16 Kanghun Lee, Hyungjoon Soh, Junghyo Jo

EB-RANSAC: Random Sample Consensus based on Energy-Based Model

Random sample consensus (RANSAC), which is based on a repetitive sampling from a given dataset, is one of the most popular robust estimation methods. In this study, an energy-based model (EBM) for robust estimation that has a similar scheme…

Machine Learning · Statistics 2026-03-16 Muneki Yasuda, Nao Watanabe, Kaiji Sekimoto

Probabilistic Joint and Individual Variation Explained (ProJIVE) for Data Integration

Collecting multiple types of data on the same set of subjects is common in modern scientific applications including genomics, metabolomics, and neuroimaging. Joint and Individual Variance Explained (JIVE) seeks a low-rank approximation of…

Machine Learning · Statistics 2026-03-16 Raphiel J. Murden, Ganzhong Tian, Deqiang Qiu, Benjamin B. Risk

An Enhanced Projection Pursuit Tree Classifier with Visual Methods for Assessing Algorithmic Improvements

This paper presents enhancements to the projection pursuit tree classifier and visual diagnostic methods for assessing their impact in high dimensions. The original algorithm uses linear combinations of variables in a tree structure where…

Machine Learning · Statistics 2026-03-16 Natalia da Silva, Dianne Cook, Eun-Kyung Lee

Precise Dynamics of Diagonal Linear Networks: A Unifying Analysis by Dynamical Mean-Field Theory

Diagonal linear networks (DLNs) are a tractable model that captures several nontrivial behaviors in neural network training, such as initialization-dependent solutions and incremental learning. These phenomena are typically studied in…

Machine Learning · Statistics 2026-03-16 Sota Nishiyama, Masaaki Imaizumi

Minimax learning rates for estimating binary classifiers under margin conditions

We study classification problems using binary estimators where the decision boundary is described by horizon functions and where the data distribution satisfies a geometric margin condition. A key novelty of our work is the derivation of…

Machine Learning · Statistics 2026-03-16 Jonathan García, Philipp Petersen

Nested Deep Learning Model Towards A Foundation Model for Brain Signal Data

Epilepsy affects around 50 million people globally. Electroencephalography (EEG) or Magnetoencephalography (MEG) based spike detection plays a crucial role in diagnosis and treatment. Manual spike identification is time-consuming and…

Machine Learning · Statistics 2026-03-16 Fangyi Wei, Jiajie Mo, Kai Zhang, Haipeng Shen, Srikantan Nagarajan, Fei Jiang

Tight Non-asymptotic Inference via Sub-Gaussian Intrinsic Moment Norm

In non-asymptotic learning, variance-type parameters of sub-Gaussian distributions are of paramount importance. However, directly estimating these parameters using the empirical moment generating function (MGF) is infeasible. To address…

Machine Learning · Statistics 2026-03-16 Huiming Zhang, Haoyu Wei, Guang Cheng

Wasserstein Gradient Flows for Batch Bayesian Optimal Experimental Design

Bayesian optimal experimental design (BOED) provides a powerful, decision-theoretic framework for selecting experiments so as to maximise the expected utility of the data to be collected. In practice, however, its applicability can be…

Machine Learning · Statistics 2026-03-13 Louis Sharrock

Uncovering Locally Low-dimensional Structure in Networks by Locally Optimal Spectral Embedding

Standard Adjacency Spectral Embedding (ASE) relies on a global low-rank assumption often incompatible with the sparse, transitive structure of real-world networks, causing local geometric features to be 'smeared'. To address this, we…

Machine Learning · Statistics 2026-03-13 Hannah Sansford, Nick Whiteley, Patrick Rubin-Delanchy

Hypercomplex Widely Linear Processing: Fundamentals for Quaternion Machine Learning

Numerous attempts have been made to replicate the success of complex-valued algebra in engineering and science to other hypercomplex domains such as quaternions, tessarines, biquaternions, and octonions. Perhaps, none have matched the…

Machine Learning · Statistics 2026-03-13 Sayed Pouria Talebi, Clive Cheong Took

Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret

Many machine learning tasks admit multiple models that perform almost equally well, a phenomenon known as predictive multiplicity. A fundamental source of this multiplicity is observational multiplicity, which arises from the stochastic…

Machine Learning · Statistics 2026-03-13 Mustafa Cavus

Spatially Robust Inference with Predicted and Missing at Random Labels

When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a process which has consequences for downstream statistical inference. While…

Machine Learning · Statistics 2026-03-13 Stephen Salerno, Zhenke Wu, Tyler McCormick

Worst-case low-rank approximations

Real-world data in health, economics, and environmental sciences are often collected across heterogeneous domains (such as hospitals, regions, or time periods). In such settings, distributional shifts can make standard PCA unreliable, in…

Machine Learning · Statistics 2026-03-13 Anya Fries, Markus Reichstein, David Blei, Jonas Peters

A Unified Latent Space Disentanglement VAE Framework with Robust Disentanglement Effectiveness Evaluation

Evaluating and interpreting latent representations, such as variational autoencoders (VAEs), remains a significant challenge for diverse data types, especially when ground-truth generative factors are unknown. To address this, we propose a…

Machine Learning · Statistics 2026-03-13 Xiaoan Lang, Fang Liu

Trustworthy predictive distributions for rare events via diagnostic transport maps

Forecast systems in science and technology are increasingly moving beyond point prediction toward methods that produce full predictive distributions of future outcomes y, conditional on high-dimensional and complex sequences of inputs x.…

Machine Learning · Statistics 2026-03-13 Elizabeth Cucuzzella, Rafael Izbicki, Ann B. Lee

Deep regression learning from dependent observations with minimum error entropy principle

This paper considers nonparametric regression from strongly mixing observations. The proposed approach is based on deep neural networks with the minimum error entropy (MEE) principle. We study two estimators: the non-penalized deep neural…

Machine Learning · Statistics 2026-03-13 William Kengne, Modou Wade