Machine Learning - Scifaro

On Consistency of Signature Using Lasso

Signatures are iterated path integrals of continuous and discrete-time processes, and their universal nonlinearity linearizes the problem of feature selection in time series data analysis. This paper studies the consistency of signature…

Machine Learning · Statistics 2026-03-24 Xin Guo, Binnan Wang, Ruixun Zhang, Chaoyi Zhao

Noise-contrastive Online Change Point Detection

We suggest a novel procedure for online change point detection. Our approach expands an idea of maximizing a discrepancy measure between points from pre-change and post-change distributions. This leads to flexible algorithms suitable for…

Machine Learning · Statistics 2026-03-24 Nikita Puchkin, Artur Goldman, Konstantin Yakovlev, Valeriia Dzis, Uliana Vinogradova

LOCO Feature Importance Inference without Data Splitting via Minipatch Ensembles

Feature importance inference is critical for the interpretability and reliability of machine learning models. There has been increasing interest in developing model-agnostic approaches to interpret any predictive model, often in the form of…

Machine Learning · Statistics 2026-03-24 Luqin Gan, Lili Zheng, Genevera I. Allen

Estimate of the Neural Network Dimension using Algebraic Topology and Lie Theory

In this paper we present an approach to determine the smallest possible number of neurons in a layer of a neural network in such a way that the topology of the input space can be learned sufficiently well. We introduce a general procedure…

Machine Learning · Statistics 2026-03-24 Luciano Melodia, Richard Lenz

Deep Autocorrelation Modeling for Time-Series Forecasting: Progress and Prospects

Autocorrelation is a defining characteristic of time-series data, where each observation is statistically dependent on its predecessors. In the context of deep time-series forecasting, autocorrelation arises in both the input history and…

Machine Learning · Statistics 2026-03-23 Hao Wang, Licheng Pan, Qingsong Wen, Jialin Yu, Zhichao Chen, Chunyuan Zheng, Xiaoxi Li, Zhixuan Chu, Chao Xu, Mingming Gong, Haoxuan Li, Yuan Lu, Zhouchen Lin, Philip Torr, Yan Liu

Explainable cluster analysis: a bagging approach

A major limitation of clustering approaches is their lack of explainability: methods rarely provide insight into which features drive the grouping of similar observations. To address this limitation, we propose an ensemble-based clustering…

Machine Learning · Statistics 2026-03-23 Federico Maria Quetti, Elena Ballante, Silvia Figini, Paolo Giudici

A two-step sequential approach for hyperparameter selection in finite context models

Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the context length k and smoothing parameter α. In practice, these hyperparameters are…

Machine Learning · Statistics 2026-03-23 José Contente, Ana Martins, Armando J. Pinho, Sónia Gouveia

Model Selection and Parameter Estimation of Multi-dimensional Gaussian Mixture Model

In this paper, we study the problem of learning multi-dimensional Gaussian Mixture Models (GMMs), with a specific focus on model order selection and efficient mixing distribution estimation. We first establish an information-theoretic lower…

Machine Learning · Statistics 2026-03-23 Xinyu Liu, Hai Zhang

On the role of memorization in learned priors for geophysical inverse problems

Learned priors based on deep generative models offer data-driven regularization for seismic inversion, but training them requires a dataset of representative subsurface models -- a resource that is inherently scarce in geoscience…

Machine Learning · Statistics 2026-03-23 Ali Siahkoohi, Davide Sabeddu

Near-Equivalent Q-learning Policies for Dynamic Treatment Regimes

Precision medicine aims to tailor therapeutic decisions to individual patient characteristics. This objective is commonly formalized through dynamic treatment regimes, which use statistical and machine learning methods to derive sequential…

Machine Learning · Statistics 2026-03-23 Sophia Yazzourh, Erica E. M. Moodie

Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs

Several graph data mining, signal processing, and machine learning downstream tasks rely on information related to the eigenvectors of the associated adjacency or Laplacian matrix. Classical eigendecomposition methods are powerful when the…

Machine Learning · Statistics 2026-03-23 Mohammad Eini, Abdullah Karaaslanli, Vassilis Kalantzis, Panagiotis A. Traganitis

ResNets of All Shapes and Sizes: Convergence of Training Dynamics in the Large-scale Limit

We establish convergence of the training dynamics of residual neural networks (ResNets) to their joint infinite depth L, hidden width M, and embedding dimension D limit. Specifically, we consider ResNets with two-layer perceptron blocks in…

Machine Learning · Statistics 2026-03-23 Louis-Pierre Chaintron, Lénaïc Chizat, Javier Maass

Learnability with Partial Labels and Adaptive Nearest Neighbors

Prior work on partial labels learning (PLL) has shown that learning is possible even when each instance is associated with a bag of labels, rather than a single accurate but costly label. However, the necessary conditions for learning with…

Machine Learning · Statistics 2026-03-23 Nicolas A. Errandonea, Santiago Mazuelas, Jose A. Lozano, Sanjoy Dasgupta

Unsupervised Feature Selection via Robust Autoencoder and Adaptive Graph Learning

Effective feature selection is essential for high-dimensional data analysis and machine learning. Unsupervised feature selection (UFS) aims to simultaneously cluster data and identify the most discriminative features. Most existing UFS…

Machine Learning · Statistics 2026-03-23 Feng Yu, MD Saifur Rahman Mazumder, Ying Su, Oscar Contreras Velasco

Learning Representations for Independence Testing

Many tools exist to detect dependence between random variables, a core question across a wide range of machine learning, statistical, and scientific endeavors. Although several statistical tests guarantee eventual detection of any…

Machine Learning · Statistics 2026-03-23 Nathaniel Xu, Feng Liu, Danica J. Sutherland

A new paradigm for global sensitivity analysis

It is well-known that Sobol indices, which count among the most popular sensitivity indices, are based on the Sobol decomposition. Here we challenge this construction by redefining Sobol indices without the Sobol decomposition. In fact, we…

Machine Learning · Statistics 2026-03-23 Gildas Mazo

The Exponentially Weighted Signature

The signature is a canonical representation of a multidimensional path over an interval. However, it treats all historical information uniformly, offering no intrinsic mechanism for contextualising the relevance of the past. To address…

Machine Learning · Statistics 2026-03-20 Alexandre Bloch, Samuel N. Cohen, Terry Lyons, Joël Mouterde, Benjamin Walker

Fast and Interpretable Autoregressive Estimation with Neural Network Backpropagation

Autoregressive (AR) models remain widely used in time series analysis due to their interpretability, but conventional parameter estimation methods can be computationally expensive and prone to convergence issues. This paper proposes a…

Machine Learning · Statistics 2026-03-20 Anaísa Lucena, Ana Martins, Armando J. Pinho, Sónia Gouveia

Revisiting OmniAnomaly for Anomaly Detection: performance metrics and comparison with PCA-based models

Deep learning models have become the dominant approach for multivariate time series anomaly detection (MTSAD), often reporting substantial performance improvements over classical statistical methods. However, these gains are frequently…

Machine Learning · Statistics 2026-03-20 Bruna Alves, Ana Martins, Armando J. Pinho, Sónia Gouveia

Kernel Single-Index Bandits: Estimation, Inference, and Learning

We study contextual bandits with finitely many actions in which the reward of each arm follows a single-index model with an arm-specific index parameter and an unknown nonparametric link function. We consider a regime in which arms…

Machine Learning · Statistics 2026-03-20 Sakshi Arya, Satarupa Bhattacharjee, Bharath K. Sriperumbudur