Machine Learning - Scifaro

Machine Learning for Complex Systems with Abnormal Pattern by Exception Maximization Outlier Detection Method

This paper proposes a novel fast online methodology for outlier detection called the exception maximization outlier detection method (EMODM), which employs probabilistic models and statistical algorithms to detect abnormal patterns from the…

Machine Learning · Statistics 2025-06-03 Zhikun Zhang, Yiting Duan, Xiangjun Wang, Mingyuan Zhang

Enhancing Accuracy in Generative Models via Knowledge Transfer

This paper investigates the accuracy of generative models and the impact of knowledge transfer on their generation precision. Specifically, we examine a generative model for a target task, fine-tuned using a pre-trained model from a source…

Machine Learning · Statistics 2025-06-03 Xinyu Tian, Xiaotong Shen

Generalization Bounds for Heavy-Tailed SDEs through the Fractional Fokker-Planck Equation

Understanding the generalization properties of heavy-tailed stochastic optimization algorithms has attracted increasing attention over the past years. While illuminating interesting aspects of stochastic optimizers by using heavy-tailed…

Machine Learning · Statistics 2025-06-03 Benjamin Dupuis, Umut Şimşekli

Resampled Confidence Regions with Exponential Shrinkage for the Regression Function of Binary Classification

The regression function is one of the key objects of binary classification, since it not only determines a Bayes optimal classifier, hence, defines an optimal decision boundary, but also encodes the conditional distribution of the output…

Machine Learning · Statistics 2025-06-03 Ambrus Tamás, Balázs Csanád Csáji

Kernel $\epsilon$-Greedy for Multi-Armed Bandits with Covariates

We consider the $\epsilon$-greedy strategy for the multi-armed bandit with covariates (MABC) problem, where the mean reward functions are assumed to lie in a reproducing kernel Hilbert space (RKHS). We propose to estimate the unknown mean…

Machine Learning · Statistics 2025-06-03 Sakshi Arya, Bharath K. Sriperumbudur

MixFlows: principled variational inference via mixed flows

This work presents mixed variational flows (MixFlows), a new variational family that consists of a mixture of repeated applications of a map to an initial reference distribution. First, we provide efficient algorithms for i.i.d. sampling,…

Machine Learning · Statistics 2025-06-03 Zuheng Xu, Naitong Chen, Trevor Campbell

Statistical mechanics of extensive-width Bayesian neural networks near interpolation

For three decades statistical mechanics has been providing a framework to analyse neural networks. However, the theoretically tractable models, e.g., perceptrons, random features models and kernel machines, or multi-index models and…

Machine Learning · Statistics 2025-06-02 Jean Barbier, Francesco Camilli, Minh-Toan Nguyen, Mauro Pastore, Rudy Skerk

Efficient Estimation of Regularized Tyler's M-Estimator Using Approximate LOOCV

We consider the problem of estimating a regularization parameter, or a shrinkage coefficient $\alpha \in (0,1)$ for Regularized Tyler's M-estimator (RTME). In particular, we propose to estimate an optimal shrinkage coefficient by setting…

Machine Learning · Statistics 2025-06-02 Karim Abou-Moustafa

Knockoff-Guided Compressive Sensing: A Statistical Machine Learning Framework for Support-Assured Signal Recovery

This paper introduces a novel Knockoff-guided compressive sensing framework that enhances signal recovery by leveraging precise false discovery rate (FDR) control during the support identification phase. Unlike…

Machine Learning · Statistics 2025-06-02 Xiaochen Zhang, Haoyi Xiong

K$^2$IE: Kernel Method-based Kernel Intensity Estimators for Inhomogeneous Poisson Processes

Kernel method-based intensity estimators, formulated within reproducing kernel Hilbert spaces (RKHSs), and classical kernel intensity estimators (KIEs) have been among the most easy-to-implement and feasible methods for estimating the…

Machine Learning · Statistics 2025-06-02 Hideaki Kim, Tomoharu Iwata, Akinori Fujino

Impact of Bottleneck Layers and Skip Connections on the Generalization of Linear Denoising Autoencoders

Modern deep neural networks exhibit strong generalization even in highly overparameterized regimes. Significant progress has been made to understand this phenomenon in the context of supervised learning, but for unsupervised tasks such as…

Machine Learning · Statistics 2025-06-02 Jonghyun Ham, Maximilian Fleissner, Debarghya Ghoshdastidar

Predictive posterior sampling from non-stationary Gaussian process priors via Diffusion models with application to climate data

Bayesian models based on Gaussian processes (GPs) offer a flexible framework to predict spatially distributed variables with uncertainty. But the use of nonstationary priors, often necessary for capturing complex spatial patterns, makes…

Machine Learning · Statistics 2025-06-02 Gabriel V Cardoso, Mike Pereira

Multi-task Learning for Heterogeneous Data via Integrating Shared and Task-Specific Encodings

Multi-task learning (MTL) has become an essential machine learning tool for addressing multiple learning tasks simultaneously and has been effectively applied across fields such as healthcare, marketing, and biomedical research. However, to…

Machine Learning · Statistics 2025-06-02 Yang Sui, Qi Xu, Yang Bai, Annie Qu

A Mathematical Perspective On Contrastive Learning

Multimodal contrastive learning is a methodology for linking different data modalities; the canonical example is linking image and text data. The methodology is typically framed as the identification of a set of encoders, one for each…

Machine Learning · Statistics 2025-06-02 Ricardo Baptista, Andrew M. Stuart, Son Tran

Performative Risk Control: Calibrating Models for Reliable Deployment under Performativity

Calibrating blackbox machine learning models to achieve risk control is crucial to ensure reliable decision-making. A rich line of literature has been studying how to calibrate a model so that its predictions satisfy explicit finite-sample…

Machine Learning · Statistics 2025-06-02 Victor Li, Baiting Chen, Yuzhen Mao, Qi Lei, Zhun Deng

Identifying Metric Structures of Deep Latent Variable Models

Deep latent variable models learn condensed representations of data that, hopefully, reflect the inner workings of the studied phenomena. Unfortunately, these latent representations are not statistically identifiable, meaning they cannot be…

Machine Learning · Statistics 2025-06-02 Stas Syrota, Yevgen Zainchkovskyy, Johnny Xi, Benjamin Bloem-Reddy, Søren Hauberg

A Statistical Framework for Ranking LLM-Based Chatbots

Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models. By facilitating millions of pairwise comparisons based on human…

Machine Learning · Statistics 2025-06-02 Siavash Ameli, Siyuan Zhuang, Ion Stoica, Michael W. Mahoney

Robust random graph matching in Gaussian models via vector approximate message passing

In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation…

Machine Learning · Statistics 2025-06-02 Zhangsong Li

Multi-marginal Schrödinger Bridges with Iterative Reference Refinement

Practitioners often aim to infer an unobserved population trajectory using sample snapshots at multiple time points. E.g., given single-cell sequencing data, scientists would like to learn how gene expression changes over a cell's life…

Machine Learning · Statistics 2025-06-02 Yunyi Shen, Renato Berlinghieri, Tamara Broderick

Addressing Misspecification in Simulation-based Inference through Data-driven Calibration

Driven by steady progress in deep generative modeling, simulation-based inference (SBI) has emerged as the workhorse for inferring the parameters of stochastic simulators. However, recent work has demonstrated that model misspecification…

Machine Learning · Statistics 2025-06-02 Antoine Wehenkel, Juan L. Gamella, Ozan Sener, Jens Behrmann, Guillermo Sapiro, Jörn-Henrik Jacobsen, Marco Cuturi