Machine Learning - Scifaro

Membership Inference Attacks with False Discovery Rate Control

Recent studies have shown that deep learning models are vulnerable to membership inference attacks (MIAs), which aim to infer whether a data record was used to train a target model or not. To analyze and study these vulnerabilities, various…

Machine Learning · Statistics 2025-08-12 Chenxu Zhao, Wei Qian, Aobo Chen, Mengdi Huai

Statistical Inference for Autoencoder-based Anomaly Detection after Representation Learning-based Domain Adaptation

Anomaly detection (AD) plays a vital role across a wide range of domains, but its performance might deteriorate when applied to target domains with limited data. Domain Adaptation (DA) offers a solution by transferring knowledge from a…

Machine Learning · Statistics 2025-08-12 Tran Tuan Kiet, Nguyen Thang Loi, Vo Nguyen Le Duy

Federated Online Learning for Heterogeneous Multisource Streaming Data

Federated learning has emerged as an essential paradigm for distributed multi-source data analysis under privacy concerns. Most existing federated learning methods focus on the "static" datasets. However, in many real-world applications,…

Machine Learning · Statistics 2025-08-12 Jingmao Li, Yuanxing Chen, Shuangge Ma, Kuangnan Fang

Hedging with memory: shallow and deep learning with signatures

We investigate the use of path signatures in a machine learning context for hedging exotic derivatives under non-Markovian stochastic volatility models. In a deep learning setting, we use signatures as features in feedforward neural…

Machine Learning · Statistics 2025-08-12 Eduardo Abi Jaber, Louis-Amand Gérard

Optimal and Practical Batched Linear Bandit Algorithm

We study the linear bandit problem under limited adaptivity, known as the batched linear bandit. While existing approaches can achieve near-optimal regret in theory, they are often computationally prohibitive or underperform in practice. We…

Machine Learning · Statistics 2025-08-12 Sanghoon Yu, Min-hwan Oh

Physics-Informed Teleconnection-Aware Transformer for Global Subseasonal-to-Seasonal Forecasting

Subseasonal-to-seasonal (S2S) forecasting, which predicts climate conditions from several weeks to months in advance, represents a critical frontier for agricultural planning, energy management, and disaster preparedness. However, it…

Machine Learning · Statistics 2025-08-12 Tengfei Lyu, Weijia Zhang, Hao Liu

A Theory of Learning with Autoregressive Chain of Thought

For a given base class of sequence-to-next-token generators, we consider learning prompt-to-answer mappings obtained by iterating a fixed, time-invariant generator for multiple steps, thus generating a chain-of-thought, and then taking the…

Machine Learning · Statistics 2025-08-12 Nirmit Joshi, Gal Vardi, Adam Block, Surbhi Goel, Zhiyuan Li, Theodor Misiakiewicz, Nathan Srebro

Pairwise Markov Chains for Volatility Forecasting

The Pairwise Markov Chain (PMC) is a probabilistic graphical model extending the well-known Hidden Markov Model. This model, although highly effective for many tasks, has been scarcely utilized for continuous value prediction. This is…

Machine Learning · Statistics 2025-08-12 Elie Azeraf

Tensor Decomposition with Unaligned Observations

This paper presents a canonical polyadic (CP) tensor decomposition that addresses unaligned observations. The mode with unaligned observations is represented using functions in a reproducing kernel Hilbert space (RKHS). We introduce a…

Machine Learning · Statistics 2025-08-12 Runshi Tang, Tamara Kolda, Anru R. Zhang

A variational Bayes approach to debiased inference for low-dimensional parameters in high-dimensional linear regression

We propose a scalable variational Bayes method for statistical inference for a single or low-dimensional subset of the coordinates of a high-dimensional parameter in sparse linear regression. Our approach relies on assigning a mean-field…

Machine Learning · Statistics 2025-08-12 Ismaël Castillo, Alice L'Huillier, Kolyan Ray, Luke Travis

Convergence Analysis for General Probability Flow ODEs of Diffusion Models in Wasserstein Distances

Score-based generative modeling with probability flow ordinary differential equations (ODEs) has achieved remarkable success in a variety of applications. While various fast ODE-based samplers have been proposed in the literature and…

Machine Learning · Statistics 2025-08-12 Xuefeng Gao, Lingjiong Zhu

Skew-Probabilistic Neural Networks for Learning from Imbalanced Data

Real-world datasets often exhibit imbalanced data distribution, where certain class levels are severely underrepresented. In such cases, traditional pattern classifiers have shown a bias towards the majority class, impeding accurate…

Machine Learning · Statistics 2025-08-12 Shraddha M. Naik, Tanujit Chakraborty, Madhurima Panja, Abdenour Hadid, Bibhas Chakraborty

Thompson Exploration with Best Challenger Rule in Best Arm Identification

This paper studies the fixed-confidence best arm identification (BAI) problem in the bandit framework in the canonical single-parameter exponential models. For this problem, many policies have been proposed, but most of them require solving…

Machine Learning · Statistics 2025-08-12 Jongyeong Lee, Junya Honda, Masashi Sugiyama

Feature importance (FI) statistics provide a prominent and valuable method of insight into the decision process of machine learning (ML) models, but their effectiveness has well-known limitations when correlation is present among the…

Machine Learning · Statistics 2025-08-11 Benedikt Fröhlich, Alison Durst, Merle Behr

Lightweight Auto-bidding based on Traffic Prediction in Live Advertising

Internet live streaming is widely used in online entertainment and e-commerce, where live advertising is an important marketing tool for anchors. An advertising campaign hopes to maximize the effect (such as conversions) under constraints…

Machine Learning · Statistics 2025-08-11 Bo Yang, Ruixuan Luo, Junqi Jin, Han Zhu

Stochastic Trace Optimization of Parameter Dependent Matrices Based on Statistical Learning Theory

We consider matrices $\boldsymbol{A}(\boldsymbol\theta)\in\mathbb{R}^{m\times m}$ that depend, possibly nonlinearly, on a parameter $\boldsymbol\theta$ from a compact parameter space $\Theta$. We present a Monte Carlo estimator for…

Machine Learning · Statistics 2025-08-11 Arvind K. Saibaba, Ilse C. F. Ipsen

Reduction Techniques for Survival Analysis

In this work, we discuss what we refer to as reduction techniques for survival analysis, that is, techniques that "reduce" a survival task to a more common regression or classification task, without ignoring the specifics of survival data.…

Machine Learning · Statistics 2025-08-11 Johannes Piller, Léa Orsini, Simon Wiegrebe, John Zobolas, Lukas Burk, Sophie Hanna Langbein, Philip Studener, Markus Goeswein, Andreas Bender

L1-Regularized Functional Support Vector Machine

In functional data analysis, binary classification with one functional covariate has been extensively studied. We aim to fill in the gap of considering multivariate functional covariates in classification. In particular, we propose an…

Machine Learning · Statistics 2025-08-11 Bingfan Liu, Peijun Sang

Optimal sampling for least-squares approximation

Least-squares approximation is one of the most important methods for recovering an unknown function from data. While in many applications the data is fixed, in many others there is substantial freedom to choose where to sample. In this…

Machine Learning · Statistics 2025-08-11 Ben Adcock

Generalization Bound for Diffusion Models using Random Features

Diffusion probabilistic models have been successfully used to generate data from noise. However, most diffusion models are computationally expensive and difficult to interpret with a lack of theoretical justification. Random feature models…

Machine Learning · Statistics 2025-08-11 Esha Saha, Giang Tran