Machine Learning - Scifaro

Minimax Optimal Two-Sample Testing under Local Differential Privacy

We explore the trade-off between privacy and statistical utility in private two-sample testing under local differential privacy (LDP) for both multinomial and continuous data. We begin by addressing the multinomial case, where we introduce…

Machine Learning · Statistics 2025-12-30 Jongmin Mun, Seungwoo Kwak, Ilmun Kim

Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse

Consumer protection rules require companies that deploy models to automate decisions in high-stakes settings to explain predictions to decision subjects. These rules are motivated, in part, by the belief that explanations can promote…

Machine Learning · Statistics 2025-12-30 Harry Cheon, Anneke Wernerfelt, Sorelle A. Friedler, Berk Ustun

Tilt Matching for Scalable Sampling and Fine-Tuning

We propose a simple, scalable algorithm for using stochastic interpolants to sample from unnormalized densities and for fine-tuning generative models. The approach, Tilt Matching, arises from a dynamical equation relating the flow matching…

Machine Learning · Statistics 2025-12-29 Peter Potaptchik, Cheuk-Kit Lee, Michael S. Albergo

Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models

Diffusion models have become a central tool in deep generative modeling, but standard formulations rely on a single network and a single diffusion schedule to transform a simple prior, typically a standard normal distribution, into the…

Machine Learning · Statistics 2025-12-29 Takuro Kutsuna

Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions

In this work, we study contextual strongly convex simulation optimization and adopt an "optimize then predict" (OTP) approach for real-time decision making. In the offline stage, simulation optimization is conducted across a set of…

Machine Learning · Statistics 2025-12-29 Nifei Lin, Heng Luo, L. Jeff Hong

Informative missingness and its implications in semi-supervised learning

Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. It leverages information from labelled samples, whose acquisition is often costly or labour-intensive, together with unlabelled data to enhance…

Machine Learning · Statistics 2025-12-29 Jinran Wu, You-Gan Wang, Geoffrey J. McLachlan

Gaussian Process Regression -- Neural Network Hybrid with Optimized Redundant Coordinates

Recently, a Gaussian Process Regression - neural network (GPRNN) hybrid machine learning method was proposed, which is based on additive-kernel GPR in redundant coordinates constructed by rules [J. Phys. Chem. A 127 (2023) 7823]. The method…

Machine Learning · Statistics 2025-12-29 Sergei Manzhos, Manabu Ihara

Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models

Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on…

Machine Learning · Statistics 2025-12-29 Ye Tian, Haolei Weng, Lucy Xia, Yang Feng

Learning pairwise Markov network structures using correlation neighborhoods

Markov networks are widely studied and used throughout multivariate statistics and computer science. In particular, the problem of learning the structure of Markov networks from data without invoking chordality assumptions in order to…

Machine Learning · Statistics 2025-12-29 Juri Kuronen, Jukka Corander, Johan Pensar

A Primer on the Signature Method in Machine Learning

We provide an introduction to the signature method, focusing on its theoretical properties and machine learning applications. Our presentation is divided into two parts. In the first part, we present the definition and fundamental…

Machine Learning · Statistics 2025-12-29 Ilya Chevyrev, Andrey Kormilitzin

Causal-driven attribution (CDA): Estimating channel influence without user-level data

Attribution modelling lies at the heart of marketing effectiveness, yet most existing approaches depend on user-level path data, which are increasingly inaccessible due to privacy regulations and platform restrictions. This paper introduces…

Machine Learning · Statistics 2025-12-25 Georgios Filippou, Boi Mai Quach, Diana Lenghel, Arthur White, Ashish Kumar Jha

Enhancing diffusion models with Gaussianization preprocessing

Diffusion models are a class of generative models that have demonstrated remarkable success in tasks such as image generation. However, one of the bottlenecks of these models is slow sampling due to the delay before the onset of trajectory…

Machine Learning · Statistics 2025-12-25 Li Cunzhi, Louis Kang, Hideaki Shimazaki

Weighted MCC: A Robust Measure of Multiclass Classifier Performance for Observations with Individual Weights

Several performance measures are used to evaluate binary and multiclass classification tasks. But individual observations may often have distinct weights, and none of these measures are sensitive to such varying weights. We propose a new…

Machine Learning · Statistics 2025-12-25 Rommel Cortez, Bala Krishnamoorthy

Fast and Exact Least Absolute Deviations Line Fitting via Piecewise Affine Lower-Bounding

Least-absolute-deviations (LAD) line fitting is robust to outliers but computationally more involved than least squares regression. Although the literature includes linear and near-linear time algorithms for the LAD line fitting problem,…

Machine Learning · Statistics 2025-12-25 Stefan Volz, Martin Storath, Andreas Weinmann

Optimal Model Selection for Conformalized Robust Optimization

In decision-making under uncertainty, Contextual Robust Optimization (CRO) provides reliability by minimizing the worst-case decision loss over a prediction set. While recent advances use conformal prediction to construct prediction sets…

Machine Learning · Statistics 2025-12-25 Yajie Bao, Yang Hu, Haojie Ren, Peng Zhao, Changliang Zou

Learning Enhanced Ensemble Filters

The filtering distribution in hidden Markov models evolves according to the law of a mean-field model in state-observation space. The ensemble Kalman filter (EnKF) approximates this mean-field model with an ensemble of interacting…

Machine Learning · Statistics 2025-12-25 Eviatar Bach, Ricardo Baptista, Edoardo Calvello, Bohan Chen, Andrew Stuart

Neural Dynamic Data Valuation: A Stochastic Optimal Control Approach

Data valuation has become a cornerstone of the modern data economy, where datasets function as tradable intellectual assets that drive decision-making, model training, and market transactions. Despite substantial progress, existing…

Machine Learning · Statistics 2025-12-25 Zhangyong Liang, Ji Zhang, Xin Wang, Pengfei Zhang, Zhao Li

Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning

We investigate a framework for robo-advisors to estimate non-expert clients' risk aversion using adaptive binary-choice questionnaires. We model risk aversion using cost functions and spectral risk measures in a static setting. We prove the…

Machine Learning · Statistics 2025-12-25 Ziteng Cheng, Anthony Coache, Sebastian Jaimungal

Deep Kronecker Network

We propose Deep Kronecker Network (DKN), a novel framework designed for analyzing medical imaging data, such as MRI, fMRI, CT, etc. Medical imaging data is different from general images in at least two aspects: i) sample size is usually…

Machine Learning · Statistics 2025-12-25 Long Feng, Guang Yang

Generative Bayesian Hyperparameter Tuning

Hyper-parameter selection is a central practical problem in modern machine learning, governing regularization strength, model capacity, and robustness choices. Cross-validation is often computationally prohibitive at scale, while…

Machine Learning · Statistics 2025-12-24 Hedibert Lopes, Nick Polson, Vadim Sokolov