Statistics - Scifaro

Highly Data Parallelizable Estimation of the Sliced-Wasserstein Distance Using Cumulative Distribution Functions

The Sliced Wasserstein (SW) distance has emerged as a computationally attractive alternative to the Wasserstein distance by leveraging one-dimensional optimal transport along random projections. Standard estimators of the SW distance rely…

Machine Learning · Statistics 2026-06-29 Christophe Vauthier, Quentin Mérigot, Anna Korba
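For context on the baseline this entry refers to: a common way to estimate the SW distance is Monte Carlo over random directions, solving each one-dimensional transport problem by sorting. The sketch below shows that standard sort-based estimator of the squared SW_2 distance, not the CDF-based estimator the paper proposes. It assumes two point clouds of equal size, and the function names are illustrative.

```python
import math
import random


def random_direction(dim, rng):
    # Normalizing a standard Gaussian vector gives a uniform direction on the unit sphere.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def sliced_wasserstein2(xs, ys, n_projections=200, seed=0):
    """Monte Carlo estimate of SW_2^2 between two equal-size empirical measures."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Project both samples onto theta and sort: in 1D, the optimal coupling
        # between equal-size empirical measures matches order statistics.
        px = sorted(sum(a * b for a, b in zip(x, theta)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, theta)) for y in ys)
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / n_projections
```

As a sanity check, shifting a cloud by a vector c changes every projection by c·θ, so SW_2^2 = E[(c·θ)^2] = |c|^2/d; for a unit shift in 2D the estimate should approach 0.5 as the number of projections grows.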

Notes on generative modeling: flow matching, diffusion, optimal transport and Schrödinger bridge

These notes recapitulate the high-level mathematical principles behind different techniques for generative modeling. I show the connections between optimal transport and standard techniques such as Schrödinger bridge and flow matching.

Machine Learning · Statistics 2026-06-29 Titouan Vayer

Beyond Equidistant Assumptions: An Autoregressive Ordered Stereotype Model for Ordinal Time Series

We propose an extension of the ordered stereotype model (OSM) for ordinal time series data, referred to as the Autoregressive OSM (AR-OSM). The model captures serial dependence by incorporating lagged values of the response as covariates in…

Statistical Methodology · Statistics 2026-06-29 Anna Nalpantidi, Dimitris Karlis, Daniel Fernández
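A minimal sketch of category probabilities under the ordered stereotype model as it is commonly parameterized, log(P(Y=k)/P(Y=1)) = α_k + φ_k η, with ordered scores 0 = φ_1 <= φ_2 <= ... <= φ_q = 1 that need not be equally spaced. The truncated abstract does not say exactly how AR-OSM enters the lagged response, so the `ar_osm_predictor` helper is hypothetical: it assumes a separate additive effect for each lagged category.

```python
import math


def osm_probs(alpha, phi, eta):
    """Category probabilities under an ordered stereotype model.

    log(P(Y=k)/P(Y=1)) = alpha[k] + phi[k] * eta, with alpha[0] = 0 and
    0 = phi[0] <= phi[1] <= ... <= phi[-1] = 1 (unequal spacing allowed).
    """
    if phi[0] != 0.0 or phi[-1] != 1.0 or any(b < a for a, b in zip(phi, phi[1:])):
        raise ValueError("phi must be nondecreasing from 0 to 1")
    if alpha[0] != 0.0:
        raise ValueError("alpha[0] must be 0 (baseline category)")
    logits = [a + p * eta for a, p in zip(alpha, phi)]
    m = max(logits)  # subtract the max for numerical stability
    w = [math.exp(l - m) for l in logits]
    s = sum(w)
    return [v / s for v in w]


def ar_osm_predictor(beta, x_t, gamma, y_prev):
    # Hypothetical AR(1) linear predictor: current covariates plus an effect
    # gamma[y_prev] for the lagged category (one-hot coding assumed here).
    return sum(b * x for b, x in zip(beta, x_t)) + gamma[y_prev]
```

Because the scores φ_k are estimated rather than fixed at equal steps, the fitted spacing itself indicates how far apart adjacent ordinal levels behave.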

Scalable coarse-to-fine spatial downscaling

This study proposes coarse-to-fine downscaling (CF-DS), a scalable spatial downscaling method extending coarse-to-fine spatial modeling. Unlike conventional spatial-statistical downscaling methods such as area-to-point kriging, CF-DS does…

Statistical Methodology · Statistics 2026-06-29 Daisuke Murakami, Yongwan Chun, Takahiro Yoshida, Hajime Seya

HERO: Improving the Reliability and Sensitivity of Generative Model Evaluation Using Historical Data

Reliable generative AI models critically rely on expert human annotations to evaluate output quality, yet these "gold" labels are expensive to collect and limited in quantity. Organizations thus often turn to collecting vast but noisy…

Statistical Methodology · Statistics 2026-06-29 Xinrui Ruan, Zhenyu Zhao, Waverly Wei, Yueshan Zhang, Zeyu Zheng, Sui Huang, Jingshen Wang

Probing the Stochastic Machine: Engaging with LLMs in Statistics Curricula Through Veridical Data Science

Large language models (LLMs) are interactive stochastic systems whose most consequential behaviors are still only partially understood. This discussion argues that statistics curricula should treat LLMs not only as tools, but as objects of…

Applied Statistics · Statistics 2026-06-29 Tian Zheng

Testing hypotheses via orthogonalization

Classical hypothesis testing frameworks break down in contemporary settings in which null hypotheses are increasingly abstract, the same data are used to both generate and test hypotheses, and minimal assumptions about the underlying data…

Statistical Methodology · Statistics 2026-06-29 Ameer Dharamshi, Runjia Zou, Daniela Witten

Adjusted Wasserstein distances for bridging empirical and true distributions with applications to MDS

This paper examines how metric adjustments to Multidimensional Scaling (MDS) can enhance its effectiveness as a visual tool for pattern recognition. The distance under consideration, referred to as Max-D-SW, is an adjustment of the…

Machine Learning · Statistics 2026-06-29 Flor Martinez-Sermeno, Arturo Jaramillo, Johan Van Horebeek

Multi-Source Transfer Learning of Sparse Single-Index Models

Transfer learning leverages knowledge from related source domains to improve learning in a target domain. Recent theoretical advances cover a broad range of regression settings within (generalized) linear models. Despite their diversity,…

Statistical Methodology · Statistics 2026-06-28 Ye Tian

Beyond Local Independence: High-Dimensional Latent Class Graphical Models with Shared Block Structure

Latent class models are central tools for multivariate categorical data from heterogeneous populations, but their standard local-independence assumption is often unrealistic in modern high-dimensional applications. We propose a…

Statistical Methodology · Statistics 2026-06-28 Seunghyun Lee, Yuqi Gu

Bidirectional Autoregressive Latent Diffusion for Forward and Inverse Magnetohydrodynamics

This work presents a new bidirectional autoregressive latent diffusion approach for predicting the evolution of multiple fields (mass density, pressure, velocity, and magnetic field components) for magnetohydrodynamics. We show that this…

Machine Learning · Statistics 2026-06-28 Alexander Scheinker

Modelling and detecting mild and gross anomalies in circular data via double-contaminated models

In this paper, we propose a model-based framework to robustify inference for circular data in the presence of anomalous observations, distinguishing between mild and gross anomalies. Starting from a unimodal and symmetric reference model on…

Statistical Methodology · Statistics 2026-06-28 Antonio Punzo, Andriëtte Bekker, Arno Otto, Priyanka Nagar, Cristina Tortora

Scalable Bayesian Spatial Mixture Modelling for Remote Sensing Image Segmentation

Accurate and scalable land cover classification is essential for global conservation monitoring and policy-making. While remote sensing images provide a cost-effective alternative to ground surveys, current methods often lack principled…

Statistical Methodology · Statistics 2026-06-28 Bao Khanh Nguyen, Iain Cameron, Cecilia Balocchi, Torben Sell

Self-Organized Conformal Prediction: Reducing Regional Coverage Gaps with Unsupervised Group Discovery

Conformal prediction guarantees marginal coverage, but pooled calibration averages over heterogeneous regions and can mask regional undercoverage in safety-critical subgroups. We introduce Self-Organized Conformal Prediction (SOCP), a…

Machine Learning · Statistics 2026-06-28 Louis Berthier, Ahmed Shokry, Maxime Moreaud, Guillaume Ramelet, Aymeric Dieuleveut
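To make "marginal coverage" and "pooled calibration" concrete, here is a minimal split-conformal sketch with absolute-residual scores, plus a variant that calibrates separately within groups given by a user-supplied labelling function (the hypothetical `group_of`). SOCP discovers its groups without supervision; that step is not reproduced here, and all names are illustrative. Under exchangeability, the pooled interval covers a new response with probability at least 1 - alpha on average over the whole population, not within each region.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def split_conformal_interval(predict, calib, x_new, alpha=0.1):
    # Nonconformity score: absolute residual on a held-out calibration set.
    scores = [abs(y - predict(x)) for x, y in calib]
    q = conformal_quantile(scores, alpha)
    yhat = predict(x_new)
    return yhat - q, yhat + q


def groupwise_interval(predict, calib, group_of, x_new, alpha=0.1):
    # Calibrating only on points from x_new's group (instead of pooling)
    # targets coverage conditional on that group.
    g = group_of(x_new)
    calib_g = [(x, y) for x, y in calib if group_of(x) == g]
    return split_conformal_interval(predict, calib_g, x_new, alpha)
```

With one low-noise and one high-noise region, the pooled threshold is driven by the noisy region, so intervals are too wide in the quiet region and can undercover in the noisy one; group-specific thresholds remove that averaging.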

Bayesian Copula Directional Dependence is Cross-Network Robust for Gene-Regulatory Pair Direction: A Benchmark Study on DREAM5

Inferring the direction of a gene-regulatory relationship is harder than inferring whether a relationship exists, and most direction-inference methods are validated mainly on a single in silico benchmark. We ask which method remains…

Applied Statistics · Statistics 2026-06-28 Xiaoying Wei, Clara Grazian

Critique of "Use of roster charts in the investigation and prosecution of nurses ..." by John O'Quigley

The paper "Use of roster charts in the investigation and prosecution of nurses suspected of inflicting deliberate harm on patients" by Prof. John O'Quigley explores an interesting hypothesis concerning statistical information hidden in the…

Applied Statistics · Statistics 2026-06-28 Richard D. Gill

Semantic insurance pricing with large language models

Classical actuarial pricing models, such as the generalized linear model, are valued for transparency and ease of governance, but they use interactions among risk factors only when these are supplied through explicit feature engineering. We…

Applied Statistics · Statistics 2026-06-28 Christopher Blier-Wong, Derek Kusmenko

Gradient boosting with vector-valued leafs

Gradient boosting in the form of decision tree ensembles has successfully been applied to a variety of problems using simple objective functions based on log-likelihoods of a single variable. The concept extends naturally to objective…

Machine Learning · Statistics 2026-06-28 David Cortes
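The abstract is cut off before the method, so this is only a generic illustration of what a vector-valued leaf can mean: a second-order (Newton) leaf update v = -(ΣH_i + λI)^(-1) Σg_i, where each observation in the leaf contributes a gradient vector and a Hessian matrix. The two-parameter Gaussian objective (mean and log-scale), the use of Fisher information in place of the observed Hessian (which is indefinite for a single observation with a nonzero residual), and the ridge term `lam` are all assumptions of this sketch.

```python
import math


def gaussian_grad_fisher(y, mu, log_sigma):
    """Gradient of the Gaussian negative log-likelihood w.r.t. (mu, log_sigma),
    plus the expected (Fisher) information used as a positive-definite Hessian."""
    var = math.exp(2.0 * log_sigma)
    r = y - mu
    grad = (-r / var, 1.0 - r * r / var)
    hess = ((1.0 / var, 0.0), (0.0, 2.0))
    return grad, hess


def newton_leaf_value(rows, lam=1.0):
    """Vector-valued leaf for 2 outputs: v = -(sum H_i + lam*I)^{-1} sum g_i."""
    g = [0.0, 0.0]
    h = [[lam, 0.0], [0.0, lam]]
    for grad, hess in rows:
        for i in range(2):
            g[i] += grad[i]
            for j in range(2):
                h[i][j] += hess[i][j]
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    # Closed-form 2x2 inverse applied to the negated gradient sum.
    v0 = -(h[1][1] * g[0] - h[0][1] * g[1]) / det
    v1 = -(-h[1][0] * g[0] + h[0][0] * g[1]) / det
    return v0, v1
```

For example, four observations with y = 1 and current predictions (mu, log_sigma) = (0, 0) give a summed gradient (-4, 0) and a regularized Hessian diag(5, 9), so the leaf moves the mean up by 0.8 and leaves the log-scale unchanged; a leaf whose residuals are larger than the current scale instead pushes log_sigma upward.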

Generalization Analysis of Transformers in Distribution Regression

In recent years, models based on the Transformer architecture have seen widespread applications and have become one of the core tools in the field of deep learning. Numerous successful techniques, such as parameter-efficient fine-tuning and…

Machine Learning · Statistics 2026-06-28 Peilin Liu, Ding-Xuan Zhou

Using Variational Inference to Improve the Efficiency of MCMC Algorithms

Bayesian statistics makes inference based on Bayes' theorem, but the posterior distribution of unknown parameters is typically analytically intractable. To estimate the posterior, two widely used numerical approximation methods are Markov…

Statistical Computing · Statistics 2026-06-28 Pingping Yin, Xiyun Jiao
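The abstract is truncated before describing the method, so the following is not the authors' algorithm. It illustrates one generic way a variational approximation can feed an MCMC sampler: a fitted Gaussian q (assumed given here as `q_mean`, `q_sd`) serves as the proposal of an independence Metropolis-Hastings sampler, whose acceptance ratio reduces to a ratio of importance weights p(x)/q(x). The target only needs to be known up to a normalizing constant.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def independence_mh(log_target, q_mean, q_sd, n_samples, x0=None, seed=0):
    """Independence Metropolis-Hastings with a fixed Gaussian proposal q.

    q is assumed to be a variational approximation fitted beforehand.
    """
    rng = random.Random(seed)
    x = q_mean if x0 is None else x0
    # Log importance weight log w(x) = log p(x) - log q(x);
    # the log MH acceptance ratio is log w(x') - log w(x).
    log_w = log_target(x) - normal_logpdf(x, q_mean, q_sd)
    samples, accepted = [], 0
    for _ in range(n_samples):
        prop = rng.gauss(q_mean, q_sd)
        log_w_prop = log_target(prop) - normal_logpdf(prop, q_mean, q_sd)
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_w_prop - log_w:
            x, log_w = prop, log_w_prop
            accepted += 1
        samples.append(x)
    return samples, accepted / n_samples


# Usage: unnormalized N(1, 0.5^2) target with a slightly wider proposal.
samples, rate = independence_mh(lambda t: -0.5 * ((t - 1.0) / 0.5) ** 2, 1.0, 0.7, 20000, seed=1)
```

Because reverse-KL variational fits tend to underestimate posterior spread, a proposal narrower than the target can mix poorly in the tails; the usage line inflates the proposal scale (0.7 against a target standard deviation of 0.5) for that reason.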