Machine Learning - Scifaro

Learning Counterfactual Distributions via Kernel Nearest Neighbors

Consider a setting with multiple units (e.g., individuals, cohorts, geographic locations) and outcomes (e.g., treatments, times, items), where the goal is to learn a multivariate distribution for each unit-outcome entry, such as the…

Machine Learning · Statistics 2025-10-21 Kyuseong Choi, Jacob Feitelberg, Caleb Chin, Anish Agarwal, Raaz Dwivedi

Adv-SSL: Adversarial Self-Supervised Representation Learning with Theoretical Guarantees

Learning transferable data representations from abundant unlabeled data remains a central challenge in machine learning. Although numerous self-supervised learning methods have been proposed to address this challenge, a significant class of…

Machine Learning · Statistics 2025-10-21 Chenguang Duan, Yuling Jiao, Huazhen Lin, Wensen Ma, Jerry Zhijian Yang

Conformal online model aggregation

Conformal prediction equips machine learning models with a reasonable notion of uncertainty quantification without making strong distributional assumptions. It wraps around any prediction model and converts point predictions into set…

Machine Learning · Statistics 2025-10-21 Matteo Gasparin, Aaditya Ramdas

Variational Inference for Uncertainty Quantification: an Analysis of Trade-offs

Given an intractable distribution $p$, the problem of variational inference (VI) is to find the best approximation from some more tractable family $Q$. Commonly, one chooses $Q$ to be a family of factorized distributions (i.e., the…

Machine Learning · Statistics 2025-10-21 Charles C. Margossian, Loucas Pillaud-Vivien, Lawrence K. Saul

Blackwell's Approachability for Sequential Conformal Inference

We study conformal inference in non-exchangeable environments through the lens of Blackwell's theory of approachability. We first recast adaptive conformal inference (ACI, Gibbs and Candès, 2021) as a repeated two-player vector-valued…

Machine Learning · Statistics 2025-10-20 Guillaume Principato, Gilles Stoltz

On Universality of Deep Equivariant Networks

Universality results for equivariant neural networks remain rare. Those that do exist typically hold only in restrictive settings: either they rely on regular or higher-order tensor representations, leading to impractically high-dimensional…

Machine Learning · Statistics 2025-10-20 Marco Pacini, Mircea Petrache, Bruno Lepri, Shubhendu Trivedi, Robin Walters

Kernel-Based Evaluation of Conditional Biological Sequence Models

We propose a set of kernel-based tools to evaluate the designs and tune the hyperparameters of conditional sequence models, with a focus on problems in computational biology. The backbone of our tools is a new measure of discrepancy between…

Machine Learning · Statistics 2025-10-20 Pierre Glaser, Steffanie Paul, Alissa M. Hummer, Charlotte M. Deane, Debora S. Marks, Alan N. Amin

Geometric Convergence Analysis of Variational Inference via Bregman Divergences

Variational Inference (VI) provides a scalable framework for Bayesian inference by optimizing the Evidence Lower Bound (ELBO), but convergence analysis remains challenging due to the objective's non-convexity and non-smoothness in Euclidean…

Machine Learning · Statistics 2025-10-20 Sushil Bohara, Amedeo Roberto Esposito

Robust Optimization in Causal Models and G-Causal Normalizing Flows

In this paper, we show that interventionally robust optimization problems in causal models are continuous under the $G$-causal Wasserstein distance, but may be discontinuous under the standard Wasserstein distance. This highlights the…

Machine Learning · Statistics 2025-10-20 Gabriele Visentin, Patrick Cheridito

Information Theory in Open-world Machine Learning: Foundations, Frameworks, and Future Direction

Open world Machine Learning (OWML) aims to develop intelligent systems capable of recognizing known categories, rejecting unknown samples, and continually learning from novel information. Despite significant progress in open set…

Machine Learning · Statistics 2025-10-20 Lin Wang

Recursive Inference for Heterogeneous Multi-Output GP State-Space Models with Arbitrary Moment Matching

Accurate learning of system dynamics is becoming increasingly crucial for advanced control and decision-making in engineering. However, real-world systems often exhibit multiple channels and highly nonlinear transition dynamics, challenging…

Machine Learning · Statistics 2025-10-20 Tengjie Zheng, Jilan Mei, Di Wu, Lin Cheng, Shengping Gong

Kernel Regression in Structured Non-IID Settings: Theory and Implications for Denoising Score Learning

Kernel ridge regression (KRR) is a foundational tool in machine learning, with recent work emphasizing its connections to neural networks. However, existing theory primarily addresses the i.i.d. setting, while real-world data often exhibits…

Machine Learning · Statistics 2025-10-20 Dechen Zhang, Zhenmei Shi, Yi Zhang, Yingyu Liang, Difan Zou

RankSEG-RMA: An Efficient Segmentation Algorithm via Reciprocal Moment Approximation

Semantic segmentation labels each pixel in an image with its corresponding class, and is typically evaluated using the Intersection over Union (IoU) and Dice metrics to quantify the overlap between predicted and ground-truth segmentation…

Machine Learning · Statistics 2025-10-20 Zixun Wang, Ben Dai

Foresighted Online Policy Optimization with Interference

Contextual bandits, which leverage the baseline features of sequentially arriving individuals to optimize cumulative rewards while balancing exploration and exploitation, are critical for online decision-making. Existing approaches…

Machine Learning · Statistics 2025-10-20 Liner Xiang, Jiayi Wang, Hengrui Cai

The Tree-SNE Tree Exists

The clustering and visualisation of high-dimensional data is a ubiquitous task in modern data science. Popular techniques include nonlinear dimensionality reduction methods like t-SNE or UMAP. These methods face the 'scale-problem' of…

Machine Learning · Statistics 2025-10-20 Jack Kendrick

Reliable data clustering with Bayesian community detection

From neuroscience and genomics to systems biology and ecology, researchers rely on clustering similarity data to uncover modular structure. Yet widely used clustering methods, such as hierarchical clustering, k-means, and WGCNA, lack…

Machine Learning · Statistics 2025-10-20 Magnus Neuman, Jelena Smiljanić, Martin Rosvall

From Universal Approximation Theorem to Tropical Geometry of Multi-Layer Perceptrons

We revisit the Universal Approximation Theorem (UAT) through the lens of the tropical geometry of neural networks and introduce a constructive, geometry-aware initialization for sigmoidal multi-layer perceptrons (MLPs). Tropical geometry…

Machine Learning · Statistics 2025-10-20 Yi-Shan Chu, Yueh-Cheng Kuo

Flow Matching for Robust Simulation-Based Inference under Model Misspecification

Simulation-based inference (SBI) is transforming experimental sciences by enabling parameter estimation in complex non-linear models from simulated data. A persistent challenge, however, is model misspecification: simulators are only…

Machine Learning · Statistics 2025-10-20 Pierre-Louis Ruhlmann, Pedro L. C. Rodrigues, Michael Arbel, Florence Forbes

Uncertainty Quantification for Physics-Informed Neural Networks with Extended Fiducial Inference

Uncertainty quantification (UQ) in scientific machine learning is increasingly critical as neural networks are widely adopted to tackle complex problems across diverse scientific disciplines. For physics-informed neural networks (PINNs), a…

Machine Learning · Statistics 2025-10-20 Frank Shih, Zhenghao Jiang, Faming Liang

Landmark-Based Node Representations for Shortest Path Distance Approximations in Random Graphs

Learning node representations is a fundamental problem in graph machine learning. While existing embedding methods effectively preserve local similarity measures, they often fail to capture global functions like graph distances. Inspired by…

Machine Learning · Statistics 2025-10-20 My Le, Luana Ruiz, Souvik Dhara