Machine Learning - Scifaro

Negative Dependence as a toolbox for machine learning: review and new developments

Negative dependence is becoming a key driver in advancing learning capabilities beyond the limits of traditional independence. Recent developments have evidenced support towards negatively dependent systems as a learning paradigm in a broad…

Machine Learning · Statistics 2025-11-17 Hoang-Son Tran, Vladimir Petrovic, Remi Bardenet, Subhroshekhar Ghosh

Optimizing importance weighting in the presence of sub-population shifts

A distribution shift between the training and test data can severely harm the performance of machine learning models. Importance weighting addresses this issue by assigning different weights to data points during training. We argue that…

Machine Learning · Statistics 2025-11-17 Floris Holstege, Bram Wouters, Noud van Giersbergen, Cees Diks

Operator Models for Continuous-Time Offline Reinforcement Learning

Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline…

Machine Learning · Statistics 2025-11-14 Nicolas Hoischen, Petar Bevanda, Max Beier, Stefan Sosnowski, Boris Houska, Sandra Hirche

Theory and computation for structured variational inference

Structured variational inference constitutes a core methodology in modern statistical applications. Unlike mean-field variational inference, the approximate posterior is assumed to have interdependent structure. We consider the natural…

Machine Learning · Statistics 2025-11-14 Shunan Sheng, Bohan Wu, Bennett Zhu, Sinho Chewi, Aram-Alexandre Pooladian

Masked Mineral Modeling: Continent-Scale Mineral Prospecting via Geospatial Infilling

Minerals play a critical role in the advanced energy technologies necessary for decarbonization, but characterizing mineral deposits hidden underground remains costly and challenging. Inspired by recent progress in generative modeling, we…

Machine Learning · Statistics 2025-11-14 Sujay Nair, Evan Coleman, Sherrie Wang, Elsa Olivetti

Siegel Neural Networks

Riemannian symmetric spaces (RSS) such as hyperbolic spaces and symmetric positive definite (SPD) manifolds have become popular spaces for representation learning. In this paper, we propose a novel approach for building discriminative…

Machine Learning · Statistics 2025-11-14 Xuan Son Nguyen, Aymeric Histace, Nistor Grozavu

Convergence and Stability Analysis of Self-Consuming Generative Models with Heterogeneous Human Curation

Self-consuming generative models have received significant attention over the last few years. In this paper, we study a self-consuming generative model with heterogeneous preferences that is a generalization of the model in Ferbach et al.…

Machine Learning · Statistics 2025-11-14 Hongru Zhao, Jinwen Fu, Tuan Pham

Variance Reduction via Resampling and Experience Replay

Experience replay is a foundational technique in reinforcement learning that enhances learning stability by storing past experiences in a replay buffer and reusing them during training. Despite its practical success, its theoretical…

Machine Learning · Statistics 2025-11-14 Jiale Han, Xiaowu Dai, Yuhua Zhu

Semiparametric Double Reinforcement Learning with Applications to Long-Term Causal Inference

Double Reinforcement Learning (DRL) enables efficient inference for policy values in nonparametric Markov decision processes (MDPs), but existing methods face two major obstacles: (1) they require stringent intertemporal overlap conditions…

Machine Learning · Statistics 2025-11-14 Lars van der Laan, David Hubbard, Allen Tran, Nathan Kallus, Aurélien Bibaut

Logistic Variational Bayes Revisited

Variational logistic regression is a popular method for approximate Bayesian inference that sees widespread use in many areas of machine learning, including Bayesian optimization, reinforcement learning, and multi-instance learning, to name a…

Machine Learning · Statistics 2025-11-14 Michael Komodromos, Marina Evangelou, Sarah Filippi

Regression Trees Know Calculus

Regression trees have emerged as a preeminent tool for solving real-world regression problems due to their ability to deal with nonlinearities, interaction effects and sharp discontinuities. In this article, we rather study regression trees…

Machine Learning · Statistics 2025-11-14 Nathan Wycoff

Provably Scalable Black-Box Variational Inference with Structured Variational Families

Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent computational complexity results for BBVI have…

Machine Learning · Statistics 2025-11-14 Joohwan Ko, Kyurae Kim, Woo Chang Kim, Jacob R. Gardner

Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called "linear") rate under perfect variational family specification.…

Machine Learning · Statistics 2025-11-14 Kyurae Kim, Yian Ma, Jacob R. Gardner

Robust Sampling for Active Statistical Inference

Active statistical inference is a new method for inference with AI-assisted data collection. Given a budget on the number of labeled data points that can be collected and assuming access to an AI predictive model, the basic idea is to…

Machine Learning · Statistics 2025-11-13 Puheng Li, Tijana Zrnic, Emmanuel Candès

Effects of label noise on the classification of outlier observations

This study investigates the impact of adding noise to the training set classes in classification tasks using the BCOPS algorithm (Balanced and Conformal Optimized Prediction Sets), proposed by Guan & Tibshirani (2022). The BCOPS algorithm…

Machine Learning · Statistics 2025-11-13 Matheus Vinícius Barreto de Farias, Mario de Castro

The Probably Approximately Correct Learning Model in Computational Learning Theory

This survey paper gives an overview of various known results on learning classes of Boolean functions in Valiant's Probably Approximately Correct (PAC) learning model and its commonly studied variants.

Machine Learning · Statistics 2025-11-13 Rocco A. Servedio

Lassoed Forests: Random Forests with Adaptive Lasso Post-selection

Random forests are a statistical learning technique that uses bootstrap aggregation to average high-variance, low-bias trees. Improvements to random forests, such as applying Lasso regression to the tree predictions, have been proposed in…

Machine Learning · Statistics 2025-11-13 Jing Shang, James Bannon, Benjamin Haibe-Kains, Robert Tibshirani

Self-adaptive weighting and sampling for physics-informed neural networks

Physics-informed deep learning has emerged as a promising framework for solving partial differential equations (PDEs). Nevertheless, training these models on complex problems remains challenging, often leading to limited accuracy and…

Machine Learning · Statistics 2025-11-13 Wenqian Chen, Amanda Howard, Panos Stinis

Simulation-based inference of yeast centromeres

Chromatin folding and the spatial arrangement of chromosomes in the cell play a crucial role in DNA replication and gene expression. Improper chromatin folding could lead to malfunctions and, over time, diseases. For eukaryotes,…

Machine Learning · Statistics 2025-11-13 Eloïse Touron, Pedro L. C. Rodrigues, Julyan Arbel, Nelle Varoquaux, Michael Arbel

Bayesian preference elicitation for decision support in multiobjective optimization

We present a novel approach to help decision-makers efficiently identify preferred solutions from the Pareto set of a multi-objective optimization problem. Our method uses a Bayesian model to estimate the decision-maker's utility function…

Machine Learning · Statistics 2025-11-13 Felix Huber, Sebastian Rojas Gonzalez, Raul Astudillo