Machine Learning - Scifaro

Conformal inference for regression on Riemannian Manifolds

Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for non-Euclidean data. Circular data is a classic example, but so is data in…

Machine Learning · Statistics 2025-07-18 Alejandro Cholaquidis, Fabrice Gamboa, Leonardo Moreno

Incorporating Fairness Constraints into Archetypal Analysis

Archetypal Analysis (AA) is an unsupervised learning method that represents data as convex combinations of extreme patterns called archetypes. While AA provides interpretable and low-dimensional representations, it can inadvertently encode…

Machine Learning · Statistics 2025-07-17 Aleix Alcacer, Irene Epifanio

Newfluence: Boosting Model interpretability and Understanding in High Dimensions

The increasing complexity of machine learning (ML) and artificial intelligence (AI) models has created a pressing need for tools that help scientists, engineers, and policymakers interpret and refine model decisions and predictions.…

Machine Learning · Statistics 2025-07-17 Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki

From Observational Data to Clinical Recommendations: A Causal Framework for Estimating Patient-level Treatment Effects and Learning Policies

We propose a framework for building patient-specific treatment recommendation models, building on the large recent literature on learning patient-level causal models and inspired by the target trial paradigm of Hernan and Robins. We focus…

Machine Learning · Statistics 2025-07-17 Rom Gutman, Shimon Sheiba, Omer Noy Klein, Naama Dekel Bird, Amit Gruber, Doron Aronson, Oren Caspi, Uri Shalit

Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction

Ensuring model calibration is critical for reliable prediction, yet popular distribution-free methods such as histogram binning and isotonic regression offer only asymptotic guarantees. We introduce a unified framework for Venn and…

Machine Learning · Statistics 2025-07-17 Lars van der Laan, Ahmed Alaa

On the Statistical Properties of Generative Adversarial Models for Low Intrinsic Data Dimension

Despite the remarkable empirical successes of Generative Adversarial Networks (GANs), the theoretical guarantees for their statistical accuracy remain rather pessimistic. In particular, the data distributions on which GANs are applied, such…

Machine Learning · Statistics 2025-07-17 Saptarshi Chakraborty, Peter L. Bartlett

Prominent Roles of Conditionally Invariant Components in Domain Adaptation: Theory and Algorithms

Domain adaptation (DA) is a statistical learning problem that arises when the distribution of the source data used to train a model differs from that of the target data used to evaluate the model. While many DA algorithms have demonstrated…

Machine Learning · Statistics 2025-07-17 Keru Wu, Yuansi Chen, Wooseok Ha, Bin Yu

Joint space-time wind field data extrapolation and uncertainty quantification using nonparametric Bayesian dictionary learning

A methodology is developed, based on nonparametric Bayesian dictionary learning, for joint space-time wind field data extrapolation and estimation of related statistics by relying on limited/incomplete measurements. Specifically, utilizing…

Machine Learning · Statistics 2025-07-16 George D. Pasparakis, Ioannis A. Kougioumtzoglou, Michael D. Shields

Interpretable Bayesian Tensor Network Kernel Machines with Automatic Rank and Feature Selection

Tensor Network (TN) Kernel Machines speed up model learning by representing parameters as low-rank TNs, reducing computation and memory use. However, most TN-based Kernel methods are deterministic and ignore parameter uncertainty. Further,…

Machine Learning · Statistics 2025-07-16 Afra Kilic, Kim Batselier

GOLFS: Feature Selection via Combining Both Global and Local Information for High Dimensional Clustering

It is important to identify the discriminative features for high dimensional clustering. However, due to the lack of cluster labels, the regularization methods developed for supervised feature selection cannot be directly applied. To learn…

Machine Learning · Statistics 2025-07-16 Zhaoyu Xing, Yang Wan, Juan Wen, Wei Zhong

Robust Multi-Manifold Clustering via Simplex Paths

This article introduces a novel, geometric approach for multi-manifold clustering (MMC), i.e. for clustering a collection of potentially intersecting, d-dimensional manifolds into the individual manifold components. We first compute a…

Machine Learning · Statistics 2025-07-16 Haoyu Chen, Anna Little, Akin Narayan

Universal rates of ERM for agnostic learning

The universal learning framework has been developed to obtain guarantees on the learning rates that hold for any fixed distribution, which can be much faster than those that hold uniformly over all distributions. Given that the Empirical…

Machine Learning · Statistics 2025-07-16 Steve Hanneke, Mingyue Xu

State-Constrained Offline Reinforcement Learning

Traditional offline reinforcement learning (RL) methods predominantly operate in a batch-constrained setting. This confines the algorithms to a specific state-action distribution present in the dataset, reducing the effects of…

Machine Learning · Statistics 2025-07-16 Charles A. Hepburn, Yue Jin, Giovanni Montana

Simulating Biases for Interpretable Fairness in Offline and Online Classifiers

Predictive models often reinforce biases which were originally embedded in their training data, through skewed decisions. In such cases, mitigation methods are critical to ensure that, regardless of the prevailing disparities, model…

Machine Learning · Statistics 2025-07-15 Ricardo Inácio, Zafeiris Kokkinogenis, Vitor Cerqueira, Carlos Soares

Discovering Governing Equations in the Presence of Uncertainty

In the study of complex dynamical systems, understanding and accurately modeling the underlying physical processes is crucial for predicting system behavior and designing effective interventions. Yet real-world systems exhibit pronounced…

Machine Learning · Statistics 2025-07-15 Ridwan Olabiyi, Han Hu, Ashif Iquebal

Signed Graph Learning: Algorithms and Theory

Real-world data is often represented through the relationships between data samples, forming a graph structure. In many applications, it is necessary to learn this graph structure from the observed data. Current graph learning research has…

Machine Learning · Statistics 2025-07-15 Abdullah Karaaslanli, Bisakh Banerjee, Tapabrata Maiti, Selin Aviyente

An Algorithm for Identifying Interpretable Subgroups With Elevated Treatment Effects

We introduce an algorithm for identifying interpretable subgroups with elevated treatment effects, given an estimate of individual or conditional average treatment effects (CATE). Subgroups are characterized by "rule sets" --…

Machine Learning · Statistics 2025-07-15 Albert Chiu

Uncovering symmetric and asymmetric species associations from community and environmental data

There is little doubt that biotic interactions shape community assembly and ultimately the spatial co-variations between species. There is a hope that the signal of these biotic interactions can be observed and retrieved by investigating…

Machine Learning · Statistics 2025-07-15 Sara Si-Moussi, Esther Galbrun, Mickael Hedde, Giovanni Poggiato, Matthias Rohr, Wilfried Thuiller

CoVAE: Consistency Training of Variational Autoencoders

Current state-of-the-art generative approaches frequently rely on a two-stage training procedure, where an autoencoder (often a VAE) first performs dimensionality reduction, followed by training a generative model on the learned latent…

Machine Learning · Statistics 2025-07-15 Gianluigi Silvestri, Luca Ambrogioni

Fixed-Confidence Multiple Change Point Identification under Bandit Feedback

Piecewise constant functions describe a variety of real-world phenomena in domains ranging from chemistry to manufacturing. In practice, it is often required to confidently identify the locations of the abrupt changes in these functions as…

Machine Learning · Statistics 2025-07-15 Joseph Lazzaro, Ciara Pike-Burke