Machine Learning - Scifaro

EERO: Early Exit with Reject Option for Efficient Classification with limited budget

The increasing complexity of advanced machine learning models requires innovative approaches to manage computational resources effectively. One such method is the Early Exit strategy, which allows for adaptive computation by providing a…

Machine Learning · Statistics 2025-11-07 Florian Valade, Mohamed Hebiri, Paul Gay

Colorectal Cancer Histopathological Grading using Multi-Scale Federated Learning

Colorectal cancer (CRC) grading is a critical prognostic factor but remains hampered by inter-observer variability and the privacy constraints of multi-institutional data sharing. While deep learning offers a path to automation, centralized…

Machine Learning · Statistics 2025-11-06 Md Ahasanul Arafath, Abhijit Kumar Ghosh, Md Rony Ahmed, Sabrin Afroz, Minhazul Hosen, Md Hasan Moon, Md Tanzim Reza, Md Ashad Alam

RKUM: An R Package for Robust Kernel Unsupervised Methods

RKUM is an R package developed for implementing robust kernel-based unsupervised methods. It provides functions for estimating the robust kernel covariance operator (CO) and the robust kernel cross-covariance operator (CCO) using…

Machine Learning · Statistics 2025-11-06 Md Ashad Alam

Precise asymptotic analysis of Sobolev training for random feature models

Gradient information is widely useful and available in applications, and is therefore natural to include in the training of neural networks. Yet little is known theoretically about the impact of Sobolev training -- regression with both…

Machine Learning · Statistics 2025-11-06 Katharine E Fisher, Matthew TC Li, Youssef Marzouk, Timo Schorlepp

Unifying Information-Theoretic and Pair-Counting Clustering Similarity

Comparing clusterings is central to evaluating unsupervised models, yet the many existing similarity measures can produce widely divergent, sometimes contradictory, evaluations. Clustering similarity measures are typically organized into…

Machine Learning · Statistics 2025-11-06 Alexander J. Gates

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression…

Machine Learning · Statistics 2025-11-06 Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak

Bridging the Gap between Empirical Welfare Maximization and Conditional Average Treatment Effect Estimation in Policy Learning

The goal of policy learning is to train a policy function that recommends a treatment given covariates to maximize population welfare. There are two major approaches in policy learning: the empirical welfare maximization (EWM) approach and…

Machine Learning · Statistics 2025-11-06 Masahiro Kato

Using latent representations to link disjoint longitudinal data for mixed-effects regression

Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the low sample size limitations of rare disease…

Machine Learning · Statistics 2025-11-06 Clemens Schächter, Maren Hackenberg, Michelle Pfaffenlehner, Félix B. Tambe-Ndonfack, Thorsten Schmidt, Astrid Pechmann, Janbernd Kirschner, Jan Hasenauer, Harald Binder

Orthogonal Nonnegative Matrix Factorization with the Kullback-Leibler divergence

Orthogonal nonnegative matrix factorization (ONMF) has become a standard approach for clustering. As far as we know, most works on ONMF rely on the Frobenius norm to assess the quality of the approximation. This paper presents a new model…

Machine Learning · Statistics 2025-11-06 Jean Pacifique Nkurunziza, Fulgence Nahayo, Nicolas Gillis

Risk and cross validation in ridge regression with correlated samples

Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free…

Machine Learning · Statistics 2025-11-06 Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan

Variable Selection in Maximum Mean Discrepancy for Interpretable Distribution Comparison

We study two-sample variable selection: identifying variables that discriminate between the distributions of two sets of data vectors. Such variables help scientists understand the mechanisms behind dataset discrepancies. Although…

Machine Learning · Statistics 2025-11-06 Kensuke Mitsuzawa, Motonobu Kanagawa, Stefano Bortoli, Margherita Grossi, Paolo Papotti

Optimizing Kernel Discrepancies via Subset Selection

Kernel discrepancies are a powerful tool for analyzing worst-case errors in quasi-Monte Carlo (QMC) methods. Building on recent advances in optimizing such discrepancy measures, we extend the subset selection problem to the setting of…

Machine Learning · Statistics 2025-11-05 Deyao Chen, François Clément, Carola Doerr, Nathan Kirk

An Adaptive Sampling Framework for Detecting Localized Concept Drift under Label Scarcity

Concept drift and label scarcity are two critical challenges limiting the robustness of predictive models in dynamic industrial environments. Existing drift detection methods often assume global shifts and rely on dense supervision, making…

Machine Learning · Statistics 2025-11-05 Junghee Pyeon, Davide Cacciarelli, Kamran Paynabar

A new class of Markov random fields enabling lightweight sampling

This work addresses the problem of efficient sampling of Markov random fields (MRF). The sampling of Potts or Ising MRF is most often based on Gibbs sampling, and is thus computationally expensive. We consider in this work how to circumvent…

Machine Learning · Statistics 2025-11-05 Jean-Baptiste Courbot, Hugo Gangloff, Bruno Colicchio

Data-driven Learning of Interaction Laws in Multispecies Particle Systems with Gaussian Processes: Convergence Theory and Applications

We develop a Gaussian process framework for learning interaction kernels in multi-species interacting particle systems from trajectory data. Such systems provide a canonical setting for multiscale modeling, where simple microscopic…

Machine Learning · Statistics 2025-11-05 Jinchao Feng, Charles Kulick, Sui Tang

Scalable Causal Discovery from Recursive Nonlinear Data via Truncated Basis Function Scores and Tests

Learning graphical conditional independence structures from nonlinear, continuous or mixed data is a central challenge in machine learning and the sciences, and many existing methods struggle to scale to thousands of samples or hundreds of…

Machine Learning · Statistics 2025-11-05 Joseph Ramsey, Bryan Andrews, Peter Spirtes

Variance-Bounded Evaluation of Entity-Centric AI Systems Without Ground Truth: Theory and Measurement

Reliable evaluation of AI systems remains a fundamental challenge when ground truth labels are unavailable, particularly for systems generating natural language outputs like AI chat and agent systems. Many of these AI agents and systems…

Machine Learning · Statistics 2025-11-05 Kaihua Ding

Partial Trace-Class Bayesian Neural Networks

Bayesian neural networks (BNNs) allow rigorous uncertainty quantification in deep learning, but often come at a prohibitive computational cost. We propose three different innovative architectures of partial trace-class Bayesian neural…

Machine Learning · Statistics 2025-11-04 Arran Carter, Torben Sell

Hyper Hawkes Processes: Interpretable Models of Marked Temporal Point Processes

Foundational marked temporal point process (MTPP) models, such as the Hawkes process, often use inexpressive model families in order to offer interpretable parameterizations of event data. On the other hand, neural MTPP models forego this…

Machine Learning · Statistics 2025-11-04 Alex Boyd, Andrew Warrington, Taha Kass-Hout, Parminder Bhatia, Danica Xiao

Binary perceptron computational gap -- a parametric fl RDT view

Recent studies suggest that the asymmetric binary perceptron (ABP) likely exhibits the so-called statistical-computational gap, characterized by the appearance of two phase-transitioning constraint density thresholds: (i) the…

Machine Learning · Statistics 2025-11-04 Mihailo Stojnic