Machine Learning - Scifaro

Learning Majority-to-Minority Transformations with MMD and Triplet Loss for Imbalanced Classification

Class imbalance in supervised classification often degrades model performance by biasing predictions toward the majority class, particularly in critical applications such as medical diagnosis and fraud detection. Traditional oversampling…

Machine Learning · Statistics 2025-09-16 Suman Cha, Hyunjoong Kim

Contrastive Network Representation Learning

Network representation learning seeks to embed networks into a low-dimensional space while preserving the structural and semantic properties, thereby facilitating downstream tasks such as classification, trait prediction, edge…

Machine Learning · Statistics 2025-09-16 Zihan Dong, Xin Zhou, Ryumei Nakada, Lexin Li, Linjun Zhang

Maximum diversity, weighting and invariants of time series

Magnitude, obtained as a special case of Euler characteristic of enriched category, represents a sense of the size of metric spaces and is related to classical notions such as cardinality, dimension, and volume. While the studies have…

Machine Learning · Statistics 2025-09-16 Byungchang So

Scalable extensions to given-data Sobol' index estimators

Given-data methods for variance-based sensitivity analysis have significantly advanced the feasibility of Sobol' index computation for computationally expensive models and models with many inputs. However, the limitations of existing…

Machine Learning · Statistics 2025-09-16 Teresa Portone, Bert Debusschere, Samantha Yang, Emiliano Islas-Quinones, T. Patrick Xiao

Likelihood Ratio Tests by Kernel Gaussian Embedding

We propose a novel kernel-based nonparametric two-sample test, employing the combined use of kernel mean and kernel covariance embedding. Our test builds on recent results showing how such combined embeddings map distinct probability…

Machine Learning · Statistics 2025-09-16 Leonardo V. Santoro, Victor M. Panaretos

Simulation-Based Sensitivity Analysis in Optimal Treatment Regimes and Causal Decomposition with Individualized Interventions

Causal decomposition analysis aims to assess the effect of modifying risk factors on reducing social disparities in outcomes. Recently, this analysis has incorporated individual characteristics when modifying risk factors by utilizing…

Machine Learning · Statistics 2025-09-16 Soojin Park, Suyeon Kang, Chioun Lee

Adapting Projection-Based Reduced-Order Models using Projected Gaussian Process

Projection-based model reduction is among the most widely adopted methods for constructing parametric Reduced-Order Models (ROM). Utilizing the snapshot data from solving full-order governing equations, the Proper Orthogonal Decomposition…

Machine Learning · Statistics 2025-09-16 Xiao Liu, Jingyi Feng, Xinchao Liu

C-Learner: Constrained Learning for Causal Inference

Popular debiased estimation methods for causal inference, such as augmented inverse propensity weighting and targeted maximum likelihood estimation, enjoy desirable asymptotic properties like statistical efficiency and double robustness…

Machine Learning · Statistics 2025-09-16 Tiffany Tianhui Cai, Yuri Fonseca, Kaiwen Hou, Hongseok Namkoong

Generalized Dirichlet Energy and Graph Laplacians for Clustering Directed and Undirected Graphs

Clustering in directed graphs remains a fundamental challenge due to the asymmetry in edge connectivity, which limits the applicability of classical spectral methods originally designed for undirected graphs. A common workaround is to…

Machine Learning · Statistics 2025-09-16 Harry Sevi, Gwendal Debaussart-Joniec, Malik Hacini, Matthieu Jonckheere, Argyris Kalogeratos

Differentially Private Decentralized Dataset Synthesis Through Randomized Mixing with Correlated Noise

In this work, we explore differentially private synthetic data generation in a decentralized-data setting by building on the recently proposed Differentially Private Class-Centric Data Aggregation (DP-CDA). DP-CDA synthesizes data in a…

Machine Learning · Statistics 2025-09-15 Utsab Saha, Tanvir Muntakim Tonoy, Hafiz Imtiaz

An Information-Theoretic Framework for Credit Risk Modeling: Unifying Industry Practice with Statistical Theory for Fair and Interpretable Scorecards

Credit risk modeling relies extensively on Weight of Evidence (WoE) and Information Value (IV) for feature engineering, and Population Stability Index (PSI) for drift monitoring, yet their theoretical foundations remain disconnected. We…

Machine Learning · Statistics 2025-09-15 Agus Sudjianto, Denis Burakov

Constructive Universal Approximation and Sure Convergence for Multi-Layer Neural Networks

We propose o1Neuro, a new neural network model built on sparse indicator activation neurons, with two key statistical properties. (1) Constructive universal approximation: At the population level, a deep o1Neuro can approximate any…

Machine Learning · Statistics 2025-09-15 Chien-Ming Chi

Soft Diamond Regularizers for Deep Learning

This chapter presents the new family of soft diamond synaptic regularizers based on thick-tailed symmetric alpha stable $S{\alpha}S$ probability bell curves. These new parametrized weight priors improved deep-learning performance on image…

Machine Learning · Statistics 2025-09-15 Olaoluwa Adigun, Bart Kosko

On Regression in Extreme Regions

We establish a statistical learning theoretical framework aimed at extrapolation, or out-of-domain generalization, on the unobserved tails of covariates in continuous regression problems. Our strategy involves performing statistical…

Machine Learning · Statistics 2025-09-15 Stephan Clémençon, Nathan Huet, Anne Sabourin

Global Optimization of Stochastic Black-Box Functions with Arbitrary Noise Distributions using Wilson Score Kernel Density Estimation

Many optimization problems in robotics involve the optimization of time-expensive black-box functions, such as those involving complex simulations or evaluation of real-world experiments. Furthermore, these functions are often stochastic as…

Machine Learning · Statistics 2025-09-12 Thorbjørn Mosekjær Iversen, Lars Carøe Sørensen, Simon Faarvang Mathiesen, Henrik Gordon Petersen

Asynchronous Gossip Algorithms for Rank-Based Statistical Methods

As decentralized AI and edge intelligence become increasingly prevalent, ensuring robustness and trustworthiness in such distributed settings has become a critical issue, especially in the presence of corrupted or adversarial data.…

Machine Learning · Statistics 2025-09-12 Anna Van Elst, Igor Colin, Stephan Clémençon

Uniform convergence for Gaussian kernel ridge regression

This paper establishes the first polynomial convergence rates for Gaussian kernel ridge regression (KRR) with a fixed hyperparameter in both the uniform and the $L^{2}$-norm. The uniform convergence result closes a gap in the theoretical…

Machine Learning · Statistics 2025-09-12 Paul Dommel, Rajmadan Lakshmanan

Euclidean Distance Deflation Under High-Dimensional Heteroskedastic Noise

Pairwise Euclidean distance calculation is a fundamental step in many machine learning and data analysis algorithms. In real-world applications, however, these distances are frequently distorted by heteroskedastic noise, a…

Machine Learning · Statistics 2025-09-12 Keyi Li, Yuval Kluger, Boris Landa

A hierarchical entropy method for the delocalization of bias in high-dimensional Langevin Monte Carlo

The unadjusted Langevin algorithm is widely used for sampling from complex high-dimensional distributions. It is well known to be biased, with the bias typically scaling linearly with the dimension when measured in squared Wasserstein…

Machine Learning · Statistics 2025-09-11 Daniel Lacker, Fuzhong Zhou

Uncertainty Quantification in Probabilistic Machine Learning Models: Theory, Methods, and Insights

Uncertainty Quantification (UQ) is essential in probabilistic machine learning models, particularly for assessing the reliability of predictions. In this paper, we present a systematic framework for estimating both epistemic and aleatoric…

Machine Learning · Statistics 2025-09-11 Marzieh Ajirak, Anand Ravishankar, Petar M. Djuric