Machine Learning - Scifaro

Physics-informed features in supervised machine learning

Supervised machine learning involves approximating an unknown functional relationship from a limited dataset of features and corresponding labels. The classical approach to feature-based machine learning typically relies on applying linear…

Machine Learning · Statistics 2025-04-25 Margherita Lampani, Sabrina Guastavino, Michele Piana, Federico Benvenuto

Sparse Gaussian Neural Processes

Despite significant recent advances in probabilistic meta-learning, it is common for practitioners to avoid using deep learning models due to a comparative lack of interpretability. Instead, many practitioners simply use non-meta-models…

Machine Learning · Statistics 2025-04-25 Tommy Rochussen, Vincent Fortuin

Uncertainty Quantification With Noise Injection in Neural Networks: A Bayesian Perspective

Model uncertainty quantification involves measuring and evaluating the uncertainty linked to a model's predictions, helping assess their reliability and confidence. Noise injection is a technique used to enhance the robustness of neural…

Machine Learning · Statistics 2025-04-25 Xueqiong Yuan, Jipeng Li, Ercan Engin Kuruoglu

Linear Convergence of Diffusion Models Under the Manifold Hypothesis

Score-matching generative models have proven successful at sampling from complex high-dimensional data distributions. In many applications, this distribution is believed to concentrate on a much lower $d$-dimensional manifold embedded into…

Machine Learning · Statistics 2025-04-25 Peter Potaptchik, Iskander Azangulov, George Deligiannidis

Convergence of Diffusion Models Under the Manifold Hypothesis in High-Dimensions

Denoising Diffusion Probabilistic Models (DDPM) are powerful state-of-the-art methods used to generate synthetic data from high-dimensional data distributions and are widely used for image, audio, and video generation as well as many more…

Machine Learning · Statistics 2025-04-25 Iskander Azangulov, George Deligiannidis, Judith Rousseau

Variation Due to Regularization Tractably Recovers Bayesian Deep Learning

Uncertainty quantification in deep learning is crucial for safe and reliable decision-making in downstream tasks. Existing methods quantify uncertainty at the last layer or other approximations of the network which may miss some sources of…

Machine Learning · Statistics 2025-04-25 James McInerney, Nathan Kallus

Efficient Neural Network Approaches for Conditional Optimal Transport with Applications in Bayesian Inference

We present two neural network approaches that approximate the solutions of static and dynamic…

Machine Learning · Statistics 2025-04-25 Zheyu Oliver Wang, Ricardo Baptista, Youssef Marzouk, Lars Ruthotto, Deepanshu Verma

Covariate-dependent Graphical Model Estimation via Neural Networks with Statistical Guarantees

Graphical models are widely used in diverse application domains to model the conditional dependencies amongst a collection of random variables. In this paper, we consider settings where the graph structure is covariate-dependent, and…

Machine Learning · Statistics 2025-04-24 Jiahe Lin, Yikai Zhang, George Michailidis

Dual NUP Representations and Min-Maximization in Factor Graphs

Normals with unknown parameters (NUP) can be used to convert nontrivial model-based estimation problems into iterations of linear least-squares or Gaussian estimation problems. In this paper, we extend this approach by augmenting factor…

Machine Learning · Statistics 2025-04-24 Yun-Peng Li, Hans-Andrea Loeliger

Explainable Unsupervised Anomaly Detection with Random Forest

We describe the use of an unsupervised Random Forest for similarity learning and improved unsupervised anomaly detection. By training a Random Forest to discriminate between real data and synthetic data sampled from a uniform distribution…

Machine Learning · Statistics 2025-04-23 Joshua S. Harvey, Joshua Rosaler, Mingshu Li, Dhruv Desai, Dhagash Mehta

How Private is Your Attention? Bridging Privacy with In-Context Learning

In-context learning (ICL), the ability of transformer-based models to perform new tasks from examples provided at inference time, has emerged as a hallmark of modern language models. While recent works have investigated the mechanisms…

Machine Learning · Statistics 2025-04-23 Soham Bonnerjee, Zhen Wei Yeon, Anna Asch, Sagnik Nandy, Promit Ghosal

From predictions to confidence intervals: an empirical study of conformal prediction methods for in-context learning

Transformers have become a standard architecture in machine learning, demonstrating strong in-context learning (ICL) abilities that allow them to learn from the prompt at inference time. However, uncertainty quantification for ICL remains…

Machine Learning · Statistics 2025-04-23 Zhe Huang, Simone Rossi, Rui Yuan, Thomas Hannagan

Transfer Learning for High-dimensional Reduced Rank Time Series Models

The objective of transfer learning is to enhance estimation and inference on a target dataset by leveraging knowledge gained from additional sources. Recent studies have explored transfer learning for independent observations in complex,…

Machine Learning · Statistics 2025-04-23 Mingliang Ma, Abolfazl Safikhani

When resampling/reweighting improves feature learning in imbalanced classification?: A toy-model study

A toy model of binary classification is studied with the aim of clarifying the effect of class-wise resampling/reweighting on feature learning performance in the presence of class imbalance. In the analysis, a high-dimensional limit of…

Machine Learning · Statistics 2025-04-23 Tomoyuki Obuchi, Toshiyuki Tanaka

Benign overfitting in Fixed Dimension via Physics-Informed Learning with Smooth Inductive Bias

Recent advances in machine learning have inspired a surge of research into reconstructing specific quantities of interest from measurements that comply with certain physical laws. These efforts focus on inverse problems that are governed by…

Machine Learning · Statistics 2025-04-23 Honam Wong, Wendao Wu, Fanghui Liu, Yiping Lu

Network Distance Based on Laplacian Flows on Graphs

Distance plays a fundamental role in measuring similarity between objects. Various visualization techniques and learning tasks in statistics and machine learning such as shape matching, classification, dimension reduction and clustering…

Machine Learning · Statistics 2025-04-23 Dianbin Bao, Kisung You, Lizhen Lin

Advanced posterior analyses of hidden Markov models: finite Markov chain imbedding and hybrid decoding

Two major tasks in applications of hidden Markov models are to (i) compute distributions of summary statistics of the hidden state sequence, and (ii) decode the hidden state sequence. We describe finite Markov chain imbedding (FMCI) and…

Machine Learning · Statistics 2025-04-22 Zenia Elise Damgaard Bæk, Moisès Coll Macià, Laurits Skov, Asger Hobolth

Learning over von Mises-Fisher Distributions via a Wasserstein-like Geometry

We introduce a novel, geometry-aware distance metric for the family of von Mises-Fisher (vMF) distributions, which are fundamental models for directional data on the unit hypersphere. Although the vMF distribution is widely employed in a…

Machine Learning · Statistics 2025-04-22 Kisung You, Dennis Shung, Mauro Giuffrè

Toward Sufficient Statistical Power in Algorithmic Bias Assessment: A Test for ABROCA

Algorithmic bias is a pressing concern in educational data mining (EDM), as it risks amplifying inequities in learning outcomes. The Area Between ROC Curves (ABROCA) metric is frequently used to measure discrepancies in model performance…

Machine Learning · Statistics 2025-04-22 Conrad Borchers

4+3 Phases of Compute-Optimal Neural Scaling Laws

We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling…

Machine Learning · Statistics 2025-04-22 Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington