Machine Learning - Scifaro

Bounds and Identification of Joint Probabilities of Potential Outcomes and Observed Variables under Monotonicity Assumptions

Evaluating joint probabilities of potential outcomes and observed variables, and their linear combinations, is a fundamental challenge in causal inference. This paper addresses the bounding and identification of these probabilities in…

Machine Learning · Statistics 2026-02-24 Naoya Hashimoto, Yuta Kawakami, Jin Tian

Multiclass Calibration Assessment and Recalibration of Probability Predictions via the Linear Log Odds Calibration Function

Machine-generated probability predictions are essential in modern classification tasks such as image classification. A model is well calibrated when its predicted probabilities correspond to observed event frequencies. Despite the need for…

Machine Learning · Statistics 2026-02-24 Amy Vennos, Xin Xing, Christopher T. Franck

Activation-Space Uncertainty Quantification for Pretrained Networks

Reliable uncertainty estimates are crucial for deploying pretrained models; yet, many strong methods for quantifying uncertainty require retraining, Monte Carlo sampling, or expensive second-order computations and may alter a frozen…

Machine Learning · Statistics 2026-02-24 Richard Bergna, Stefan Depeweg, Sergio Calvo-Ordoñez, Jonathan Plenk, Alvaro Cartea, Jose Miguel Hernández-Lobato

Information-Theoretic Causal Bounds under Unmeasured Confounding

We develop a data-driven information-theoretic framework for sharp partial identification of causal effects under unmeasured confounding. Existing approaches often rely on restrictive assumptions, such as bounded or discrete outcomes;…

Machine Learning · Statistics 2026-02-24 Yonghan Jung, Bogyeong Kang

Low-Dimensional Adaptation of Rectified Flow: A Diffusion and Stochastic Localization Perspective

In recent years, rectified flow (RF) has gained considerable popularity, largely due to its generation efficiency and state-of-the-art performance. In this paper, we investigate the degree to which RF automatically adapts to the intrinsic…

Machine Learning · Statistics 2026-02-24 Saptarshi Roy, Alessandro Rinaldo, Purnamrita Sarkar

Constrained Density Estimation via Optimal Transport

A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the constraints that the expected value of a set of…

Machine Learning · Statistics 2026-02-24 Yinan Hu, Esteban G. Tabak

Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing

Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending them to powerful nonparametric alternatives is…

Machine Learning · Statistics 2026-02-24 Zhihan Huang, Ziang Niu

PBPK-iPINNs: Inverse Physics-Informed Neural Networks for Physiologically Based Pharmacokinetic Brain Models

Physics-Informed Neural Networks (PINNs) integrate machine learning with differential equations to solve forward and inverse problems while ensuring that predictions adhere to physical laws. Physiologically based pharmacokinetic (PBPK)…

Machine Learning · Statistics 2026-02-24 Charuka D. Wickramasinghe, Krishanthi C. Weerasinghe, Pradeep K. Ranaweera, Nelum S. S. M. Hapuhinna

Predictable Compression Failures: Order Sensitivity and Information Budgeting for Evidence-Grounded Binary Adjudication

Transformers used for evidence-grounded question answering with binary adjudication (e.g., support/refute or yes/no) can be highly sensitive to the order in which exchangeable evidence is presented, producing dispersion across permutations…

Machine Learning · Statistics 2026-02-24 Leon Chlon, Ahmed Karim, Maggie Chlon, MarcAntonio Awada

Non-Linear Model-Based Sequential Decision-Making in Agriculture

Sequential decision-making is central to sustainable agricultural management and precision agriculture, where resource inputs must be optimized under uncertainty and over time. However, such decisions must often be made with limited…

Machine Learning · Statistics 2026-02-24 Sakshi Arya, Wentao Lin

Choosing the Better Bandit Algorithm under Data Sharing: When Do A/B Experiments Work?

We study A/B experiments that are designed to compare the performance of two recommendation algorithms. Prior work has observed that the stable unit treatment value assumption (SUTVA) often does not hold in large-scale recommendation…

Machine Learning · Statistics 2026-02-24 Shuangning Li, Chonghuan Wang, Jingyan Wang

LLMs are Bayesian, In Expectation, Not in Realization

Exchangeability-based martingale diagnostics have been used to question Bayesian explanations of transformer in-context learning. We show that these violations are compatible with Bayesian/MDL behavior once we account for a basic…

Machine Learning · Statistics 2026-02-24 Leon Chlon, Zein Khamis, Maggie Chlon, Mahdi El Zein, MarcAntonio M. Awada

Trustworthy Prediction with Gaussian Process Knowledge Scores

Probabilistic models are often used to make predictions in regions of the data space where no observations are available, but it is not always clear whether such predictions are well-informed by previously seen data. In this paper, we…

Machine Learning · Statistics 2026-02-24 Kurt Butler, Guanchao Feng, Tong Chen, Petar Djuric

Probability Bounding: Post-Hoc Calibration via Box-Constrained Softmax

Many studies have observed that modern neural networks achieve high accuracy while producing poorly calibrated probabilities, making calibration a critical practical issue. In this work, we propose probability bounding (PB), a novel…

Machine Learning · Statistics 2026-02-24 Kyohei Atarashi, Satoshi Oyama, Hiromi Arai, Hisashi Kashima

Feature Representation Transferring to Lightweight Models via Perception Coherence

In this paper, we propose a method for transferring feature representation to lightweight student models from larger teacher models. We mathematically define a new notion called perception coherence. Based on this notion, we…

Machine Learning · Statistics 2026-02-24 Hai-Vy Nguyen, Fabrice Gamboa, Sixin Zhang, Reda Chhaibi, Serge Gratton, Thierry Giaccone

Optimizing High-Dimensional Oblique Splits

Evidence suggests that oblique splits can significantly enhance the performance of decision trees. This paper explores the optimization of high-dimensional oblique splits for decision tree construction, establishing the Sufficient Impurity…

Machine Learning · Statistics 2026-02-24 Chien-Ming Chi

The MAPS Algorithm: Fast model-agnostic and distribution-free prediction intervals for supervised learning

A fundamental problem in modern supervised learning is computing reliable conditional prediction intervals in high-dimensional settings: existing methods often rely on restrictive modelling assumptions, do not scale as predictor dimension…

Machine Learning · Statistics 2026-02-24 Daniel Salnikov, Dan Leonte, Kevin Michalewicz

Model Selection and Parameter Estimation of One-Dimensional Gaussian Mixture Models

In this paper, we study the problem of learning one-dimensional Gaussian mixture models (GMMs) with a specific focus on estimating both the model order and the mixing distribution from independent and identically distributed (i.i.d.)…

Machine Learning · Statistics 2026-02-24 Xinyu Liu, Hai Zhang

Stochastic Localization via Iterative Posterior Sampling

Building upon score-based learning, new interest in stochastic localization techniques has recently emerged. In these models, one seeks to noise a sample from the data distribution through a stochastic process, called observation process,…

Machine Learning · Statistics 2026-02-24 Louis Grenioux, Maxence Noble, Marylou Gabrié, Alain Oliviero Durmus

Exploring Singularities in point clouds with the graph Laplacian: An explicit approach

We develop theory and methods that use the graph Laplacian to analyze the geometry of the underlying manifold of datasets. Our theory provides theoretical guarantees and explicit bounds on the functional forms of the graph Laplacian when it…

Machine Learning · Statistics 2026-02-24 Martin Andersson, Benny Avelin