Machine Learning · Scifaro

Fractal Flow: Hierarchical and Interpretable Normalizing Flow via Topic Modeling and Recursive Strategy

Normalizing Flows provide a principled framework for high-dimensional density estimation and generative modeling by constructing invertible transformations with tractable Jacobian determinants. We propose Fractal Flow, a novel normalizing…

Machine Learning · Statistics · 2025-08-28 · Binhui Zhang, Jianwei Ma
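
The change-of-variables rule behind the "tractable Jacobian determinants" mentioned above can be shown with the simplest possible flow. The following is a minimal sketch of a generic elementwise affine flow, x = scale * z + shift with z standard normal. It is not Fractal Flow's hierarchical architecture, and the function names are hypothetical.

```python
import math
from typing import Sequence


def std_normal_logpdf(z: float) -> float:
    # Log-density of the standard normal base distribution.
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def affine_flow_logpdf(x: Sequence[float], scale: Sequence[float],
                       shift: Sequence[float]) -> float:
    """Log-density of x under the flow x = scale * z + shift, z ~ N(0, I).

    Change of variables: log p_x(x) = log p_z(f^{-1}(x)) - log|det J_f|.
    An elementwise affine map has a diagonal Jacobian, so the
    log-determinant is simply the sum of log|scale_i|.
    """
    total = 0.0
    for xi, a, b in zip(x, scale, shift):
        if a == 0.0:
            raise ValueError("every scale must be nonzero for the map to be invertible")
        z = (xi - b) / a                                   # inverse transform f^{-1}
        total += std_normal_logpdf(z) - math.log(abs(a))   # base log-density + log-det term
    return total


# A 2-D flow whose output distribution is N(0, 2^2) x N(1, 0.5^2).
print(affine_flow_logpdf([1.0, -2.0], scale=[2.0, 0.5], shift=[0.0, 1.0]))
```

Practical flows stack many invertible layers with learned parameters. The same bookkeeping still applies: each layer contributes its own log-determinant term.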

CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference

Experimental scientists are increasingly relying on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often…

Machine Learning · Statistics · 2025-08-28 · Luben M. C. Cabezas, Vagner S. Santos, Thiago R. Ramos, Pedro L. C. Rodrigues, Rafael Izbicki
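
As background for the calibration step, here is the standard global split-conformal rule. Given nonconformity scores computed on held-out calibration simulations, it returns a threshold. Under exchangeability, the set of parameters whose score is at most that threshold has at least 1 - alpha marginal coverage. CP4SBI's local calibration refines this idea. The choice of score (for example, negative approximate posterior density) and the helper names are assumptions made for illustration.

```python
import math
from typing import Sequence


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: the set must include everything
    return sorted(cal_scores)[k - 1]


def in_credible_set(score: float, threshold: float) -> bool:
    # A candidate parameter is kept if its nonconformity score does not exceed the threshold.
    return score <= threshold


# Hypothetical calibration scores, e.g. negative approximate posterior densities at the true parameters.
scores = [0.3, 0.1, 0.7, 0.5, 0.2, 0.9, 0.4]
tau = conformal_threshold(scores, alpha=0.25)   # 6th smallest of 7 scores
print(tau, in_credible_set(0.6, tau))
```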

Predicting Forced Responses of Probability Distributions via the Fluctuation-Dissipation Theorem and Generative Modeling

We present a novel and flexible data-driven framework for estimating the response of higher-order moments of nonlinear stochastic systems to small external perturbations. The classical Generalized Fluctuation-Dissipation Theorem (GFDT)…

Machine Learning · Statistics · 2025-08-28 · Ludovico T. Giorgini, Fabrizio Falasca, Andre N. Souza

Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing

Machine learning algorithms may have disparate impacts on protected groups. To address this, we develop methods for Bayes-optimal fair classification, aiming to minimize classification error subject to given group fairness constraints. We…

Machine Learning · Statistics · 2025-08-28 · Xianli Zeng, Kevin Jiang, Guang Cheng, Edgar Dobriban

Deep Learning of Semi-Competing Risk Data via a New Neural Expectation-Maximization Algorithm

Prognostication for lung cancer, a leading cause of mortality, remains a complex task, as it needs to quantify the associations of risk factors and health events spanning a patient's entire life. One challenge is that an individual's…

Machine Learning · Statistics · 2025-08-28 · Stephen Salerno, Yi Li

Sparse minimum Redundancy Maximum Relevance for feature selection

We propose a feature screening method that integrates both feature-feature and feature-target relationships. Inactive features are identified via a penalized minimum Redundancy Maximum Relevance (mRMR) procedure, which is the continuous…

Machine Learning · Statistics · 2025-08-27 · Peter Naylor, Benjamin Poignard, Héctor Climente-González, Makoto Yamada
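
For orientation, here is the classic greedy form of mRMR. At each step it adds the feature with the largest relevance to the target minus its mean redundancy with the features already selected. To stay dependency-free, this sketch uses absolute Pearson correlation as a stand-in for mutual information, which is an assumption. The paper instead uses a penalized, continuous version of the criterion. Function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def abs_corr(a: Sequence[float], b: Sequence[float]) -> float:
    # |Pearson correlation|, used here as a cheap proxy for mutual information.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0  # a constant feature carries no linear information
    return abs(cov) / math.sqrt(va * vb)


def greedy_mrmr(features: Dict[str, Sequence[float]], target: Sequence[float],
                k: int) -> List[str]:
    """Greedily select k features maximizing relevance minus mean redundancy."""
    selected: List[str] = []
    remaining = list(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            relevance = abs_corr(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(abs_corr(features[name], features[s]) for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


a = [2.0, 2.0, -2.0, -2.0, 0.0, 0.0]
b = [1.0, -1.0, 0.0, 0.0, 1.0, -1.0]        # orthogonal to a
y = [ai + bi for ai, bi in zip(a, b)]
print(greedy_mrmr({"a": a, "a_copy": list(a), "b": b}, y, k=2))   # ['a', 'b']
```

After `a` is picked, its duplicate `a_copy` scores negatively because it is fully redundant. The less relevant but orthogonal `b` is selected instead.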

Revisiting Follow-the-Perturbed-Leader with Unbounded Perturbations in Bandit Problems

Follow-the-Regularized-Leader (FTRL) policies have achieved Best-of-Both-Worlds (BOBW) results in various settings through hybrid regularizers, whereas analogous results for Follow-the-Perturbed-Leader (FTPL) remain limited due to inherent…

Machine Learning · Statistics · 2025-08-27 · Jongyeong Lee, Junya Honda, Shinji Ito, Min-hwan Oh
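
The FTPL selection rule itself is short: play the arm whose perturbed cumulative loss is smallest. This sketch is a full-information simplification that uses unbounded Fréchet perturbations. In the bandit setting, the cumulative losses would have to be replaced by loss estimates (for example, ones obtained through geometric resampling), and that step is omitted here. The Fréchet shape of 2 and the function names are illustrative assumptions.

```python
import math
import random
from typing import Sequence


def frechet_sample(rng: random.Random, shape: float) -> float:
    # Inverse-CDF sampling from Frechet(shape): F(x) = exp(-x ** (-shape)), x > 0.
    u = rng.random()
    while u == 0.0:          # random() can return 0.0, where log is undefined
        u = rng.random()
    return (-math.log(u)) ** (-1.0 / shape)


def ftpl_choose(cum_losses: Sequence[float], eta: float, rng: random.Random,
                shape: float = 2.0) -> int:
    """Return argmin_i (eta * L_i - Z_i) with i.i.d. Frechet perturbations Z_i."""
    perturbed = [eta * loss - frechet_sample(rng, shape) for loss in cum_losses]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


rng = random.Random(0)
picks = [ftpl_choose([0.0, 5.0, 5.0], eta=1.0, rng=rng) for _ in range(1000)]
print(picks.count(0) / len(picks))   # mostly the leader, occasionally a perturbed alternative
```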

Deterministic Coreset Construction via Adaptive Sensitivity Trimming

We develop a rigorous framework for deterministic coreset construction in empirical risk minimization (ERM). Our central contribution is the Adaptive Deterministic Uniform-Weight Trimming (ADUWT) algorithm, which constructs a coreset by…

Machine Learning · Statistics · 2025-08-27 · Faruk Alpay, Taylan Alpay

Learning the Simplest Neural ODE

Since the advent of the "Neural Ordinary Differential Equation (Neural ODE)" paper, learning ODEs with deep learning has been applied to system identification, time-series forecasting, and related areas. Exploiting the diffeomorphic…

Machine Learning · Statistics · 2025-08-27 · Yuji Okamoto, Tomoya Takeuchi, Yusuke Sakemi

Clinical characteristics, complications and outcomes of critically ill patients with Dengue in Brazil, 2012-2024: a nationwide, multicentre cohort study

Background. Dengue outbreaks are a major public health issue, with Brazil reporting 71% of global cases in 2024. Purpose. This study aims to describe the profile of severe dengue patients admitted to Brazilian Intensive Care Units (ICUs)…

Machine Learning · Statistics · 2025-08-26 · Igor Tona Peres, Otavio T. Ranzani, Leonardo S. L. Bastos, Silvio Hamacher, Tom Edinburgh, Esteban Garcia-Gallo, Fernando Augusto Bozza

Algebraic Approach to Ridge-Regularized Mean Squared Error Minimization in Minimal ReLU Neural Network

This paper investigates a perceptron, a simple neural network model, with ReLU activation and a ridge-regularized mean squared error (RR-MSE). Our approach leverages the fact that the RR-MSE for ReLU perceptron is piecewise polynomial,…

Machine Learning · Statistics · 2025-08-26 · Ryoya Fukasaku, Yutaro Kabata, Akifumi Okuno
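
The piecewise-polynomial structure noted in the abstract is easy to see directly. Fix the activation pattern, meaning the set of samples with positive pre-activation. The ReLU then acts as either the identity or zero on each sample, so the RR-MSE becomes a quadratic polynomial in the weights. A minimal sketch, assuming a bias-free perceptron and hypothetical function names:

```python
from typing import Sequence, Tuple


def relu(t: float) -> float:
    return t if t > 0.0 else 0.0


def pre_activation(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def rr_mse(w: Sequence[float], X: Sequence[Sequence[float]],
           y: Sequence[float], lam: float) -> float:
    """(1/n) * sum_i (y_i - relu(w . x_i))^2 + lam * ||w||^2."""
    n = len(y)
    mse = sum((yi - relu(pre_activation(w, xi))) ** 2 for xi, yi in zip(X, y)) / n
    return mse + lam * sum(wj * wj for wj in w)


def activation_pattern(w: Sequence[float], X: Sequence[Sequence[float]]) -> Tuple[bool, ...]:
    # Samples with positive pre-activation; within a fixed pattern the loss is a polynomial in w.
    return tuple(pre_activation(w, xi) > 0.0 for xi in X)


X = [[1.0], [-1.0], [2.0]]
y = [1.0, 0.0, 2.0]
for w in ([0.5], [1.0], [2.0]):
    # For any w > 0 the pattern is (True, False, True), so here
    # rr_mse = ((1 - w)^2 + (2 - 2w)^2) / 3 + 0.1 w^2 = (5/3)(1 - w)^2 + 0.1 w^2.
    print(w, activation_pattern(w, X), rr_mse(w, X, y, lam=0.1))
```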

High-Order Langevin Monte Carlo Algorithms

Langevin algorithms are popular Markov chain Monte Carlo (MCMC) methods for large-scale sampling problems that often arise in data science. We propose Monte Carlo algorithms based on the discretizations of $P$-th order Langevin dynamics for…

Machine Learning · Statistics · 2025-08-26 · Thanh Dang, Mert Gurbuzbalaban, Mohammad Rafiqul Islam, Nian Yao, Lingjiong Zhu
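
For a baseline to compare against, here is the standard first-order unadjusted Langevin algorithm (ULA), an Euler-Maruyama discretization of $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t$. This is not the $P$-th order method proposed in the paper. The 1-D target, step size, and names are illustrative assumptions.

```python
import math
import random
from typing import Callable, List


def ula_samples(grad_u: Callable[[float], float], x0: float, step: float,
                n_steps: int, rng: random.Random) -> List[float]:
    """Unadjusted Langevin algorithm for a 1-D target density proportional to exp(-U)."""
    x = x0
    out = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        # x_{k+1} = x_k - h * U'(x_k) + sqrt(2h) * xi_k,  xi_k ~ N(0, 1)
        x = x - step * grad_u(x) + noise_scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(1)
# Standard normal target: U(x) = x^2 / 2, so U'(x) = x. Discard the first 1000 steps as burn-in.
chain = ula_samples(lambda x: x, x0=0.0, step=0.1, n_steps=20000, rng=rng)[1000:]
mean = sum(chain) / len(chain)
var = sum((c - mean) ** 2 for c in chain) / len(chain)
print(mean, var)
```

Take this Gaussian target with step h = 0.1. The chain's stationary variance is 1/(1 - h/2), about 1.05 rather than 1, which is a bias introduced by the first-order discretization.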

On the sample complexity of semi-supervised multi-objective learning

In multi-objective learning (MOL), several possibly competing prediction tasks must be solved jointly by a single model. Achieving good trade-offs may require a model class $\mathcal{G}$ with larger capacity than what is necessary for…

Machine Learning · Statistics · 2025-08-26 · Tobias Wegel, Geelon So, Junhyung Park, Fanny Yang

Factor Informed Double Deep Learning For Average Treatment Effect Estimation

We investigate the problem of estimating the average treatment effect (ATE) under a very general setup where the covariates can be high-dimensional, highly correlated, and can have sparse nonlinear effects on the propensity and outcome…

Machine Learning · Statistics · 2025-08-26 · Jianqing Fan, Soham Jana, Sanjeev Kulkarni, Qishuo Yin
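
A common way to combine the propensity and outcome models mentioned above is the augmented inverse-probability-weighted (AIPW, or doubly robust) estimator. This sketch assumes the nuisance predictions (outcome regressions and propensity scores) have already been fitted, for example by cross-fitted neural networks. It is a generic estimator and not necessarily the exact one used in the paper. The function name is hypothetical.

```python
from typing import Sequence


def aipw_ate(y: Sequence[float], t: Sequence[int], mu1: Sequence[float],
             mu0: Sequence[float], e: Sequence[float]) -> float:
    """Doubly robust (AIPW) estimate of the average treatment effect.

    y: observed outcomes; t: treatment indicators in {0, 1};
    mu1, mu0: fitted outcome predictions under treatment and control;
    e: fitted propensity scores P(T = 1 | X), assumed strictly inside (0, 1).
    """
    n = len(y)
    total = 0.0
    for yi, ti, m1, m0, ei in zip(y, t, mu1, mu0, e):
        total += (m1 - m0                               # outcome-model contrast
                  + ti * (yi - m1) / ei                 # weighted residual for treated units
                  - (1 - ti) * (yi - m0) / (1 - ei))    # weighted residual for control units
    return total / n


print(aipw_ate(y=[3.0, 1.0, 4.0, 2.0], t=[1, 0, 1, 0],
               mu1=[3.0, 2.0, 4.0, 3.0], mu0=[2.0, 1.0, 3.0, 2.0], e=[0.5] * 4))  # 1.0
```

When the outcome models fit the observed outcomes exactly, the correction terms vanish and the estimate reduces to the mean of mu1 - mu0. If the outcome models are misspecified, the correction terms can still compensate, provided the propensity model is correct.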

Rao Differential Privacy

Differential privacy (DP) has recently emerged as a definition of privacy to release private estimates. DP calibrates noise to be on the order of an individual's contribution. Due to this calibration, a private estimate obscures any…

Machine Learning · Statistics · 2025-08-26 · Carlos Soto
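
The classic Laplace mechanism performs exactly the calibration described above. Its noise scale is the query's sensitivity (the most one individual can change the output) divided by the privacy parameter epsilon. This sketch releases the mean of values clipped to known bounds. It illustrates standard epsilon-DP, not the Rao DP definition proposed in the paper, and the names are hypothetical.

```python
import random
from typing import Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values: Sequence[float], lower: float, upper: float,
                 epsilon: float, rng: random.Random) -> float:
    """epsilon-DP mean via the Laplace mechanism, with values clipped to [lower, upper]."""
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one individual's value moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
data = [0.2, 0.4, 0.6, 0.8] * 250          # true mean 0.5, n = 1000
print(private_mean(data, lower=0.0, upper=1.0, epsilon=1.0, rng=rng))
```

Clipping bounds each person's contribution, which is what makes the sensitivity, and hence the noise scale, finite.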

Limitations of refinement methods for weak to strong generalization

Standard techniques for aligning large language models (LLMs) utilize human-produced data, which could limit the capability of any aligned LLM to human level. Label refinement and weak training have emerged as promising strategies to…

Machine Learning · Statistics · 2025-08-26 · Seamus Somerstep, Ya'acov Ritov, Mikhail Yurochkin, Subha Maity, Yuekai Sun

GraphPPD: Posterior Predictive Modelling for Graph-Level Inference

Accurate modelling and quantification of predictive uncertainty is crucial in deep learning since it allows a model to make safer decisions when the data is ambiguous and facilitates the users' understanding of the model's confidence in its…

Machine Learning · Statistics · 2025-08-26 · Soumyasundar Pal, Liheng Ma, Amine Natik, Yingxue Zhang, Mark Coates

MOCA-HESP: Meta High-dimensional Bayesian Optimization for Combinatorial and Mixed Spaces via Hyper-ellipsoid Partitioning

High-dimensional Bayesian Optimization (BO) has attracted significant attention in recent research. However, existing methods have mainly focused on optimizing in continuous domains, while combinatorial (ordinal and categorical) and mixed…

Machine Learning · Statistics · 2025-08-26 · Lam Ngo, Huong Ha, Jeffrey Chan, Hongyu Zhang

How do Probabilistic Graphical Models and Graph Neural Networks Look at Network Data?

Graphs are a powerful data structure for representing relational data and are widely used to describe complex real-world systems. Probabilistic Graphical Models (PGMs) and Graph Neural Networks (GNNs) can both leverage graph-structured…

Machine Learning · Statistics · 2025-08-26 · Michela Lapenna, Caterina De Bacco

Deep spatio-temporal point processes: Advances and new directions

Spatio-temporal point processes (STPPs) model discrete events distributed in time and space, with important applications in areas such as criminology, seismology, epidemiology, and social networks. Traditional models often rely on…

Machine Learning · Statistics · 2025-08-26 · Xiuyuan Cheng, Zheng Dong, Yao Xie
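
As background, a classic parametric building block for point-process models is the self-exciting (Hawkes) conditional intensity, in which each past event temporarily raises the event rate. This sketch covers only the temporal part, using an exponential kernel. Spatial kernels and the deep, learned components discussed in the paper are omitted, and the parameter names are illustrative.

```python
import math
from typing import Sequence


def hawkes_intensity(t: float, history: Sequence[float], mu: float,
                     alpha: float, beta: float) -> float:
    """lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).

    mu: background rate; alpha: jump in intensity after each event;
    beta: decay rate of that excitation.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in history if ti < t)


events = [0.0, 0.4, 1.5]
for t in (0.2, 1.0, 1.6, 3.0):
    print(t, hawkes_intensity(t, events, mu=0.5, alpha=0.8, beta=1.0))
```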