Machine Learning | Scifaro

A Novel Algorithm for Personalized Federated Learning: Knowledge Distillation with Weighted Combination Loss

Federated learning (FL) offers a privacy-preserving framework for distributed machine learning, enabling collaborative model training across diverse clients without centralizing sensitive data. However, statistical heterogeneity,…

Machine Learning · Statistics · 2025-04-08 · Hengrui Hu, Anai N. Kothari, Anjishnu Banerjee
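The teaser describes collaborative training in which clients never share raw data. For reference, here is a minimal sketch of federated averaging (FedAvg), a standard aggregation baseline. It is not the paper's knowledge-distillation algorithm. The function name `fedavg` and the toy client values are hypothetical.

```python
from typing import List


def fedavg(client_weights: List[List[float]], client_sizes: List[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its sample count.

    Only parameters reach the server; each client's raw data stays local.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            global_w[i] += (n / total) * w[i]
    return global_w


# Two hypothetical clients holding 1 and 3 samples respectively.
print(fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

Under statistical heterogeneity, a single averaged model like this can fit some clients poorly. That gap is the usual motivation for personalized variants.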

Scalable Approximate Algorithms for Optimal Transport Linear Models

Recently, linear regression models incorporating an optimal transport (OT) loss have been explored for applications such as supervised unmixing of spectra, music transcription, and mass spectrometry. However, these task-specific approaches…

Machine Learning · Statistics · 2025-04-08 · Tomasz Kacprzak, Francois Kamper, Michael W. Heiss, Gianluca Janka, Ann M. Dillner, Satoshi Takahama
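To make the "optimal transport loss" concrete for one-dimensional spectra: on a shared, evenly spaced grid, the OT cost with ground cost |x - y| (Wasserstein-1) reduces to the area between the two cumulative distributions. The sketch below assumes both inputs are nonnegative histograms with equal total mass on the same grid. `wasserstein1_1d` is a hypothetical helper, not the scalable approximate algorithm from the paper.

```python
from itertools import accumulate
from typing import List


def wasserstein1_1d(a: List[float], b: List[float], dx: float = 1.0) -> float:
    """W1 distance between two equal-mass histograms on the same evenly spaced grid.

    In 1D, W1 equals the integral of |CDF_a - CDF_b|, approximated here as a
    sum over grid cells of width dx.
    """
    if len(a) != len(b):
        raise ValueError("histograms must share the same grid")
    cdf_a = list(accumulate(a))
    cdf_b = list(accumulate(b))
    return sum(abs(ca - cb) for ca, cb in zip(cdf_a, cdf_b)) * dx


# Moving one unit of mass two bins to the right costs 2 * dx.
print(wasserstein1_1d([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # 2.0
```

Unlike a squared-error loss, this cost grows with how far mass must move. That property makes OT attractive for peak-shifted spectra.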

Cramer-Rao Bounds for Laplacian Matrix Estimation

In this paper, we analyze the performance of the estimation of Laplacian matrices under general observation models. Laplacian matrix estimation involves structural constraints, including symmetry and null-space properties, along with matrix…

Machine Learning · Statistics · 2025-04-08 · Morad Halihal, Tirza Routtenberg, H. Vincent Poor
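The structural constraints named in this teaser can be checked directly. Take a combinatorial graph Laplacian L = D - W built from a symmetric, nonnegative weight matrix W. Such an L is symmetric, has nonpositive off-diagonal entries, and satisfies L·1 = 0, so the all-ones vector lies in its null space. The helpers `laplacian` and `is_laplacian` are hypothetical illustrations of these properties, not the paper's estimator or its Cramér-Rao bound.

```python
from typing import List

Matrix = List[List[float]]


def laplacian(W: Matrix) -> Matrix:
    """Combinatorial Laplacian L = D - W for a symmetric weight matrix W (self-loops ignored)."""
    n = len(W)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                L[i][j] = -W[i][j]
        # Degree = sum of off-diagonal weights in row i.
        L[i][i] = sum(W[i][j] for j in range(n) if j != i)
    return L


def is_laplacian(L: Matrix, tol: float = 1e-9) -> bool:
    """Check symmetry, nonpositive off-diagonals, and L @ 1 = 0."""
    n = len(L)
    for i in range(n):
        if abs(sum(L[i])) > tol:  # zero row sums <=> all-ones vector in the null space
            return False
        for j in range(n):
            if abs(L[i][j] - L[j][i]) > tol:  # symmetry
                return False
            if i != j and L[i][j] > tol:  # off-diagonal entries must be <= 0
                return False
    return True


W = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 0.0],
     [2.0, 0.0, 0.0]]
L = laplacian(W)
print(is_laplacian(L))  # True
```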

Spatially-Heterogeneous Causal Bayesian Networks for Seismic Multi-Hazard Estimation: A Variational Approach with Gaussian Processes and Normalizing Flows

Post-earthquake hazard and impact estimation are critical for effective disaster response, yet current approaches face significant limitations. Traditional models employ fixed parameters regardless of geographical context, misrepresenting…

Machine Learning · Statistics · 2025-04-08 · Xuechun Li, Shan Gao, Runyu Gao, Susu Xu

High-Dimensional Differential Parameter Inference in Exponential Family using Time Score Matching

This paper addresses differential inference in time-varying parametric probabilistic models, like graphical models with changing structures. Instead of estimating a high-dimensional model at each time point and estimating changes later, we…

Machine Learning · Statistics · 2025-04-08 · Daniel J. Williams, Leyang Wang, Qizhen Ying, Song Liu, Mladen Kolar

Differentially Private Sliced Inverse Regression: Minimax Optimality and Algorithm

Privacy preservation has become a critical concern in high-dimensional data analysis due to the growing prevalence of data-driven applications. Since its proposal, sliced inverse regression has emerged as a widely utilized statistical…

Machine Learning · Statistics · 2025-04-08 · Xintao Xia, Linjun Zhang, Zhanrui Cai

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality

Stochastic Gradient Descent (SGD) is one of the most popular algorithms in statistical and machine learning due to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in…

Machine Learning · Statistics · 2025-04-08 · Ziyang Wei, Wanrong Zhu, Wei Biao Wu
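The teaser refers to averaging schemes for SGD iterates. The sketch below estimates a scalar mean by SGD on f(x) = E[(x - z)^2] / 2, using step size lr0 * t^(-alpha). It compares three outputs: the last iterate, a uniform (Polyak-Ruppert) average, and a weighted average whose weights grow like t^power. These settings are assumptions chosen for illustration, and the weight choice is arbitrary rather than the optimal scheme derived in the paper. `averaged_sgd` and its parameters are hypothetical.

```python
import random
from typing import List, Tuple


def averaged_sgd(data: List[float], lr0: float = 0.5, alpha: float = 0.6,
                 power: float = 1.0) -> Tuple[float, float, float]:
    """SGD for the mean of `data`; returns (last iterate, uniform avg, weighted avg).

    Weighted average uses hypothetical weights w_t = t**power.
    """
    x = 0.0
    uniform_sum = 0.0
    weighted_sum = 0.0
    weight_total = 0.0
    for t, z in enumerate(data, start=1):
        grad = x - z                       # stochastic gradient of (x - z)^2 / 2
        x -= lr0 * t ** (-alpha) * grad    # step size decays as t^(-alpha)
        uniform_sum += x
        w = t ** power                     # later iterates receive more weight
        weighted_sum += w * x
        weight_total += w
    return x, uniform_sum / len(data), weighted_sum / weight_total


rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(5000)]  # toy stream with true mean 3.0
last, uniform_avg, weighted_avg = averaged_sgd(data)
print(last, uniform_avg, weighted_avg)
```

Giving later iterates more weight down-weights the early transient from the starting point, which the uniform average counts equally.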

Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions

In this paper, we consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models. The study is motivated by the US Census Bureau's well-known ROAM application, which uses a linear…

Machine Learning · Statistics · 2025-04-08 · Shibal Ibrahim, Peter Radchenko, Emanuel Ben-David, Rahul Mazumder

Operator Learning: A Statistical Perspective

Operator learning has emerged as a powerful tool in scientific computing for approximating mappings between infinite-dimensional function spaces. A primary application of operator learning is the development of surrogate models for the…

Machine Learning · Statistics · 2025-04-07 · Unique Subedi, Ambuj Tewari

Block Toeplitz Sparse Precision Matrix Estimation for Large-Scale Interval-Valued Time Series Forecasting

Modeling and forecasting interval-valued time series (ITS) have attracted considerable attention due to their growing presence in various contexts. To the best of our knowledge, there have been no efforts to model large-scale ITS. In this…

Machine Learning · Statistics · 2025-04-07 · Wan Tian, Zhongfeng Qin

Adaptive Classification of Interval-Valued Time Series

In recent years, the modeling and analysis of interval-valued time series have garnered significant attention in the fields of econometrics and statistics. However, the existing literature primarily focuses on regression tasks while…

Machine Learning · Statistics · 2025-04-07 · Wan Tian, Zhongfeng Qin

ConfEviSurrogate: A Conformalized Evidential Surrogate Model for Uncertainty Quantification

Surrogate models, crucial for approximating complex simulation data across sciences, inherently carry uncertainties that range from simulation noise to model prediction errors. Without rigorous uncertainty quantification, predictions become…

Machine Learning · Statistics · 2025-04-07 · Yuhan Duan, Xin Zhao, Neng Shi, Han-Wei Shen

Data-Efficient Kernel Methods for Learning Differential Equations and Their Solution Operators: Algorithms and Error Analysis

We introduce a novel kernel-based framework for learning differential equations and their solution maps that is efficient in data requirements, in terms of solution examples and amount of measurements from each example, and computational…

Machine Learning · Statistics · 2025-04-07 · Yasamin Jalalian, Juan Felipe Osorio Ramirez, Alexander Hsu, Bamdad Hosseini, Houman Owhadi

Quantifying Knowledge Distillation Using Partial Information Decomposition

Knowledge distillation deploys complex machine learning models in resource-constrained environments by training a smaller student model to emulate internal representations of a complex teacher model. However, the teacher's representations…

Machine Learning · Statistics · 2025-04-07 · Pasan Dissanayake, Faisal Hamman, Barproda Halder, Ilia Sucholutsky, Qiuyi Zhang, Sanghamitra Dutta
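The teaser describes a student trained to emulate the teacher's internal representations. One common way to do this is a FitNets-style hint loss. A learned linear map lifts the smaller student representation to the teacher's dimension, and the mean squared error between the two is penalized. This is a generic sketch, not the paper's partial-information-decomposition analysis. The function name and the toy projection matrix are hypothetical.

```python
from typing import List


def hint_loss(student_feat: List[float], teacher_feat: List[float],
              proj: List[List[float]]) -> float:
    """MSE between the teacher representation and a linear projection of the student's.

    `proj` has one row per teacher dimension and one column per student dimension;
    in training it would be learned jointly with the student.
    """
    projected = [sum(p * s for p, s in zip(row, student_feat)) for row in proj]
    return sum((t - p) ** 2 for t, p in zip(teacher_feat, projected)) / len(teacher_feat)


proj = [[1.0, 0.0],
        [0.0, 1.0],
        [1.0, 1.0]]  # hypothetical 2 -> 3 projection
print(hint_loss([1.0, 2.0], [1.0, 2.0, 3.0], proj))  # 0.0
```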

How Feature Learning Can Improve Neural Scaling Laws

We develop a solvable model of neural scaling laws beyond the kernel limit. Theoretical analysis of this model shows how performance scales with model size, training time, and the total amount of available data. We identify three scaling…

Machine Learning · Statistics · 2025-04-07 · Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

The Central Role of the Loss Function in Reinforcement Learning

This paper illustrates the central role of loss functions in data-driven decision making, providing a comprehensive survey on their influence in cost-sensitive classification (CSC) and reinforcement learning (RL). We demonstrate how…

Machine Learning · Statistics · 2025-04-07 · Kaiwen Wang, Nathan Kallus, Wen Sun

A Structure-Preserving Kernel Method for Learning Hamiltonian Systems

A structure-preserving kernel ridge regression method is presented that allows the recovery of nonlinear Hamiltonian functions out of datasets made of noisy observations of Hamiltonian vector fields. The method proposes a closed-form…

Machine Learning · Statistics · 2025-04-07 · Jianyu Hu, Juan-Pablo Ortega, Daiying Yin
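The structure-preserving method learns the Hamiltonian from observations of its vector field, which involves kernel derivatives. As a simpler reference point, the sketch below shows plain scalar kernel ridge regression with a Gaussian kernel. Its closed form is alpha = (K + lambda I)^(-1) y, with prediction f(x) = sum_i alpha_i k(x, x_i). The kernel, lengthscale, regularization value, and all function names are assumptions for illustration.

```python
import math
from typing import List


def rbf(a: float, b: float, lengthscale: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 * lengthscale^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def krr_fit(xs: List[float], ys: List[float], lam: float = 1e-3) -> List[float]:
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = [[rbf(xi, xj) + (lam if i == j else 0.0) for j, xj in enumerate(xs)]
         for i, xi in enumerate(xs)]
    return solve(K, ys)


def krr_predict(xs: List[float], alpha: List[float], x: float) -> float:
    """Evaluate f(x) = sum_i alpha_i * k(x, x_i)."""
    return sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))


xs = [0.5 * i for i in range(13)]  # grid on [0, 6]
ys = [math.sin(x) for x in xs]     # noise-free toy targets
alpha = krr_fit(xs, ys)
print(krr_predict(xs, alpha, 1.25), math.sin(1.25))
```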

Structured Matrix Learning under Arbitrary Entrywise Dependence and Estimation of Markov Transition Kernel

The problem of structured matrix estimation has been studied mostly under strong noise dependence assumptions. This paper considers a general framework of noisy low-rank-plus-sparse matrix recovery, where the noise matrix may come from any…

Machine Learning · Statistics · 2025-04-07 · Jinhang Chai, Jianqing Fan

Analytical Discovery of Manifold with Machine Learning

Understanding low-dimensional structures within high-dimensional data is crucial for visualization, interpretation, and denoising in complex datasets. Despite the advancements in manifold learning techniques, key challenges, such as limited…

Machine Learning · Statistics · 2025-04-04 · Yafei Shen, Huan-Fei Ma, Ling Yang

Dynamic Assortment Selection and Pricing with Censored Preference Feedback

In this study, we investigate the problem of dynamic multi-product selection and pricing by introducing a novel framework based on a censored multinomial logit (C-MNL) choice model. In this model, sellers present a set of products…

Machine Learning · Statistics · 2025-04-04 · Jung-hun Kim, Min-hwan Oh
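For background on the choice model that C-MNL builds on, here is the standard (uncensored) multinomial logit. In an offered set S with an outside "no purchase" option of utility 0, product i is chosen with probability P(i | S) = exp(u_i) / (1 + sum_{j in S} exp(u_j)). The censoring mechanism introduced in the paper is not modeled here. The linear utility u_i = v_i - p_i (quality minus price) and all names below are hypothetical.

```python
import math
from typing import Dict


def mnl_choice_probs(utilities: Dict[str, float]) -> Dict[str, float]:
    """Standard MNL choice probabilities; the key 'no_purchase' is the outside option."""
    denom = 1.0 + sum(math.exp(u) for u in utilities.values())
    probs = {item: math.exp(u) / denom for item, u in utilities.items()}
    probs["no_purchase"] = 1.0 / denom
    return probs


quality = {"a": 2.0, "b": 1.0}  # hypothetical product qualities v_i
price = {"a": 1.0, "b": 1.0}    # hypothetical posted prices p_i
probs = mnl_choice_probs({k: quality[k] - price[k] for k in quality})
print(probs)
```

Raising a price lowers that product's utility. This shifts probability toward the other products and the no-purchase option, which is the trade-off a dynamic pricing policy has to learn.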