Machine Learning - Scifaro

When few labeled target data suffice: a theory of semi-supervised domain adaptation via fine-tuning from multiple adaptive starts

Semi-supervised domain adaptation (SSDA) aims to achieve high predictive performance in the target domain with limited labeled target data by exploiting abundant source and unlabeled target data. Despite its significance in numerous…

Machine Learning · Statistics 2025-07-22 Wooseok Ha, Yuansi Chen

Deep Learning-Based Survival Analysis with Copula-Based Activation Functions for Multivariate Response Prediction

This research integrates deep learning, copula functions, and survival analysis to effectively handle highly correlated and right-censored multivariate survival data. It introduces copula-based activation functions (Clayton, Gumbel, and…

Machine Learning · Statistics 2025-07-22 Jong-Min Kim, Il Do Ha, Sangjin Kim

Statistical and Algorithmic Foundations of Reinforcement Learning

As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of…

Machine Learning · Statistics 2025-07-22 Yuejie Chi, Yuxin Chen, Yuting Wei

Optimal Task Order for Continual Learning of Multiple Tasks

Continual learning of multiple tasks remains a major challenge for neural networks. Here, we investigate how task order influences continual learning and propose a strategy for optimizing it. Leveraging a linear teacher-student model with…

Machine Learning · Statistics 2025-07-22 Ziyan Li, Naoki Hiratani

Grokking at the Edge of Linear Separability

We investigate the phenomenon of grokking -- delayed generalization accompanied by non-monotonic test loss behavior -- in a simple binary logistic classification task, for which "memorizing" and "generalizing" solutions can be strictly…

Machine Learning · Statistics 2025-07-22 Alon Beck, Noam Levi, Yohai Bar-Sinai

Statistical learning for constrained functional parameters in infinite-dimensional models

We develop a general framework for estimating function-valued parameters under equality or inequality constraints in infinite-dimensional statistical models. Such constrained learning problems are common across many areas of statistics and…

Machine Learning · Statistics 2025-07-22 Razieh Nabi, Nima S. Hejazi, Mark J. van der Laan, David Benkeser

High-dimensional Asymptotics of VAEs: Threshold of Posterior Collapse and Dataset-Size Dependence of Rate-Distortion Curve

In variational autoencoders (VAEs), the variational posterior often collapses to the prior, known as posterior collapse, which leads to poor representation learning quality. An adjustable hyperparameter beta has been introduced in VAEs to…

Machine Learning · Statistics 2025-07-22 Yuma Ichikawa, Koji Hukushima

Conformalized Regression for Continuous Bounded Outcomes

Regression problems with bounded continuous outcomes frequently arise in real-world statistical and machine learning applications, such as the analysis of rates and proportions. A central challenge in this setting is predicting a response…

Machine Learning · Statistics 2025-07-21 Zhanli Wu, Fabrizio Leisen, F. Javier Rubio

A Survey of Dimension Estimation Methods

It is a standard assumption that datasets in high dimension have an internal structure which means that they in fact lie on, or near, subsets of a lower dimension. In many instances it is important to understand the real dimension of the…

Machine Learning · Statistics 2025-07-21 James A. D. Binnie, Paweł Dłotko, John Harvey, Jakub Malinowski, Ka Man Yim

Conformal Data Contamination Tests for Trading or Sharing of Data

The amount of quality data in many machine learning tasks is limited to what is available locally to data owners. The set of quality data can be expanded through trading or sharing with external data agents. However, data buyers need…

Machine Learning · Statistics 2025-07-21 Martin V. Vejling, Shashi Raj Pandey, Christophe A. N. Biscio, Petar Popovski

Differential Privacy in Kernelized Contextual Bandits via Random Projections

We consider the problem of contextual kernel bandits with stochastic contexts, where the underlying reward function belongs to a known Reproducing Kernel Hilbert Space. We study this problem under an additional constraint of Differential…

Machine Learning · Statistics 2025-07-21 Nikola Pavlovic, Sudeep Salgia, Qing Zhao

On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks

This paper studies the problem of how efficiently functions in the Sobolev spaces $\mathcal{W}^{s,q}([0,1]^d)$ and Besov spaces $\mathcal{B}^s_{q,r}([0,1]^d)$ can be approximated by deep ReLU neural networks with width $W$ and depth $L$,…

Machine Learning · Statistics 2025-07-21 Yunfei Yang

Statistical and Computational Guarantees of Kernel Max-Sliced Wasserstein Distances

Optimal transport has been very successful for various machine learning tasks; however, it is known to suffer from the curse of dimensionality. Hence, dimensionality reduction is desirable when applied to high-dimensional data with…

Machine Learning · Statistics 2025-07-21 Jie Wang, March Boedihardjo, Yao Xie

Universal Scaling Laws of Absorbing Phase Transitions in Artificial Deep Neural Networks

We demonstrate that conventional artificial deep neural networks operating near the phase boundary of the signal propagation dynamics, also known as the edge of chaos, exhibit universal scaling laws of absorbing phase transitions in…

Machine Learning · Statistics 2025-07-21 Keiichi Tamai, Tsuyoshi Okubo, Truong Vinh Truong Duy, Naotake Natori, Synge Todo

Relation-Aware Slicing in Cross-Domain Alignment

The Sliced Gromov-Wasserstein (SGW) distance, aiming to relieve the computational cost of solving a non-convex quadratic program that is the Gromov-Wasserstein distance, utilizes projecting directions sampled uniformly from unit…

Machine Learning · Statistics 2025-07-18 Dhruv Sarkar, Aprameyo Chakrabartty, Anish Chakrabarty, Swagatam Das

Self Balancing Neural Network: A Novel Method to Estimate Average Treatment Effect

In observational studies, confounding variables affect both treatment and outcome. Moreover, instrumental variables also influence the treatment assignment mechanism. This situation sets the study apart from a standard randomized controlled…

Machine Learning · Statistics 2025-07-18 Atomsa Gemechu Abdisa, Yingchun Zhou, Yuqi Qiu

Physics constrained learning of stochastic characteristics

Accurate state estimation requires careful consideration of uncertainty surrounding the process and measurement models; these characteristics are usually not well-known and need an experienced designer to select the covariance matrices. An…

Machine Learning · Statistics 2025-07-18 Pardha Sai Krishna Ala, Ameya Salvi, Venkat Krovi, Matthias Schmid

How does Labeling Error Impact Contrastive Learning? A Perspective from Data Dimensionality Reduction

In recent years, contrastive learning has achieved state-of-the-art performance in the territory of self-supervised representation learning. Many previous works have attempted to provide the theoretical understanding underlying the success…

Machine Learning · Statistics 2025-07-18 Jun Chen, Hong Chen, Yonghua Yu, Yiming Ying

Nonparametric IPSS: Fast, flexible feature selection with false discovery control

Feature selection is a critical task in machine learning and statistics. However, existing feature selection methods either (i) rely on parametric methods such as linear or generalized linear models, (ii) lack theoretical false discovery…

Machine Learning · Statistics 2025-07-18 Omar Melikechi, David B. Dunson, Jeffrey W. Miller

Bounding the Worst-class Error: A Boosting Approach

This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10%, 10%, and 40% has a worst-class…

Machine Learning · Statistics 2025-07-18 Yuya Saito, Shinnosuke Matsuo, Seiichi Uchida, Daiki Suehiro