机器学习
Statistical inference from data generated by multi-armed bandit (MAB) algorithms is challenging due to their adaptive, non-i.i.d. nature. A classical manifestation is that sample averages of arm rewards under bandit sampling may fail to…
Scaling laws describe how learning performance improves with data, compute, or training time, and have become a central theme in modern deep learning. We study this phenomenon in a canonical nonlinear model: phase retrieval with anisotropic…
We propose a way of transforming the problem of conditional density estimation into a single nonparametric regression task via the introduction of auxiliary samples. This allows leveraging regression methods that work well in high…
We study the problem of selecting the best heterogeneous treatment effect (HTE) estimator from a collection of candidates in settings where the treatment effect is fundamentally unobserved. We cast estimator selection as a multiple testing…
Ecological Momentary Assessment provides real-time data on suicidal thoughts and behaviors, but predicting suicide attempts remains challenging due to their rarity and patient heterogeneity. We show that single models fit to all patients…
We propose and analyze a variant of Sparse Polyak for high dimensional M-estimation problems. Sparse Polyak proposes a novel adaptive step-size rule tailored to suitably estimate the problem's curvature in the high-dimensional setting,…
Node popularity is recognized as a key factor in modeling real-world networks, capturing heterogeneity in connectivity across communities. This concept is equally important in bipartite networks, where nodes in different partitions may…
Data assimilation is a fundamental task in updating forecasting models upon observing new data, with applications ranging from weather prediction to online reinforcement learning. Deep generative forecasting models (DGFMs) have shown…
This study proposes a Quantum Fourier Transform (QFT)-enhanced quantum kernel for short-term time-series forecasting. Each signal is windowed, amplitude-encoded, transformed by a QFT, then passed through a protective rotation layer to avoid…
We study the excess risk evaluation of classical penalized empirical risk minimization (ERM) with Bregman losses. We show that by leveraging the idea of wild refitting, one can efficiently upper bound the excess risk through the so-called…
This paper studies the problem of dimension reduction, tailored to improving time series forecasting with high-dimensional predictors. We propose a novel Supervised Deep Dynamic Principal component analysis (SDDP) framework that…
The coupled entropy is proven to correct a flaw in the derivation of the Tsallis entropy and thereby solidify the theoretical foundations for analyzing the uncertainty of complex systems. The Tsallis entropy originated from considering…
The softmax-contaminated mixture of experts (MoE) model is deployed when a large-scale pre-trained model, which plays the role of a fixed expert, is fine-tuned for learning downstream tasks by including a new contamination part, or prompt,…
We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from source distribution to target distribution with only target samples, either…
The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either…
High spatial resolution wind data are essential for a wide range of applications in climate, oceanographic and meteorological studies. Large-scale spatial interpolation or downscaling of bivariate wind fields having velocity in two…
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD) and study its properties through the prism of Markov chains. We show that, for unbiased gradient estimates…
Score-based divergences have been widely used in machine learning and statistics applications. Despite their empirical success, a blindness problem has been observed when using these for multi-modal distributions. In this work, we discuss…
In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on…
In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability…