机器学习
Modern machine learning models with a large number of parameters often generalize well despite perfectly interpolating noisy training data - a phenomenon known as benign overfitting. A foundational explanation for this in linear…
In this paper, we study the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The purpose of distributional TD learning is to estimate the return distribution of a…
High-dimensional data often contain low-dimensional signals obscured by structured background noise, which limits the effectiveness of standard PCA. Motivated by contrastive learning, we address the problem of recovering shared signal…
Modern epidemiological analytics increasingly use machine learning models that offer strong prediction but often lack calibrated uncertainty. Bayesian methods provide principled uncertainty quantification, yet are viewed as difficult to…
We study the high-dimensional inference of a rank-one signal corrupted by sparse noise. The noise is modelled as the adjacency matrix of a weighted undirected graph with finite average connectivity in the large size limit. Using the replica…
Estimating the probabilistic Worst-Case Execution Time (pWCET) is essential for ensuring the timing correctness of real-time applications, such as in robot IoT systems and autonomous driving systems. While methods based on Extreme Value…
Convolutional Neural Networks (CNNs) have achieved remarkable success across a wide range of machine learning tasks by leveraging hierarchical feature learning through deep architectures. However, the large number of layers and millions of…
Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL)…
Next Generation Reservoir Computing (NGRC) is a low-cost machine learning method for forecasting chaotic time series from data. Computational efficiency is crucial for scalable reservoir computing, requiring better strategies to reduce…
Existing theoretical work on Bayes-optimal fair classifiers usually considers a single (binary) sensitive feature. In practice, individuals are often defined by multiple sensitive features. In this paper, we characterize the Bayes-optimal…
Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of…
We consider an on-line least squares regression problem with optimal solution $\theta^*$ and Hessian matrix H, and study a time-average stochastic gradient descent estimator of $\theta^*$. For $k\ge2$, we provide an unbiased estimator of…
We study data-driven learning of robust stochastic control for infinite-horizon systems with potentially continuous state and action spaces. In many managerial settings--supply chains, finance, manufacturing, services, and dynamic…
Despite remarkable performance on a variety of tasks, many properties of deep neural networks are not yet theoretically understood. One such mystery is the depth degeneracy phenomenon: the deeper you make your network, the closer your…
Linear models are widely used in high-stakes decision-making due to their simplicity and interpretability. Yet when fairness constraints such as demographic parity are introduced, their effects on model coefficients, and thus on how…
Positive-Unlabeled (PU) learning presents unique challenges due to the lack of explicitly labeled negative samples, particularly in high-stakes domains such as fraud detection and medical diagnosis. To address data scarcity and privacy…
We study the estimation problem of distribution-on-distribution regression, where both predictors and responses are probability measures. Existing approaches typically rely on a global optimal transport map or tangent-space linearization,…
We present a theoretical framework for deriving the general $n$-th order Fr\'echet derivatives of singular values in real rectangular matrices, by leveraging reduced resolvent operators from Kato's analytic perturbation theory for…
We introduce a new family of algorithms for detecting and estimating a rank-one signal from a noisy observation under prior information about that signal's direction, focusing on examples where the signal is known to have entries biased to…
Understanding the advantages of deep neural networks trained by gradient descent (GD) compared to shallow models remains an open theoretical challenge. In this paper, we introduce a class of target functions (single and multi-index Gaussian…