机器学习
The stochastic gradient descent (SGD) algorithm has been widely used to optimize deep Cox neural network (Cox-NN) by updating model parameters using mini-batches of data. We show that SGD aims to optimize the average of mini-batch…
The Koopman operator provides a powerful framework for representing the dynamics of general nonlinear dynamical systems. However, existing data-driven approaches to learning the Koopman operator rely on batch data. In this work, we present…
The support vector machine (SVM) is a supervised learning algorithm that finds a maximum-margin linear classifier, often after mapping the data to a high-dimensional feature space via the kernel trick. Recent work has demonstrated that in…
Predicting counterfactual outcomes in longitudinal data, where sequential treatment decisions heavily depend on evolving patient states, is critical yet notoriously challenging due to complex time-dependent confounding and inadequate…
In-context learning enables transformers to adapt to new tasks from a few examples at inference time, while grokking highlights that this generalization can emerge abruptly only after prolonged training. We study task generalization and…
We decompose the Kullback--Leibler generalization error (GE) -- the expected KL divergence from the data distribution to the trained model -- of unsupervised learning into three non-negative components: model error, data bias, and variance.…
Monitoring binomial proportions across multiple independent streams is a critical challenge in Statistical Process Control (SPC), with applications from manufacturing to cybersecurity. While EWMA charts offer sensitivity to small shifts,…
This work is concerned with the continuum limit of a graph-based data visualization technique called the t-Distributed Stochastic Neighbor Embedding (t-SNE), which is widely used for visualizing data in a variety of applications, but is…
In optimization problems, some variable subsets may have a joint non-linear or non-monotonical influence on the function value. Therefore, knowledge of variable dependencies may be crucial for effective optimization, and many…
The sequential nature of autoregressive next-token prediction imposes a fundamental speed limit on large language models. While continuous flow models offer a path to parallel generation, they traditionally demand expensive iterative…
For most process systems, knowledge of the model structure is incomplete. This missing physics must then be learned from experimental data. Recently, a combination of universal differential equations and symbolic regression has become a…
The No-U-Turn Sampler (NUTS) is the computational workhorse of modern Bayesian software libraries, yet its qualitative and quantitative convergence guarantees were established only recently. A significant gap remains in the theoretical…
We consider tensor factorizations based on sparse measurements of the components of relatively high rank tensors. The measurements are designed in a way that the underlying graph of interactions is a random graph. The setup will be useful…
Self-attention layers have become fundamental building blocks of modern deep neural networks, yet their theoretical understanding remains limited, particularly from the perspective of random matrix theory. In this work, we provide a…
The majority of parameters in neural networks are naturally represented as matrices. However, most commonly used optimizers treat these matrix parameters as flattened vectors during optimization, potentially overlooking their inherent…
Modern applications increasingly involve highly sensitive network data, where raw edges cannot be shared due to privacy constraints. We propose \texttt{TransNet}, a new spectral clustering-based transfer learning framework that improves…
Inferring the mechanical properties of soft tissues from measured deformations is a fundamental challenge in elastography. A rarely examined assumption underlying existing approaches is that the assumed constitutive law correctly describes…
The training of neural networks by gradient descent methods is a cornerstone of the deep learning revolution. Yet, despite some recent progress, a complete theory explaining its success is still missing. This article presents, for…
As generative models enable rapid creation of high-fidelity images, societal concerns about misinformation and authenticity have intensified. A promising remedy is multi-bit image watermarking, which embeds a multi-bit message into an image…
Feature importance methods using unrestricted permutations are flawed due to extrapolation errors; such errors appear in all non-trivial variable importance approaches. We propose three new approaches: conditional model reliance and…