机器学习
Model parsimony is an important \emph{cognitive bias} in data-driven modelling that aids interpretability and helps to prevent over-fitting. Sparse identification of nonlinear dynamics (SINDy) methods are able to learn sparse…
Deep linear networks have been extensively studied, as they provide simplified models of deep learning. However, little is known in the case of finite-width architectures with multiple outputs and convolutional layers. In this manuscript,…
A common theoretical approach to understanding neural networks is to take an infinite-width limit, at which point the outputs become Gaussian process (GP) distributed. This is known as a neural network Gaussian process (NNGP). However, the…
A Random Vector Functional Link (RVFL) network is a depth-2 neural network with random inner weights and biases. Only the outer weights of such an architecture are to be learned, so the learning process boils down to a linear optimization…
Undirected graphical models known as Markov networks are popular for a wide variety of applications ranging from statistical physics to computational biology. Traditionally, learning of the network structure has been done under the…
Understanding temporal patterns in online search behavior is crucial for real-time marketing and trend forecasting. Google Trends offers a rich proxy for public interest, yet the high dimensionality and noise of its time-series data present…
This paper studies the $\ell^p$-Lipschitz constants of ReLU neural networks $\Phi: \mathbb{R}^d \to \mathbb{R}$ with random parameters for $p \in [1,\infty]$. The distribution of the weights follows a variant of the He initialization and…
We study classical asymmetric binary perceptron (ABP) and associated \emph{local entropy} (LE) as potential source of its algorithmic hardness. Isolation of \emph{typical} ABP solutions in SAT phase seemingly suggests a universal…
While diffusion models generate high-quality images via probability flow, the theoretical understanding of this process remains incomplete. A key question is when probability flow converges to training samples or more general points on the…
Solving conditional moment restrictions (CMRs) is a key problem considered in statistics, causal inference, and econometrics, where the aim is to solve for a function of interest that satisfies some conditional moment equalities.…
This paper provides an elementary, self-contained analysis of diffusion-based sampling methods for generative modeling. In contrast to existing approaches that rely on continuous-time processes and then discretize, our treatment works…
Current causal discovery approaches require restrictive model assumptions in the absence of interventional data to ensure structure identifiability. These assumptions often do not hold in real-world applications leading to a loss of…
Evaluating the diversity of generative models without reference data poses methodological challenges. The reference-free Vendi and RKE scores address this by quantifying the diversity of generated data using matrix-based entropy measures.…
We present a fully constructive analysis of deep ReLU neural networks for classification and function approximation tasks. First, we prove that any dataset with $N$ distinct points in $\mathbb{R}^d$ and $M$ output classes can be exactly…
In this work, we present and study Continuous Generative Neural Networks (CGNNs), namely, generative models in the continuous setting: the output of a CGNN belongs to an infinite-dimensional function space. The architecture is inspired by…
High-dimensional data are ubiquitous, with examples ranging from natural images to scientific datasets, and often reside near low-dimensional manifolds. Leveraging this geometric structure is vital for downstream tasks, including signal…
Stochastic Gradient Descent (SGD) is fundamental for training deep neural networks, especially in non-convex settings. Understanding SGD's generalization properties is crucial for ensuring robust model performance on unseen data. In this…
Neural estimators are simulation-based estimators for the parameters of a family of statistical models, which build a direct mapping from the sample to the parameter vector. They benefit from the versatility of available network…
Companion paper [118] developed a powerful \emph{Random duality theory} (RDT) based analytical program to statistically characterize performance of \emph{descending} phase retrieval algorithms (dPR) (these include all variants of gradient…
We analyze the relation between spectral initializers and theoretical limits of \emph{descending} phase retrieval algorithms (dPR). In companion paper [104], for any sample complexity ratio, $\alpha$, \emph{parametric manifold}, ${\mathcal…