机器学习
In this paper, we propose a communication-efficient penalized regression algorithm for high-dimensional sparse linear regression models with massive data. This approach incorporates an optimized distributed system communication algorithm,…
Stochastic coordinate descent algorithms are efficient methods in which each iterate is obtained by fixing most coordinates at their values from the current iteration, and approximately minimizing the objective with respect to the remaining…
Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding…
We consider a teacher-student model of supervised learning with a fully-trained two-layer neural network whose width $k$ and input dimension $d$ are large and proportional. We provide an effective theory for approximating the Bayes-optimal…
Algorithmic recommendations and decisions have become ubiquitous in today's society. Many of these data-driven policies, especially in the realm of public policy, are based on known, deterministic rules to ensure their transparency and…
Best subset selection in linear regression is well known to be nonconvex and computationally challenging to solve, as the number of possible subsets grows rapidly with increasing dimensionality of the problem. As a result, finding the…
Electricity demand forecasting is key to ensuring that supply meets demand lest the grid would blackout. Reliable short-term forecasts may be obtained by combining a Generalized Additive Models (GAM) with a State-Space model (Obst et al.,…
Machine Learning algorithms are ubiquitous in key decision-making contexts such as organizational justice or healthcare, which has spawned a great demand for fairness in these procedures. In this paper we focus on the application of fair ML…
We investigate the problem of learning a Single Index Model (SIM)- a popular model for studying the ability of neural networks to learn features - from anisotropic Gaussian inputs by training a neuron using vanilla Stochastic Gradient…
Fr\'echet regression extends linear regression to model complex responses in metric spaces, making it particularly relevant for multi-label regression, where eachinstance can have multiple associated labels. However, addressing noise and…
We introduce a new method to jointly reduce the dimension of the input and output space of a function between high-dimensional spaces. Choosing a reduced input subspace influences which output subspace is relevant and vice versa.…
Algorithmic stability is a central notion in learning theory that quantifies the sensitivity of an algorithm to small changes in the training data. If a learning algorithm satisfies certain stability properties, this leads to many important…
Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the…
Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is…
Adapting machine learning algorithms to better handle the presence of clusters or batch effects within training datasets is important across a wide variety of biological applications. This article considers the effect of ensembling Random…
We propose a novel two-stage framework of generative models named Debiasing Kernel-Based Generative Models (DKGM) with the insights from kernel density estimation (KDE) and stochastic approximation. In the first stage of DKGM, we employ KDE…
Many training methods, such as Adam(W) and Shampoo, learn a positive-definite curvature matrix and apply an inverse root before preconditioning. Recently, non-diagonal training methods, such as Shampoo, have gained significant attention;…
We present a general variational framework for the training of freeform nonlinearities in layered computational architectures subject to some slope constraints. The regularization that we add to the traditional training loss penalizes the…
This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures $\mathcal{P}_{\mathrm{a.c.}}(\Omega)$ with $\Omega$ a compact and convex subset of…
Kernel two-sample testing provides a powerful framework for distinguishing any pair of distributions based on $n$ sample points. However, existing kernel tests either run in $n^2$ time or sacrifice undue power to improve runtime. To address…