机器学习
Recommender systems are essential tools in the digital landscape for connecting users with content that more closely aligns with their preferences. Matrix completion is a widely used statistical framework for such systems, aiming to predict…
Testing under what conditions the product satisfies the desired properties is a fundamental problem in manufacturing industry. If the condition and the property are respectively regarded as the input and the output of a black-box function,…
The dynamic mode decomposition (DMD) is a data-driven approach that extracts the dominant features from spatiotemporal data. In this work, we introduce sparse-mode DMD, a new variant of the optimized DMD framework that specifically…
Solder joint reliability related to failures due to thermomechanical loading is a critically important yet physically complex engineering problem. As a result, simulated behavior is oftentimes computationally expensive. In an increasingly…
Symbolic regression automates the process of learning closed-form mathematical models from data. Standard approaches to symbolic regression, as well as newer deep learning approaches, rely on heuristic model selection criteria, heuristic…
For the problem of reconstructing a low-rank matrix from a few linear measurements, two classes of algorithms have been widely studied in the literature: convex approaches based on nuclear norm minimization, and non-convex approaches that…
Changepoint detection is a technique used to identify significant shifts in sequences and is widely used in fields such as finance, genomics, and medicine. To identify the changepoints, dynamic programming (DP) algorithms, particularly…
Learning a nonparametric system of ordinary differential equations from trajectories in a $d$-dimensional state space requires learning $d$ functions of $d$ variables. Explicit formulations often scale quadratically in $d$ unless additional…
Tensor completion exhibits an interesting computational-statistical gap in terms of the number of samples needed to perform tensor estimation. While there are only $\Theta(tn)$ degrees of freedom in a $t$-order tensor with $n^t$ entries,…
The paper studies the DIverse MultiPLEx Signed Generalized Random Dot Product Graph (DIMPLE-SGRDPG) network model (Pensky (2024)), where all layers of the network have the same collection of nodes. In addition, all layers can be partitioned…
The discovery of causal relationships is a foundational problem in artificial intelligence, statistics, epidemiology, economics, and beyond. While elegant theories exist for accurate causal discovery given infinite data, real-world…
Given i.i.d.\ samples $X_n =\{ x_1, \dots, x_n \}$ from a distribution supported on a low dimensional manifold ${M}$ embedded in Eucliden space, we consider the graph Laplacian operator $\Delta_n$ associated to an $\varepsilon$-proximity…
Causal discovery algorithms typically recover causal graphs only up to their Markov equivalence classes unless additional parametric assumptions are made. The sizes of these equivalence classes reflect the limits of what can be learned…
Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We…
Learning from non-stationary data streams subject to concept drift requires models that can adapt on-the-fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple…
Publicly releasing the specification of a model with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine…
A/B testing is widely used in modern technology companies for policy evaluation and product deployment, with the goal of comparing the outcomes under a newly-developed policy against a standard control. Various causal inference and…
Despite the significance of probabilistic time-series forecasting models, their evaluation metrics often involve intractable integrations. The most widely used metric, the continuous ranked probability score (CRPS), is a strictly proper…
Given $p$ samples, each of which may or may not be defective, group testing (GT) aims to determine their defect status by performing tests on $n < p$ `groups', where a group is formed by mixing a subset of the $p$ samples. Assuming that the…
Differentially private stochastic gradient descent privatizes model training by injecting noise into each iteration, where the noise magnitude increases with the number of model parameters. Recent works suggest that we can reduce the noise…