Honghao Lin
We study adversarially robust algorithms for insertion-deletion (turnstile) streams, where future updates may depend on past algorithm outputs. While robust algorithms exist for insertion-only streams with only a polylogarithmic overhead in…
We study $\ell_p$ sampling and frequency moment estimation in a single-pass insertion-only data stream. For $p \in (0,2)$, we present a nearly space-optimal approximate $\ell_p$ sampler that uses $\widetilde{O}(\log n \log(1/\delta))$ bits…
In the distributed monitoring model, a data stream over a universe of size $n$ is distributed over $k$ servers, who must continuously provide certain statistics of the overall dataset, while minimizing communication with a central…
We present space-efficient linear sketches for estimating trimmed statistics of an $n$-dimensional frequency vector $x$, e.g., the sum of $p$-th powers of the largest $k$ frequencies (i.e., entries) in absolute value, or the $k$-trimmed…
We introduce a novel technique for ``lifting'' dimension lower bounds for linear sketches in the real-valued setting to dimension lower bounds for linear sketches with polynomially-bounded integer entries when the input is a…
We consider the problem of finding a minimum cut of a weighted graph presented as a single-pass stream. While graph sparsification in streams has been intensively studied, the specific application of finding minimum cuts in streams is less…
Graph Transformers excel in long-range dependency modeling, but generally require quadratic memory complexity in the number of nodes in an input graph, and hence have trouble scaling to large graphs. Sparse attention variants such as…
Transductive tasks on graphs differ fundamentally from typical supervised machine learning tasks, as the independent and identically distributed (i.i.d.) assumption does not hold among samples. Instead, all train/test/validation samples are…
The majority of streaming problems are defined and analyzed in a static setting, where the data stream is any worst-case sequence of insertions and deletions that is fixed in advance. However, many real-world applications require a more…
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. Such estimates can be used, for example, to quickly assess how useful a more expensive low-rank approximation computation will be. The…
In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors. The first problem is to approximate cuts in balanced directed…
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem, e.g., low-rank approximation and regression. In the learning-based…
In the $\ell_p$-subspace sketch problem, we are given an $n\times d$ matrix $A$ with $n>d$, and asked to build a small memory data structure $Q(A,\epsilon)$ so that, for any query vector $x\in\mathbb{R}^d$, we can output a number in…
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M,…
We initiate a broad study of classical problems in the streaming model with insertions and deletions in the setting where we allow the approximation factor $\alpha$ to be much larger than $1$. Such algorithms can use significantly less…
We create classical (non-quantum) dynamic data structures supporting queries for recommender systems and least-squares regression that are comparable to their quantum analogues. De-quantizing such algorithms has received a flurry of…
A treap is a classic randomized binary search tree data structure that is easy to implement and supports O(\log n) expected time access. However, classic treaps do not take advantage of the input distribution or patterns in the input. Given…
We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and…
Sketching is a dimensionality reduction technique where one compresses a matrix by linear combinations that are chosen at random. A line of work has shown how to sketch the Hessian to speed up each iteration in a second order method, but…
We study the problem of learning Bayesian networks where an $\epsilon$-fraction of the samples are adversarially corrupted. We focus on the fully-observable case where the underlying graph structure is known. In this work, we present the…