Related papers: Annotations for Sparse Data Streams
Considerable effort has been devoted to the development of streaming algorithms for analyzing massive graphs. Unfortunately, many results have been negative, establishing that a wide variety of problems require $\Omega(n^2)$ space to solve.…
We study the general problem of computing frequency-based functions, i.e., the sum of any given function of data stream frequencies. Special cases include fundamental data stream problems such as computing the number of distinct elements…
Motivated by the trend to outsource work to commercial cloud computing services, we consider a variation of the streaming paradigm where a streaming algorithm can be assisted by a powerful helper that can provide annotations to the data…
We study graph computations in an enhanced data streaming setting, where a space-bounded client reading the edge stream of a massive graph may delegate some of its work to a cloud service. We seek algorithms that allow the client to verify…
Monitoring the performance of large shared computing systems such as the cloud computing infrastructure raises many challenging algorithmic problems. One common problem is to track users with the largest deviation from the norm (outliers),…
Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analyzing distributions in a streaming setting, but the progress on graph problems…
We consider the problem of finding a minimum cut of a weighted graph presented as a single-pass stream. While graph sparsification in streams has been intensively studied, the specific application of finding minimum cuts in streams is less…
When computation is outsourced, the data owner would like to be assured that the desired computation has been performed correctly by the service provider. In theory, proof systems can give the necessary assurance, but prior work is not…
Data stream algorithms tackle operations on high-volume sequences of read-once data items. Data stream scenarios include inherently real-time systems like sensor networks and financial markets. They also arise in purely computational…
We introduce the certification of solutions to graph problems when access to the input is restricted. This topic has received a lot of attention in the distributed computing setting, and we introduce it here in the context of…
We introduce a new computational model for data streams: asymptotically exact streaming algorithms. These algorithms have an approximation ratio that tends to one as the length of the stream goes to infinity while the memory used by the…
The need to estimate a particular quantile of a distribution is an important problem which frequently arises in many computer vision and signal processing applications. For example, our work was motivated by the requirements of many…
Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed…
As Large Language Models (LLMs) scale to million-token contexts, traditional Mechanistic Interpretability techniques for analyzing attention scale quadratically with context length, demanding terabytes of memory beyond 100,000 tokens. We…
An increasing number of use cases require a timely extraction of non-trivial knowledge from semantically annotated data streams, especially on the Web and for the Internet of Things (IoT). Often, this extraction requires expressive…
When delegating computation to a service provider, as in cloud computing, we seek some reassurance that the output is correct and complete. Yet recomputing the output as a check is inefficient and expensive, and it may not even be feasible…
The seminal work of Ahn, Guha, and McGregor in 2012 introduced the graph sketching technique and used it to present the first streaming algorithms for various graph problems over dynamic streams with both insertions and deletions of edges.…
Large data applications rely on storing data in massive, sparse graphs with millions to trillions of nodes. Graph-based methods, such as node prediction, aim for computational efficiency regardless of graph size. Techniques like localized…
As graphs continue to grow in size, we seek ways to effectively process such data at scale. The model of streaming graph processing, in which a compact summary is maintained as each edge insertion/deletion is observed, is an attractive one.…
Anomaly detection is critical for finding suspicious behavior in innumerable systems. We need to detect anomalies in real time, i.e., determine whether an incoming entity is anomalous as soon as we receive it, to minimize the effects of…