Related papers: Sampling Space-Saving Set Sketches

Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building…

Data Structures and Algorithms · Computer Science 2023-02-07 Jonathan Hehir, Daniel Ting, Graham Cormode

SQUAD: Combining Sketching and Sampling Is Better than Either for Per-item Quantile Estimation

Stream monitoring is fundamental in many data stream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user's…

Data Structures and Algorithms · Computer Science 2022-01-07 Rana Shahout, Roy Friedman, Ran Ben Basat

Composite Hashing for Data Stream Sketches

In rapid and massive data streams, it is often not possible to estimate the frequency of items with complete accuracy. To perform the operation in a reasonable amount of space and with sufficiently low latency, approximated methods are…

Databases · Computer Science 2019-04-18 Arijit Khan, Sixing Yan

Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation

We introduce and study a new data sketch for processing massive datasets. It addresses two common problems: 1) computing a sum given arbitrary filter conditions and 2) identifying the frequent items or heavy hitters in a data set. For the…

Computation · Statistics 2017-09-14 Daniel Ting

Sketched Subspace Clustering

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…

Machine Learning · Statistics 2018-02-08 Panagiotis A. Traganitis, Georgios B. Giannakis

MaxSketch: Robust Distinct Counting in Streams via Random Projections

Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only…

Machine Learning · Statistics 2026-05-18 Nikos Tsikouras, Constantine Caramanis, Christos Tzamos

SimiSketch: Efficiently Estimating Similarity of streaming Multisets

The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around…

Data Structures and Algorithms · Computer Science 2024-05-31 Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

Sketching Linear Classifiers over Data Streams

We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables…

Machine Learning · Computer Science 2018-04-10 Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

DPSW-Sketch: A Differentially Private Sketch Framework for Frequency Estimation over Sliding Windows (Technical Report)

The sliding window model of computation captures scenarios in which data are continually arriving in the form of a stream, and only the most recent $w$ items are used for analysis. In this setting, an algorithm needs to accurately track…

Cryptography and Security · Computer Science 2024-06-13 Yiping Wang, Yanhao Wang, Cen Chen

Sketch-based Querying of Distributed Sliding-Window Data Streams

While traditional data-management systems focus on evaluating single, ad-hoc queries over static data sets in a centralized setting, several emerging applications require (possibly, continuous) answers to queries on dynamic data that is…

Databases · Computer Science 2015-03-20 Odysseas Papapetrou, Minos Garofalakis, Antonios Deligiannakis

Hidden Sketch: A Space-Efficient Reversible Sketch for Tracking Frequent Items in Data Streams

Modern data stream applications demand memory-efficient solutions for accurately tracking frequent items, such as heavy hitters and heavy changers, under strict resource constraints. Traditional sketches face inherent accuracy-memory…

Databases · Computer Science 2025-05-20 Zicang Xu, Yuxuan Tian, Yuhan Wu, Tong Yang

MTS Sketch for Accurate Estimation of Set-Expression Cardinalities from Small Samples

Sketch-based streaming algorithms allow efficient processing of big data. These algorithms use small fixed-size storage to store a summary ("sketch") of the input data, and use probabilistic algorithms to estimate the desired quantity.…

Databases · Computer Science 2016-11-08 Reuven Cohen, Liran Katzir, Aviv Yehezkel

Fast Concurrent Data Sketches

Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but…

Data Structures and Algorithms · Computer Science 2019-12-06 Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky

QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element…

Databases · Computer Science 2024-06-28 Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing

Graphical Model Sketch

Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality…

Data Structures and Algorithms · Computer Science 2016-07-19 Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

Scaling Graph Clustering with Distributed Sketches

The unsupervised learning of community structure, in particular the partitioning of vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses the…

Machine Learning · Computer Science 2020-07-27 Benjamin W. Priest, Alec Dunton, Geoffrey Sanders

LSketch: A Label-Enabled Graph Stream Sketch Toward Time-Sensitive Queries

Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics…

Databases · Computer Science 2023-04-07 Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge

Consistent Weighted Sampling Made Fast, Small, and Easy

Document sketching using Jaccard similarity has been a workable effective technique in reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning…

Data Structures and Algorithms · Computer Science 2014-10-17 Bernhard Haeupler, Mark Manasse, Kunal Talwar

Sketch $\star$-metric: Comparing Data Streams via Sketching

In this paper, we consider the problem of estimating the distance between any two large data streams under a small-space constraint. This problem is of utmost importance in data-intensive monitoring applications where input streams are…

Data Structures and Algorithms · Computer Science 2012-08-01 Emmanuelle Anceaume, Yann Busnel

A new Frequency Estimation Sketch for Data Streams

In data stream applications, one of the critical issues is to estimate the frequency of each item in the specific multiset. The multiset means that each item in this set can appear multiple times. The data streams in many applications are…

Data Structures and Algorithms · Computer Science 2020-01-07 Ning Li