Related papers: Sampling Space-Saving Set Sketches

200 papers

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building…

Data Structures and Algorithms · Computer Science 2023-02-07 Jonathan Hehir, Daniel Ting, Graham Cormode

Stream monitoring is fundamental in many data stream applications, such as financial data trackers, security, anomaly detection, and load balancing. In that respect, quantiles are of particular interest, as they often capture the user's…

Data Structures and Algorithms · Computer Science 2022-01-07 Rana Shahout, Roy Friedman, Ran Ben Basat

In rapid and massive data streams, it is often not possible to estimate the frequency of items with complete accuracy. To perform the operation in a reasonable amount of space and with sufficiently low latency, approximate methods are…

Databases · Computer Science 2019-04-18 Arijit Khan, Sixing Yan

We introduce and study a new data sketch for processing massive datasets. It addresses two common problems: 1) computing a sum given arbitrary filter conditions and 2) identifying the frequent items or heavy hitters in a data set. For the…

Computation · Statistics 2017-09-14 Daniel Ting

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…

Machine Learning · Statistics 2018-02-08 Panagiotis A. Traganitis, Georgios B. Giannakis

Estimating the number of distinct elements in a data stream is well understood when repeated elements are identical. In modern settings, however, observations are high-dimensional and noisy, so repeated instances of the same object are only…

Machine Learning · Statistics 2026-05-18 Nikos Tsikouras, Constantine Caramanis, Christos Tzamos

The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around…

Data Structures and Algorithms · Computer Science 2024-05-31 Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

We introduce a new sub-linear space sketch, the Weight-Median Sketch, for learning compressed linear classifiers over data streams while supporting the efficient recovery of large-magnitude weights in the model. This enables…

Machine Learning · Computer Science 2018-04-10 Kai Sheng Tai, Vatsal Sharan, Peter Bailis, Gregory Valiant

The sliding window model of computation captures scenarios in which data are continually arriving in the form of a stream, and only the most recent $w$ items are used for analysis. In this setting, an algorithm needs to accurately track…

Cryptography and Security · Computer Science 2024-06-13 Yiping Wang, Yanhao Wang, Cen Chen

While traditional data-management systems focus on evaluating single, ad-hoc queries over static data sets in a centralized setting, several emerging applications require (possibly continuous) answers to queries on dynamic data that is…

Databases · Computer Science 2015-03-20 Odysseas Papapetrou, Minos Garofalakis, Antonios Deligiannakis

Modern data stream applications demand memory-efficient solutions for accurately tracking frequent items, such as heavy hitters and heavy changers, under strict resource constraints. Traditional sketches face inherent accuracy-memory…

Databases · Computer Science 2025-05-20 Zicang Xu, Yuxuan Tian, Yuhan Wu, Tong Yang

Sketch-based streaming algorithms allow efficient processing of big data. These algorithms use small fixed-size storage to store a summary ("sketch") of the input data, and use probabilistic algorithms to estimate the desired quantity.…

Databases · Computer Science 2016-11-08 Reuven Cohen, Liran Katzir, Aviv Yehezkel

Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but…

Data Structures and Algorithms · Computer Science 2019-12-06 Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky

Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element…

Databases · Computer Science 2024-06-28 Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing

Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality…

Data Structures and Algorithms · Computer Science 2016-07-19 Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, S. Muthukrishnan, Siqi Sun

The unsupervised learning of community structure, in particular the partitioning of vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses the…

Machine Learning · Computer Science 2020-07-27 Benjamin W. Priest, Alec Dunton, Geoffrey Sanders

Graph streams represent data interactions in real applications. The mining of graph streams plays an important role in network security, social network analysis, and traffic control, among others. However, the sheer volume and high dynamics…

Databases · Computer Science 2023-04-07 Yiling Zeng, Chunyao Song, Yuhan Li, Tingjian Ge

Document sketching using Jaccard similarity has been a workable, effective technique for reducing near-duplicates in Web page and image search results, and has also proven useful in file system synchronization, compression and learning…

Data Structures and Algorithms · Computer Science 2014-10-17 Bernhard Haeupler, Mark Manasse, Kunal Talwar

In this paper, we consider the problem of estimating the distance between any two large data streams under a small-space constraint. This problem is of utmost importance in data-intensive monitoring applications where input streams are…

Data Structures and Algorithms · Computer Science 2012-08-01 Emmanuelle Anceaume, Yann Busnel

In data stream applications, one of the critical issues is estimating the frequency of each item in a given multiset, that is, a set in which each item can appear multiple times. The data streams in many applications are…

Data Structures and Algorithms · Computer Science 2020-01-07 Ning Li