Related papers: Graphical Model Sketch

QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element…

Databases · Computer Science 2024-06-28 Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing

A statistical analysis of probabilistic counting algorithms

This paper considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form…

Computation · Statistics 2012-11-20 Peter Clifford, Ioana A. Cosma

Sketching for Latent Dirichlet-Categorical Models

Recent work has explored transforming data sets into smaller, approximate summaries in order to scale Bayesian inference. We examine a related problem in which the parameters of a Bayesian model are very large and expensive to store in…

Machine Learning · Computer Science 2018-10-03 Joseph Tassarotti, Jean-Baptiste Tristan, Michael Wick

gSketch: On Query Estimation in Graph Streams

Many dynamic applications are built upon large network infrastructures, such as social networks, communication networks, biological networks and the Web. Such applications create data that can be naturally modeled as graph streams, in which…

Databases · Computer Science 2011-12-01 Peixiang Zhao, Charu C. Aggarwal, Min Wang

MTS Sketch for Accurate Estimation of Set-Expression Cardinalities from Small Samples

Sketch-based streaming algorithms allow efficient processing of big data. These algorithms use small fixed-size storage to store a summary ("sketch") of the input data, and use probabilistic algorithms to estimate the desired quantity.…

Databases · Computer Science 2016-11-08 Reuven Cohen, Liran Katzir, Aviv Yehezkel

Sampling Space-Saving Set Sketches

Large, distributed data streams are now ubiquitous. High-accuracy sketches with low memory overhead have become the de facto method for analyzing this data. For instance, if we wish to group data by some label and report the largest counts…

Data Structures and Algorithms · Computer Science 2024-02-14 Homin K. Lee, Charles Masson

Scaling Graph Clustering with Distributed Sketches

The unsupervised learning of community structure, in particular the partitioning of vertices into clusters or communities, is a canonical and well-studied problem in exploratory graph analysis. However, like most graph analyses the…

Machine Learning · Computer Science 2020-07-27 Benjamin W. Priest, Alec Dunton, Geoffrey Sanders

Sketched Subspace Clustering

The immense amount of daily generated and communicated data presents unique challenges in their processing. Clustering, the grouping of data without the presence of ground-truth labels, is an important tool for drawing inferences from data.…

Machine Learning · Statistics 2018-02-08 Panagiotis A. Traganitis, Georgios B. Giannakis

A new Frequency Estimation Sketch for Data Streams

In data stream applications, one of the critical issues is estimating the frequency of each item in a given multiset, that is, a collection in which each item can appear multiple times. The data streams in many applications are…

Data Structures and Algorithms · Computer Science 2020-01-07 Ning Li

Sketched Sum-Product Networks for Joins

Sketches have shown high accuracy in multi-way join cardinality estimation, a critical problem in cost-based query optimization. Accurately estimating the cardinality of a join operation -- analogous to its computational cost -- allows the…

Databases · Computer Science 2025-06-18 Brian Tsan, Abylay Amanbayev, Asoke Datta, Florin Rusu

Sketch-Flip-Merge: Mergeable Sketches for Private Distinct Counting

Data sketching is a critical tool for distinct counting, enabling multisets to be represented by compact summaries that admit fast cardinality estimates. Because sketches may be merged to summarize multiset unions, they are a basic building…

Data Structures and Algorithms · Computer Science 2023-02-07 Jonathan Hehir, Daniel Ting, Graham Cormode

Count-Min: Optimal Estimation and Tight Error Bounds using Empirical Error Distributions

The Count-Min sketch is an important and well-studied data summarization method. It allows one to estimate the count of any item in a stream using a small, fixed size data sketch. However, the accuracy of the sketch depends on…

Data Structures and Algorithms · Computer Science 2018-11-13 Daniel Ting

Exploiting the Structure via Sketched Gradient Algorithms

Sketched gradient algorithms have recently been introduced for efficiently solving large-scale constrained least-squares regressions. In this paper we provide novel convergence analysis for the basic method Gradient Projection…

Optimization and Control · Mathematics 2017-06-05 Junqi Tang, Mohammad Golbabaee, Mike Davies

Count-Min-Log sketch: Approximately counting with approximate counters

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large-scale processing. However, the original version of the Count-Min Sketch (CMS) suffers from some deficiencies, especially if one is interested in the…

Information Retrieval · Computer Science 2015-02-18 Guillaume Pitel, Geoffroy Fouquier

Estimating Cardinalities with Deep Sketches

We introduce Deep Sketches, which are compact models of databases that allow us to estimate the result sizes of SQL queries. Deep Sketches are powered by a new deep learning approach to cardinality estimation that can capture correlations…

Databases · Computer Science 2019-04-18 Andreas Kipf, Dimitri Vorona, Jonas Müller, Thomas Kipf, Bernhard Radke, Viktor Leis, Peter Boncz, Thomas Neumann, Alfons Kemper

Sketching for Large-Scale Learning of Mixture Models

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. We propose a "compressive learning" framework where we estimate model parameters from a sketch of the training data. This sketch…

Machine Learning · Computer Science 2017-05-08 Nicolas Keriven, Anthony Bourrier, Rémi Gribonval, Patrick Pérez

Convolution and Cross-Correlation of Count Sketches Enables Fast Cardinality Estimation of Multi-Join Queries

With the increasing rate of data generated by critical systems, estimating functions on streaming data has become essential. This demand has driven numerous advancements in algorithms designed to efficiently query and analyze one or more…

Databases · Computer Science 2024-05-16 Mike Heddes, Igor Nunes, Tony Givargis, Alex Nicolau

Fast Concurrent Data Sketches

Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but…

Data Structures and Algorithms · Computer Science 2019-12-06 Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky

Data Sketching and Stacking: A Confluence of Two Strategies for Predictive Inference in Gaussian Process Regressions with High-Dimensional Features

This article focuses on drawing computationally-efficient predictive inference from Gaussian process (GP) regressions with a large number of features when the response is conditionally independent of the features given the projection to a…

Methodology · Statistics 2024-09-27 Samuel Gailliot, Rajarshi Guhaniyogi, Roger D. Peng

Breaking the Quadratic Barrier: Robust Cardinality Sketches for Adaptive Queries

Cardinality sketches are compact data structures that efficiently estimate the number of distinct elements across multiple queries while minimizing storage, communication, and computational costs. However, recent research has shown that…

Data Structures and Algorithms · Computer Science 2025-02-11 Edith Cohen, Mihir Singhal, Uri Stemmer