Related papers: SALSA: Self-Adjusting Lean Streaming Analytics

200 papers

Modern stream processing systems often need to track the frequency of distinct keys in a data stream in real-time. Since maintaining exact counts can require a prohibitive amount of memory, many applications rely on compact, probabilistic…

Data Structures and Algorithms · Computer Science 2026-04-29 Navid Eslami, Ioana O. Bercea, Rasmus Pagh, Niv Dayan

Due to the large data volume and number of distinct elements, space is often the bottleneck of many stream processing systems. The data structures used by these systems often consist of counters whose optimization yields significant memory…

Networking and Internet Architecture · Computer Science 2025-02-21 Ran Ben Basat, Gil Einziger, Bilal Tyah, Shay Vargaftik

Matrix sketching is a recently developed data compression technique. An input matrix A is efficiently approximated with a smaller matrix B, so that B preserves most of the properties of A up to some guaranteed approximation ratio. In so…

Machine Learning · Statistics 2019-12-03 Roberta Falcone, Angela Montanari, Laura Anderlucci

With the exponentially growing Internet traffic, sketch data structure with a probabilistic algorithm has been expected to be an alternative solution for non-compromised (non-selective) security monitoring. While facilitating counting…

Cryptography and Security · Computer Science 2025-03-18 Seungsam Yang, Seyed Mohammad Mehdi Mirnajafizadeh, Sian Kim, Rhongho Jang, DaeHun Nyang

The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around…

Data Structures and Algorithms · Computer Science 2024-05-31 Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

Streaming analytics are essential in a large range of applications, including databases, networking, and machine learning. To optimize performance, practitioners are increasingly offloading such analytics to network nodes such as switches.…

Networking and Internet Architecture · Computer Science 2025-03-19 Jonatan Langlet, Peiqing Chen, Michael Mitzenmacher, Ran Ben Basat, Zaoxing Liu, Gianni Antichi

Sketch-and-solve (SAS) is a very successful method to efficiently estimate the solution of heavily overdetermined large linear least squares problems. It uses random sketching to reduce the size of the problem, hence reducing the…

Numerical Analysis · Mathematics 2026-05-26 Irina-Beatrice Haas, Michael B. Giles, Yuji Nakatsukasa

Large, distributed data streams are now ubiquitous. High-accuracy sketches with low memory overhead have become the de facto method for analyzing this data. For instance, if we wish to group data by some label and report the largest counts…

Data Structures and Algorithms · Computer Science 2024-02-14 Homin K. Lee, Charles Masson

Measuring network flow sizes is important for tasks like accounting/billing, network forensics and security. Per-flow accounting is considered hard because it requires that many counters be updated at a very high speed; however, the large…

Networking and Internet Architecture · Computer Science 2007-10-07 Yi Lu, Andrea Montanari, Balaji Prabhakar

MinHash and HyperLogLog are sketching algorithms that have become indispensable for set summaries in big data applications. While HyperLogLog allows counting different elements with very little space, MinHash is suitable for the fast…

Data Structures and Algorithms · Computer Science 2021-08-12 Otmar Ertl

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least square regression. We show…

Data Structures and Algorithms · Computer Science 2024-05-10 Sachin Garg, Kevin Tan, Michał Dereziński

Demands are increasing to measure per-flow statistics in the data plane of high-speed switches. Measuring flows with exact counting is infeasible due to processing and memory constraints, but a sketch is a promising candidate for collecting…

Networking and Internet Architecture · Computer Science 2021-11-05 SunYoung Kim, Changhun Jung, RhongHo Jang, David Mohaisen, DaeHun Nyang

We propose OverSketch, an approximate algorithm for distributed matrix multiplication in serverless computing. OverSketch leverages ideas from matrix sketching and high-performance computing to enable cost-efficient multiplication that is…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-25 Vipul Gupta, Shusen Wang, Thomas Courtade, Kannan Ramchandran

Graph streams are rapidly evolving sequences of edges that convey continuously changing relationships among entities, playing a crucial role in domains such as networking, finance, and cybersecurity. Their massive scale and high dynamism…

Databases · Computer Science 2026-02-18 Boyan Wang, Zhuochen Fan, Dayu Wang, Fangcheng Fu, Zeyu Luan, Lei Zou, Qing Li, Tong Yang

Data sketches are approximate succinct summaries of long streams. They are widely used for processing massive amounts of data and answering statistical queries about it in real-time. Existing libraries producing sketches are very fast, but…

Data Structures and Algorithms · Computer Science 2019-12-06 Arik Rinberg, Alexander Spiegelman, Edward Bortnikov, Eshcar Hillel, Idit Keidar, Lee Rhodes, Hadar Serviansky

Count-Min Sketch is a widely adopted algorithm for approximate event counting in large-scale processing. However, the original version of the Count-Min Sketch (CMS) suffers from some deficiencies, especially if one is interested in the…

Information Retrieval · Computer Science 2015-02-18 Guillaume Pitel, Geoffroy Fouquier

Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can…

Machine Learning · Computer Science 2018-09-17 Ashkan Norouzi-Fard, Jakub Tarnawski, Slobodan Mitrović, Amir Zandieh, Aida Mousavifar, Ola Svensson

This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a…

Data Structures and Algorithms · Computer Science 2015-02-11 David P. Woodruff

Sketching is a dimensionality reduction technique where one compresses a matrix by linear combinations that are chosen at random. A line of work has shown how to sketch the Hessian to speed up each iteration in a second order method, but…

Machine Learning · Computer Science 2021-10-07 Yi Li, Honghao Lin, David P. Woodruff

Identifying independence between two random variables or correlated given their samples has been a fundamental problem in Statistics. However, how to do so in a space-efficient way if the number of states is large is not quite well-studied.…

Data Structures and Algorithms · Computer Science 2022-11-21 Zhenhao Gu, Hao Zhang