Related papers: Double-Hashing Algorithm for Frequency Estimation …

Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science 2022-07-19 Dimitris Bertsimas, Vassilis Digalakis

Improved Frequency Estimation Algorithms with and without Predictions

Estimating frequencies of elements appearing in a data stream is a key task in large-scale data analysis. Popular sketching approaches to this problem (e.g., CountMin and CountSketch) come with worst-case guarantees that probabilistically…

Data Structures and Algorithms · Computer Science 2023-12-13 Anders Aamand, Justin Y. Chen, Huy Lê Nguyen, Sandeep Silwal, Ali Vakilian

Hierarchical Heavy Hitters with the Space Saving Algorithm

The Hierarchical Heavy Hitters problem extends the notion of frequent items to data arranged in a hierarchy. This problem has applications to network traffic monitoring, anomaly detection, and DDoS detection. We present a new streaming…

Data Structures and Algorithms · Computer Science 2011-08-10 Michael Mitzenmacher, Thomas Steinke, Justin Thaler

Hashing Pursuit for Online Identification of Heavy-Hitters in High-Speed Network Streams

Distributed Denial of Service (DDoS) attacks have become more prominent recently, both in frequency of occurrence, as well as magnitude. Such attacks render key Internet resources unavailable and disrupt its normal operation. It is…

Cryptography and Security · Computer Science 2014-12-22 Michael Kallitsis, Stilian Stoev, George Michailidis

Identifying Correlated Heavy-Hitters in a Two-Dimensional Data Stream

We consider online mining of correlated heavy-hitters from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes…

Databases · Computer Science 2013-10-07 Bibudh Lahiri, Arko Provo Mukherjee, Srikanta Tirthapura

Learning-Based Heavy Hitters and Flow Frequency Estimation in Streams

Identifying heavy hitters and estimating the frequencies of flows are fundamental tasks in various network domains. Existing approaches to this challenge can broadly be categorized into two groups, hashing-based and competing-counter-based.…

Data Structures and Algorithms · Computer Science 2024-06-25 Rana Shahout, Michael Mitzenmacher

A High-Performance Algorithm for Identifying Frequent Items in Data Streams

Estimating frequencies of items over data streams is a common building block in streaming data measurement and analysis. Misra and Gries introduced their seminal algorithm for the problem in 1982, and the problem has since been revisited…

Data Structures and Algorithms · Computer Science 2017-05-23 Daniel Anderson, Pryce Bevan, Kevin Lang, Edo Liberty, Lee Rhodes, Justin Thaler

Cuckoo Heavy Keeper and the balancing act of maintaining heavy hitters in stream processing

Finding heavy hitters in databases and data streams is a fundamental problem with applications ranging from network monitoring to database query optimization, machine learning, and more. Approximation algorithms offer practical solutions,…

Data Structures and Algorithms · Computer Science 2025-11-24 Vinh Quang Ngo, Marina Papatriantafilou

Efficient Distinct Heavy Hitters for DNS DDoS Attack Detection

Motivated by a recent new type of randomized Distributed Denial of Service (DDoS) attacks on the Domain Name Service (DNS), we develop novel and efficient distinct heavy hitters algorithms and build an attack identification system that uses…

Cryptography and Security · Computer Science 2016-12-09 Yehuda Afek, Anat Bremler-Barr, Edith Cohen, Shir Landau Feibish, Michal Shagam

2FA Sketch: Two-Factor Armor Sketch for Accurate and Efficient Heavy Hitter Detection in Data Streams

Detecting heavy hitters, which are flows exceeding a specified threshold, is crucial for network measurement, but it faces challenges due to increasing throughput and memory constraints. Existing sketch-based solutions, particularly those…

Networking and Internet Architecture · Computer Science 2024-08-26 Xilai Liu, Xinyi Zhang, Bingqing Liu, Tao Li, Tong Yang, Gaogang Xie

A new Frequency Estimation Sketch for Data Streams

In data stream applications, one of the critical issues is to estimate the frequency of each item in the specific multiset. The multiset means that each item in this set can appear multiple times. The data streams in many applications are…

Data Structures and Algorithms · Computer Science 2020-01-07 Ning Li

Frequency Estimation with One-Sided Error

Frequency estimation is one of the most fundamental problems in streaming algorithms. Given a stream $S$ of elements from some universe $U=\{1 \ldots n\}$, the goal is to compute, in a single pass, a short sketch of $S$ so that for any…

Data Structures and Algorithms · Computer Science 2021-11-09 Piotr Indyk, Shyam Narayanan, David P. Woodruff

Streaming Algorithms for Pattern Discovery over Dynamically Changing Event Sequences

Discovering frequent episodes over event sequences is an important data mining task. In many applications, events constituting the data sequence arrive as a stream, at furious rates, and recent trends (or frequent episodes) can change and…

Machine Learning · Computer Science 2012-05-22 Debprakash Patnaik, Naren Ramakrishnan, Srivatsan Laxman, Badrish Chandramouli

When Two Choices Are not Enough: Balancing at Scale in Distributed Stream Processing

Carefully balancing load in distributed stream processing systems has a fundamental impact on execution latency and throughput. Load balancing is challenging because real-world workloads are skewed: some tuples in the stream are associated…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-01-28 Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, Nicolas Kourtellis, Marco Serafini

No Repetition: Fast Streaming with Highly Concentrated Hashing

To get estimators that work within a certain error bound with high probability, a common strategy is to design one that works with constant probability, and then boost the probability using independent repetitions. Important examples of…

Data Structures and Algorithms · Computer Science 2020-04-03 Anders Aamand, Debarati Das, Evangelos Kipouridis, Jakob B. T. Knudsen, Peter M. R. Rasmussen, Mikkel Thorup

SimiSketch: Efficiently Estimating Similarity of streaming Multisets

The challenge of estimating similarity between sets has been a significant concern in data science, finding diverse applications across various domains. However, previous approaches, such as MinHash, have predominantly centered around…

Data Structures and Algorithms · Computer Science 2024-05-31 Fenghao Dong, Yang He, Yutong Liang, Zirui Liu, Yuhan Wu, Peiqing Chen, Tong Yang

Stream Clustering using Probabilistic Data Structures

Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed…

Databases · Computer Science 2016-12-09 Andrei Sorin Sabau

Efficient Algorithm for Deterministic Search of Hot Elements

When facing a very large stream of data, it is often desirable to extract most important statistics online in a short time and using small memory. For example, one may want to quickly find the most influential users generating posts online…

Data Structures and Algorithms · Computer Science 2022-03-30 Dariusz R. Kowalski, Dominik Pajak

SQUID: Faster Analytics via Sampled Quantile Estimation

Streaming algorithms are fundamental in the analysis of large and online datasets. A key component of many such analytic tasks is $q$-MAX, which finds the largest $q$ values in a number stream. Modern approaches attain a constant runtime by…

Data Structures and Algorithms · Computer Science 2024-07-11 Ran Ben-Basat, Gil Einziger, Wenchen Han, Bilal Tayh

Pattern Recognition and Event Detection on IoT Data-streams

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science 2022-03-03 Christos Karras, Aristeidis Karras, Spyros Sioutas