Related papers: Stream Sampling for Frequency Cap Statistics

Impact of Sampling on Locally Differentially Private Data Collection

With the recent boom in data, there has been a huge surge in threats against individuals' private information. Various techniques for optimizing privacy-preserving data analysis have been a focus of research in recent years. In this paper, we…

Cryptography and Security · Computer Science · 2022-11-11 · Sayan Biswas, Graham Cormode, Carsten Maple

Sampling Sketches for Concave Sublinear Functions of Frequencies

We consider massive distributed datasets that consist of elements modeled as key-value pairs and the task of computing statistics or aggregates where the contribution of each key is weighted by a function of its frequency (sum of values of…

Data Structures and Algorithms · Computer Science · 2019-12-24 · Edith Cohen, Ofir Geri

Fair and Differentially Private Distributed Frequency Estimation

In order to remain competitive, Internet companies collect and analyse user data for the purpose of improving user experiences. Frequency estimation is a widely used statistical tool which could potentially conflict with the relevant…

Cryptography and Security · Computer Science · 2021-04-14 · Mengmeng Yang, Ivan Tjuawinata, Kwok-Yan Lam, Tianqing Zhu, Jun Zhao

A new Frequency Estimation Sketch for Data Streams

In data stream applications, one of the critical issues is estimating the frequency of each item in a given multiset. A multiset is a set in which each item can appear multiple times. The data streams in many applications are…

Data Structures and Algorithms · Computer Science · 2020-01-07 · Ning Li

Overview of streaming-data algorithms

Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Moreover, these data are potentially unbounded. Boundless streams of data collected from sensors, equipment, and other…

Databases · Computer Science · 2012-03-12 · T Soni Madhulatha

Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme

We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random…

Data Structures and Algorithms · Computer Science · 2022-07-19 · Dimitris Bertsimas, Vassilis Digalakis

Sparse Uncertainty-Informed Sampling from Federated Streaming Data

We present a numerically robust, computationally efficient approach for non-I.I.D. data stream sampling in federated client systems, where resources are limited and labeled data for local model adaptation is sparse and expensive. The…

Machine Learning · Computer Science · 2024-09-02 · Manuel Röder, Frank-Michael Schleif

Data Sketches for Disaggregated Subset Sum and Frequent Item Estimation

We introduce and study a new data sketch for processing massive datasets. It addresses two common problems: 1) computing a sum given arbitrary filter conditions and 2) identifying the frequent items or heavy hitters in a data set. For the…

Computation · Statistics · 2017-09-14 · Daniel Ting

Differentially Private Weighted Sampling

Common datasets have the form of elements with keys (e.g., transactions and products) and the goal is to perform analytics on the aggregated form of key and frequency pairs. A weighted sample of keys by (a function of) frequency is a highly…

Machine Learning · Computer Science · 2021-04-01 · Edith Cohen, Ofir Geri, Tamas Sarlos, Uri Stemmer

Frequency Estimation Under Multiparty Differential Privacy: One-shot and Streaming

We study the fundamental problem of frequency estimation under both privacy and communication constraints, where the data is distributed among $k$ parties. We consider two application scenarios: (1) one-shot, where the data is static and…

Cryptography and Security · Computer Science · 2021-06-01 · Ziyue Huang, Yuan Qiu, Ke Yi, Graham Cormode

Sampling Online Social Networks via Heterogeneous Statistics

Most sampling techniques for online social networks (OSNs) are based on a particular sampling method on a single graph, which is referred to as a statistics. However, various realizing methods on different graphs could possibly be used in…

Social and Information Networks · Computer Science · 2015-12-21 · Xin Wang, Richard T. B. Ma, Yinlong Xu, Zhipeng Li

Pattern Recognition and Event Detection on IoT Data-streams

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science · 2022-03-03 · Christos Karras, Aristeidis Karras, Spyros Sioutas

ComPAS: Community Preserving Sampling for Streaming Graphs

In the era of big data, graph sampling is indispensable in many settings. Existing sampling methods are mostly designed for static graphs, and aim to preserve basic structural properties of the original graph (such as degree distribution,…

Social and Information Networks · Computer Science · 2018-02-07 · Sandipan Sikdar, Tanmoy Chakraborty, Soumya Sarkar, Niloy Ganguly, Animesh Mukherjee

Feasible Sampling of Non-strict Turnstile Data Streams

We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs $(k,C_k)$ of a value $k$ and its exact total count $C_k$. Our algorithms are for both Strict Turnstile data streams…

Data Structures and Algorithms · Computer Science · 2012-09-26 · Neta Barkay, Ely Porat, Bar Shalem

Sampling Large Data on Graphs

We consider the problem of sampling from data defined on the nodes of a weighted graph, where the edge weights capture the data correlation structure. As shown recently, using spectral graph theory one can define a cut-off frequency for the…

Information Theory · Computer Science · 2014-11-13 · Ilan Shomorony, A. Salman Avestimehr

Graph Sample and Hold: A Framework for Big-Graph Analytics

Sampling is a standard approach in big-graph analytics; the goal is to efficiently estimate the graph properties by consulting a sample of the whole population. A perfect sample is assumed to mirror every property of the whole population.…

Social and Information Networks · Computer Science · 2014-03-18 · Nesreen K. Ahmed, Nick Duffield, Jennifer Neville, Ramana Kompella

HyperLogLog Hyper Extended: Sketches for Concave Sublinear Frequency Statistics

One of the most common statistics computed over data elements is the number of distinct keys. A thread of research pioneered by Flajolet and Martin three decades ago culminated in the design of optimal approximate counting sketches, which…

Data Structures and Algorithms · Computer Science · 2017-02-27 · Edith Cohen

Sketch Disaggregation Across Time and Space

Streaming analytics are essential in a large range of applications, including databases, networking, and machine learning. To optimize performance, practitioners are increasingly offloading such analytics to network nodes such as switches.…

Networking and Internet Architecture · Computer Science · 2025-03-19 · Jonatan Langlet, Peiqing Chen, Michael Mitzenmacher, Ran Ben Basat, Zaoxing Liu, Gianni Antichi

Universal Streaming

Given a stream of data, a typical approach in streaming algorithms is to design a sophisticated algorithm with small memory that computes a specific statistic over the streaming data. Usually, if one wants to compute a different statistic…

Data Structures and Algorithms · Computer Science · 2014-08-13 · Vladimir Braverman, Rafail Ostrovsky, Alan Roytman

ONCE and ONCE+: Counting the Frequency of Time-constrained Serial Episodes in a Streaming Sequence

As a representative sequential pattern mining problem, counting the frequency of serial episodes from a streaming sequence has drawn continuous attention in academia due to its wide application in practice, e.g., telecommunication alarms,…

Data Structures and Algorithms · Computer Science · 2018-01-30 · Hui Li, Sizhe Peng, Jian Li, Jingjing Li, Jiangtao Cui, Jianfeng Ma