Related papers: A statistical analysis of probabilistic counting a…
Cardinality estimation algorithms receive a stream of elements whose order might be arbitrary, with possible repetitions, and return the number of distinct elements. Such algorithms usually seek to minimize the required storage and…
Cardinality estimation algorithms receive a stream of elements, with possible repetitions, and return the number of distinct elements in the stream. Such algorithms seek to minimize the required memory and CPU resource consumption at the…
This paper presents new methods to estimate the cardinalities of data sets recorded by HyperLogLog sketches. A theoretically motivated extension to the original estimator is presented that eliminates the bias for small and large…
Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality…
We derive a stochastic gradient algorithm for semidefinite optimization using randomization techniques. The algorithm uses subsampling to reduce the computational cost of each iteration and the subsampling ratio explicitly controls…
In recent years there has been a growing interest in developing "streaming algorithms" for efficient processing and querying of continuous data streams. These algorithms seek to provide accurate results while minimizing the required storage…
Randomized algorithms, such as randomized sketching or stochastic optimization, are a promising approach to ease the computational burden in analyzing large datasets. However, randomized algorithms also produce non-deterministic outputs,…
Online monitoring of user cardinalities (or degrees) in graph streams is fundamental for many applications. For example, in a bipartite graph representing user-website visiting activities, user cardinalities (the number of distinct visited…
Sketch-based streaming algorithms allow efficient processing of big data. These algorithms use small fixed-size storage to store a summary ("sketch") of the input data, and use probabilistic algorithms to estimate the desired quantity.…
The ability to preserve user privacy and anonymity is important. One of the safest ways to maintain privacy is to avoid storing personally identifiable information (PII), which poses a challenge for maintaining useful user statistics.…
Many streaming algorithms provide only a high-probability relative approximation. These two relaxations, of allowing approximation and randomization, seem necessary -- for many streaming problems, both relaxations must be employed…
We study two classes of summary-based cardinality estimators that use statistics about input relations and small-size joins in the context of graph database management systems: (i) optimistic estimators that make uniformity and conditional…
The amount of data coming from different sources, such as IoT sensors, social networks, and cellular networks, has increased exponentially during the last few years. Probabilistic Data Structures (PDS) are efficient alternatives to deterministic…
Flow cardinality estimation is the problem of estimating the number of distinct elements in a data flow, often with a stringent memory constraint. It has wide applications in network traffic measurement and in database systems. The virtual…
Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element…
In this paper we consider the problem of maximizing a non-negative submodular function subject to a cardinality constraint in the data stream model. Previously, the best known algorithm for this problem was a $5.828$-approximation…
Sketching is a probabilistic data compression technique that has been largely developed in the computer science community. Numerical operations on big datasets can be intolerably slow; sketching algorithms address this issue by generating a…
We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such…
Sketching algorithms use random projections to generate a smaller sketched data set, often for the purposes of modelling. Complete and partial sketch regression estimates can be constructed using information from only the sketched data set…
We consider the problem of monotone, submodular maximization over a ground set of size $n$ subject to cardinality constraint $k$. For this problem, we introduce the first deterministic algorithms with linear time complexity; these…