Related papers: Sliding Window Algorithms for k-Clustering Problem…
Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only…
Metric $k$-center clustering is a fundamental unsupervised learning primitive. Although widely used, this primitive is heavily affected by noise in the data, so that a more sensible variant seeks for the best solution that disregards a…
The $k$-center problem requires the selection of $k$ points (centers) from a given metric pointset $W$ so to minimize the maximum distance of any point of $W$ from the closest center. This paper focuses on a fair variant of the problem,…
Clustering is an important technique for identifying structural information in large-scale data analysis, where the underlying dataset may be too large to store. In many applications, recent data can provide more accurate information and…
The $k$-center problem for a point set~$P$ asks for a collection of $k$ congruent balls (that is, balls of equal radius) that together cover all the points in $P$ and whose radius is minimized. The $k$-center problem with outliers is…
Maximizing submodular functions under cardinality constraints lies at the core of numerous data mining and machine learning applications, including data diversification, data summarization, and coverage problems. In this work, we study this…
Sliding-window aggregation is a foundational stream processing primitive that efficiently summarizes recent data. The state-of-the-art algorithms for sliding-window aggregation are highly efficient when stream data items are evicted or…
We show how to utilize machine learning approaches to improve sliding window algorithms for approximate frequency estimation problems, under the ``algorithms with predictions'' framework. In this dynamic environment, previous…
Duplicate detection is the problem of identifying whether a given item has previously appeared in a (possibly infinite) stream of data, when only a limited amount of memory is available. Unfortunately the infinite stream setting is…
Interactive high-performance computing is doubtlessly beneficial for many computational science and engineering applications whenever simulation results should be visually processed in real time, i.e. during the computation process.…
Clustering is one of the most fundamental tools in data science and machine learning, and k-means clustering is one of the most common such methods. There is a variety of approximate algorithms for the k-means problem, but computing the…
Clustering algorithms remain valuable tools for grouping and summarizing the most important aspects of data. Example areas where this is the case include image segmentation, dimension reduction, signals analysis, model order reduction,…
We explore clustering problems in the streaming sliding window model in both general metric spaces and Euclidean space. We present the first polylogarithmic space $O(1)$-approximation to the metric $k$-median and metric $k$-means problems…
The past decade has witnessed many interesting algorithms for maintaining statistics over a data stream. This paper initiates a theoretical study of algorithms for monitoring distributed data streams over a time-based sliding window (which…
The $k$-center problem is a fundamental clustering variant with applications in learning systems and data summarization. In several real-world scenarios, the dataset to be clustered is not static, but evolves over time, as new data points…
In the sliding window model, we are required to maintain the target statistics over the most recent $n$ elements of a data stream, which is captured by a window of size $n$ sliding over the data stream. Exact computation usually requires…
Recursive estimates of large systems of equations in the context of least squares fitting is a common practice in different fields of study. For example, recursive adaptive filtering is extensively used in signal processing and control…
We study streaming algorithms for proportionally fair clustering, a notion originally suggested by Chierichetti et. al. (2017), in the sliding window model. We show that although there exist efficient streaming algorithms in the…
A Bloom filter is a method for reducing the space (memory) required for representing a set by allowing a small error probability. In this paper we consider a \emph{Sliding Bloom Filter}: a data structure that, given a stream of elements,…
The proliferation of sensing and monitoring applications motivates adoption of the event stream model of computation. Though sliding windows are widely used to facilitate effective event stream processing, it is greatly challenged when the…