Related papers: Approximate Query Processing over Static Sets and …
Streaming computation plays an important role in large-scale data analysis. The sliding window model is a model of streaming computation which also captures the recency of the data. In this model, data arrives one item at a time, but only…
We consider the problem of summarizing a multiset of elements in $\{1, 2, \ldots , n\}$ under the constraint that no element appears more than $\ell$ times. The goal is then to answer \emph{rank} queries --- given $i\in\{1, 2, \ldots ,…
We study algorithms for the sliding-window model, an important variant of the data-stream model, in which the goal is to compute some function of a fixed-length suffix of the stream. We extend the smooth-histogram framework of Braverman and…
Neural summarization models suffer from the fixed-size input limitation: if text length surpasses the model's maximal number of input tokens, some document content (possibly summary-relevant) gets truncated. Independently summarizing windows…
The sliding window model of computation captures scenarios in which data is arriving continuously, but only the latest $w$ elements should be used for analysis. The goal is to design algorithms that update the solution efficiently with each…
In the sliding window model, we are required to maintain the target statistics over the most recent $n$ elements of a data stream, which is captured by a window of size $n$ sliding over the data stream. Exact computation usually requires…
Windowed recurrences are sliding window calculations where a function is applied iteratively across the window of data, and are ubiquitous throughout the natural, social, and computational sciences. In this monograph we explore the…
Sliding window sums are widely used in bioinformatics applications, including sequence assembly, k-mer generation, hashing and compression. New vector algorithms which utilize the advanced vector extension (AVX) instructions available on…
We show how to utilize machine learning approaches to improve sliding window algorithms for approximate frequency estimation problems, under the ``algorithms with predictions'' framework. In this dynamic environment, previous…
Uncertainty arises naturally in many application domains due to, e.g., data entry errors and ambiguity in data cleaning. Prior work in incomplete and probabilistic databases has investigated the semantics and efficient evaluation of ranking…
This paper considers the problem of maintaining statistic aggregates over the last W elements of a data stream. First, the problem of counting the number of 1's in the last W bits of a binary stream is considered. A lower bound of…
We study index-based processing for connectivity queries within sliding windows on streaming graphs. These queries, which determine whether two vertices belong to the same connected component, are fundamental operations in real-time graph…
Matrix multiplication is a core operation in numerous applications, yet its exact computation becomes prohibitively expensive as data scales, especially in streaming environments where timeliness is critical. In many real-world scenarios,…
The massive size of real-world graphs, such as social networks and the web graph, poses serious challenges for processing and analyzing them. These issues can be resolved by working on a small summary of the graph instead. A summary is a…
Many big-data clusters store data in large partitions that support access at a coarse, partition-level granularity. As a result, approximate query processing via row-level sampling is inefficient, often requiring reads of many partitions.…
Maximizing submodular functions under cardinality constraints lies at the core of numerous data mining and machine learning applications, including data diversification, data summarization, and coverage problems. In this work, we study this…
Probabilistic graphical models are a key tool in machine learning applications. Computing the partition function, i.e., normalizing constant, is a fundamental task of statistical inference but it is generally computationally intractable,…
As data volume grows extensively, data profiling helps to extract metadata of large-scale data. However, one kind of metadata, order statistics, is difficult to compute because it is neither mergeable nor incremental. Thus, the limitation…
The subset sum problem is known to be an NP-hard problem in the field of computer science with the fastest known approach having a run-time complexity of $O(2^{0.3113n})$. A modified version of this problem is known as the perfect sum…
This paper considers an optimization problem for a dynamical system whose evolution depends on a collection of binary decision variables. We develop scalable approximation algorithms with provable suboptimality bounds to provide…