Related papers: Binary Coding in Stream
In this paper, we address the problem of learning compact similarity-preserving embeddings for massive high-dimensional streams of data in order to perform efficient similarity search. We present a new online method for computing binary…
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a…
Recent advancement of the WWW, IOT, social network, e-commerce, etc. have generated a large volume of data. These datasets are mostly represented by high dimensional and sparse datasets. Many fundamental subroutines of common data analytic…
We initiate the study of biological neural networks from the perspective of streaming algorithms. Like computers, human brains suffer from memory limitations which pose a significant obstacle when processing large scale and dynamically…
We study the relation between streaming algorithms and linear sketching algorithms, in the context of binary updates. We show that for inputs in $n$ dimensions, the existence of efficient streaming algorithms which can process $\Omega(n^2)$…
Low-rank approximation in data streams is a fundamental and significant task in computing science, machine learning and statistics. Multiple streaming algorithms have emerged over years and most of them are inspired by randomized…
This is a position paper, submitted to the Future Online Analysis Platform Workshop (https://press3.mcs.anl.gov/futureplatform/), which argues that simple data analysis applications are common today, but future online supercomputing…
The $k$-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate $k$-core decomposition…
We propose a streaming algorithm for the binary classification of data based on crowdsourcing. The algorithm learns the competence of each labeller by comparing her labels to those of other labellers on the same tasks and uses this…
We adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix $A \in \R^{n \times m}$ one after the other in a streaming fashion. It maintains…
There is growing interest in representing image data and feature descriptors using compact binary codes for fast near neighbor search. Although binary codes are motivated by their use as direct indices (addresses) into a hash table, codes…
Problems involving the efficient arrangement of simple objects, as captured by bin packing and makespan scheduling, are fundamental tasks in combinatorial optimization. These are well understood in the traditional online and offline cases,…
Many well-known, real-world problems involve dynamic data which describe the relationship among the entities. Hypergraphs are powerful combinatorial structures that are frequently used to model such data. For many of today's data-centric…
Real-time streaming communication requires a high quality of service despite contending with packet loss. Streaming codes are a class of codes best suited for this setting. A key challenge for streaming codes is that they operate in an…
In many emerging applications, data streams are monitored in a network environment. Due to limited communication bandwidth and other resource constraints, a critical and practical demand is to online compress data streams continuously with…
In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that are able to efficiently process…
We initiate the study of sub-linear sketching and streaming techniques for estimating the output size of common dictionary compressors such as Lempel-Ziv '77, the run-length Burrows-Wheeler transform, and grammar compression. To this end,…
The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software…
Online hashing methods are efficient in learning the hash functions from the streaming data. However, when the hash functions change, the binary codes for the database have to be recomputed to guarantee the retrieval accuracy. Recomputing…
We introduce and study the problem of computing the similarity self-join in a streaming context (SSSJ), where the input is an unbounded stream of items arriving continuously. The goal is to find all pairs of items in the stream whose…