Related papers: K-means for Evolving Data Streams

200 papers

We consider the classic Euclidean $k$-median and $k$-means objectives on data streams, where the goal is to provide a $(1+\varepsilon)$-approximation to the optimal $k$-median or $k$-means solution, while using as little memory as possible.…

Data Structures and Algorithms · Computer Science 2023-10-05 Vincent Cohen-Addad , David P. Woodruff , Samson Zhou

In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that are able to efficiently process…

Data Structures and Algorithms · Computer Science 2015-03-20 Vaneet Aggarwal , Shankar Krishnan

Many real-world applications pose challenges in incorporating fairness constraints into the $k$-center clustering problem, where the dataset consists of $m$ demographic groups, each with a specified upper bound on the number of centers to…

Data Structures and Algorithms · Computer Science 2026-01-19 Longkun Guo , Zeyu Lin , Chaoqi Jia , Chao Chen

Many real-world data stream applications not only suffer from concept drift but also class imbalance. Yet, very few existing studies investigated this joint challenge. Data difficulty factors, which have been shown to be key challenges in…

Machine Learning · Computer Science 2023-08-30 Chun Wai Chiu , Leandro L. Minku

We provide the first streaming algorithm for computing a provable approximation to the $k$-means of sparse Big Data. Here, sparse Big Data is a set of $n$ vectors in $\mathbb{R}^d$, where each vector has $O(1)$ non-zero entries, and…

Data Structures and Algorithms · Computer Science 2016-02-09 Artem Barger , Dan Feldman

We present a new streaming algorithm for the $k$-Mismatch problem, one of the most basic problems in pattern matching. Given a pattern and a text, the task is to find all substrings of the text that are at Hamming distance at most $k$…

Data Structures and Algorithms · Computer Science 2019-04-24 Jakub Radoszewski , Tatiana Starikovskaya

We present methods for k-means clustering on a stream with a focus on providing fast responses to clustering queries. Compared to the current state-of-the-art, our methods provide substantial improvement in the query time for cluster…

Data Structures and Algorithms · Computer Science 2018-12-10 Yu Zhang , Kanat Tangwongsan , Srikanta Tirthapura

Clustering of data points in metric space is among the most fundamental problems in computer science with plenty of applications in data mining, information retrieval and machine learning. Due to the necessity of clustering of large…

Data Structures and Algorithms · Computer Science 2019-10-03 Hossein Esfandiari , Vahab Mirrokni , Peilin Zhong

Besides the classical offline setup of machine learning, stream learning constitutes a well-established setup where data arrives over time in potentially non-stationary environments. Concept drift, the phenomenon that the underlying…

Machine Learning · Computer Science 2024-12-13 Fabian Hinder , Valerie Vaquet , David Komnick , Barbara Hammer

The distribution of streaming data often changes over time as conditions change, a phenomenon known as concept drift. Only a subset of previous experience, collected in similar conditions, is relevant to learning an accurate classifier for…

Machine Learning · Computer Science 2024-08-20 Ben Halstead , Yun Sing Koh , Patricia Riddle , Mykola Pechenizkiy , Albert Bifet

The literature on machine learning in the context of data streams is vast and growing. However, many of the defining assumptions regarding data-stream learning tasks are too strong to hold in practice, or are even contradictory such that…

Machine Learning · Computer Science 2025-09-09 Jesse Read , Indrė Žliobaitė

Given a stream of entries over time in a multi-dimensional data setting where concept drift is present, how can we detect anomalous activities? Most of the existing unsupervised anomaly detection approaches seek to detect anomalous events…

Machine Learning · Computer Science 2022-03-07 Siddharth Bhatia , Arjit Jain , Shivin Srivastava , Kenji Kawaguchi , Bryan Hooi

We introduce a streaming framework for analyzing stochastic approximation/optimization problems. This streaming framework is analogous to solving optimization problems using time-varying mini-batches that arrive sequentially. We provide…

Machine Learning · Computer Science 2023-04-25 Antoine Godichon-Baggioni , Nicklas Werge , Olivier Wintenberger

Stimulated by practical applications arising from viral marketing, this paper investigates a novel Budgeted $k$-Submodular Maximization problem defined as follows: Given a finite set $V$, a budget $B$ and a $k$-submodular function $f:…

Data Structures and Algorithms · Computer Science 2021-10-25 Canh V. Pham , Quang C. Vu , Dung K. T. Ha , Tai T. Nguyen

Data streams are often defined as large amounts of data flowing continuously at high speed. Moreover, these data are likely subject to changes in data distribution, known as concept drift. Given all the reasons mentioned above, learning…

Common clustering algorithms require multiple scans of all the data to achieve convergence, and this is prohibitive when large databases, with data arriving in streams, must be processed. Some algorithms to extend the popular K-means method…

Applications · Statistics 2017-12-22 Giacomo Aletti , Alessandra Micheletti

Big data streams are possibly one of the most essential underlying notions. However, data streams are often challenging to handle owing to their rapid pace and limited information lifetime. It is difficult to collect and communicate stream…

Machine Learning · Computer Science 2022-03-03 Christos Karras , Aristeidis Karras , Spyros Sioutas

We prove asymptotic convergence for a general class of $k$-means algorithms performed over streaming data from a distribution: the centers asymptotically converge to the set of stationary points of the $k$-means cost function. To do so, we…

Machine Learning · Computer Science 2022-02-23 Sanjoy Dasgupta , Gaurav Mahajan , Geelon So

Many optimization tasks involve streaming data with unknown concept drifts, posing a significant challenge as Streaming Data-Driven Optimization (SDDO). Existing methods, while leveraging surrogate model approximation and historical…

Machine Learning · Computer Science 2025-12-09 Yuan-Ting Zhong , Ting Huang , Xiaolin Xiao , Yue-Jiao Gong

We generalise the results of Bhattacharya et al. (Journal of Computing Systems, 62(1):93-115, 2018) for the list-$k$-means problem defined as -- for a (unknown) partition $X_1, ..., X_k$ of the dataset $X \subseteq \mathbb{R}^d$, find a…

Data Structures and Algorithms · Computer Science 2020-02-20 Dishant Goyal , Ragesh Jaiswal , Amit Kumar