Related papers: Classification dynamique d'un flux documentaire : …
We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent from any initial conditions and ordering of the…
The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches…
We present a structural clustering algorithm for large-scale datasets of small labeled graphs, utilizing a frequent subgraph sampling strategy. A set of representatives provides an intuitive description of each cluster, supports the…
Most density based stream clustering algorithms separate the clustering process into an online and offline component. Exact summarized statistics are being employed for defining micro-clusters or grid cells during the online stage followed…
The problem of analyzing data streams of very large volumes is important and is very desirable for many application domains. In this paper we present and demonstrate effective working of an algorithm to find clusters and anomalous data…
The analysis of data streams has received considerable attention over the past few decades due to sensors, social media, etc. It aims to recognize patterns in an unordered, infinite, and evolving stream of observations. Clustering this type…
The data stream model has been defined for new classes of applications involving massive data being generated at a fast pace. Web click stream analysis and detection of network intrusions are two examples. Cluster analysis on data streams…
Most classification methods are based on the assumption that data conforms to a stationary distribution. The machine learning domain currently suffers from a lack of classification techniques that are able to detect the occurrence of a…
Clustering techniques are very attractive for extracting and identifying patterns in datasets. However, their application to very large spatial datasets presents numerous challenges such as high-dimensionality data, heterogeneity, and high…
We formulate a novel technique for the detection of functional clusters in discrete event data. The advantage of this algorithm is that no prior knowledge of the number of functional groups is needed, as our procedure progressively combines…
Fast and high quality document clustering is an important task in organizing information, search engine results obtaining from user query, enhancing web crawling and information retrieval. With the large amount of data available and with a…
Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Also, these data are potentially unbounded. Boundless streams of data collected from sensors, equipments, and other…
Clustering is a fundamental tool for analyzing large data sets. A rich body of work has been devoted to designing data-stream algorithms for the relevant optimization problems such as $k$-center, $k$-median, and $k$-means. Such algorithms…
In this thesis, we propose several modelling strategies to tackle evolving data in different contexts. In the framework of static clustering, we start by introducing a soft kernel spectral clustering (SKSC) algorithm, which can better deal…
In the fields of big data, AI, and streaming processing, we work with large amounts of data from multiple sources. Due to memory and network limitations, we process data streams on distributed systems to alleviate computational and network…
Graph partitioning plays a vital role in distributedlarge-scale web graph analytics, such as pagerank and labelpropagation. The quality and scalability of partitioning strategyhave a strong impact on such communication- and…
Number of connected devices is steadily increasing and these devices continuously generate data streams. Real-time processing of data streams is arousing interest despite many challenges. Clustering is one of the most suitable methods for…
Graph clustering becomes an important problem due to emerging applications involving the web, social networks and bio-informatics. Recently, many such applications generate data in the form of streams. Clustering massive, dynamic graph…
We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…
This paper introduces a new clustering technique, called {\em dimensional clustering}, which clusters each data point by its latent {\em pointwise dimension}, which is a measure of the dimensionality of the data set local to that point.…