Related papers: Data Mining-based Fragmentation of XML Data Wareho…

Fragmenting very large XML data warehouses via K-means clustering algorithm

XML data sources are more and more gaining popularity in the context of a wide family of Business Intelligence (BI) and On-Line Analytical Processing (OLAP) applications, due to the amenities of XML in representing and managing…

Databases · Computer Science 2017-01-10 Alfredo Cuzzocrea , Jérôme Darmont , Hadj Mahboubi

Enhancing XML Data Warehouse Query Performance by Fragmentation

XML data warehouses form an interesting basis for decision-support applications that exploit heterogeneous data from multiple sources. However, XML-native database systems currently suffer from limited performances in terms of manageable…

Databases · Computer Science 2009-08-28 Hadj Mahboubi , Jérôme Darmont

Affinity-based XML Fragmentation

In this paper we tackle the fragmentation problem for highly distributed databases. In such an environment, a suitable fragmentation strategy may provide scalability and availability by minimizing distributed transactions. We propose an…

Databases · Computer Science 2013-04-25 Rebeca Schroeder , Ronaldo Santos Mello , Carmem Satie Hara

An improvement on fragmentation in Distribution Database Design Based on Knowledge-Oriented Clustering Techniques

The problem of optimizing distributed database includes: fragmentation and positioning data. Several different approaches and algorithms have been proposed to solve this problem. In this paper, we propose an algorithm that builds the…

Databases · Computer Science 2015-05-08 Van Nghia Luong , Ha Huy Cuong Nguyen , Van Son Le

Query Performance Optimization in XML Data Warehouses

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently bear limited performances and it is necessary to research for ways…

Databases · Computer Science 2017-01-30 Hadj Mahboubi , Jérôme Darmont

Prefix-based Labeling Annotation for Effective XML Fragmentation

XML is gradually employed as a standard of data exchange in web environment since its inception in the 90s until present. It serves as a data exchange between systems and other applications. Meanwhile the data volume has grown substantially…

Databases · Computer Science 2015-05-14 Kok-Leong Koong , Su-Cheng Haw , Lay-Ki Soon , Samini Subramaniam

A Join Index for XML Data Warehouses

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native-XML database management systems (DBMSs) currently bear limited performances and it is necessary to research for ways…

Databases · Computer Science 2008-09-12 Hadj Mahboubi , Kamel Aouiche , Jérôme Darmont

An efficient K-means algorithm for Massive Data

Due to the progressive growth of the amount of data available in a wide variety of scientific fields, it has become more difficult to manipulate and analyze such information. Even though datasets have grown in size, the K-means algorithm…

Machine Learning · Statistics 2016-05-11 Marco Capó , Aritz Pérez , José Antonio Lozano

Distributed Clustering Algorithm for Spatial Data Mining

Distributed data mining techniques and mainly distributed clustering are widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering…

Databases · Computer Science 2018-02-02 Malika Bendechache , M-Tahar Kechadi

Review on Fragment Allocation by using Clustering Technique in Distributed Database System

Considerable Progress has been made in the last few years in improving the performance of the distributed database systems. The development of Fragment allocation models in Distributed database is becoming difficult due to the complexity of…

Databases · Computer Science 2013-10-07 Priyanka Dash , Ranjita Rout , Satya Bhusan Pratihari , Sanjay Kumar Padhi

A hybrid clustering algorithm for data mining

Data clustering is a process of arranging similar data into groups. A clustering algorithm partitions a data set into several groups such that the similarity within a group is better than among groups. In this paper a hybrid clustering…

Databases · Computer Science 2012-05-25 Ravindra Jain

Reducing Data Fragmentation in Data Deduplication Systems via Partial Repetition and Coding

Data deduplication, one of the key features of modern Big Data storage devices, is the process of removing replicas of data chunks stored by different users. Despite the importance of deduplication, several drawbacks of the method, such as…

Information Theory · Computer Science 2024-11-05 Yun-Han Li , Jin Sima , Ilan Shomorony , Olgica Milenkovic

Dynamically Weighted Federated k-Means

Federated clustering, an integral aspect of federated machine learning, enables multiple data sources to collaboratively cluster their data, maintaining decentralization and preserving privacy. In this paper, we introduce a novel federated…

Machine Learning · Computer Science 2023-11-20 Patrick Holzer , Tania Jacob , Shubham Kavane

Fragmentation in Large Object Repositories

Fragmentation leads to unpredictable and degraded application performance. While these problems have been studied in detail for desktop filesystem workloads, this study examines newer systems such as scalable object stores and multimedia…

Databases · Computer Science 2009-08-21 Russell Sears , Catharine van Ingen

A sampling-based approach for efficient clustering in large datasets

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high-performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis , Omar Oubari , Gregor Lenz

An efficient K-means clustering algorithm for massive data

The analysis of continuously larger datasets is a task of major importance in a wide variety of scientific fields. In this sense, cluster analysis algorithms are a key element of exploratory data analysis, due to their easiness in the…

Machine Learning · Statistics 2018-01-10 Marco Capó , Aritz Pérez , Jose A. Lozano

Comparative Analysis of Optimization Strategies for K-means Clustering in Big Data Contexts: A Review

This paper presents a comparative analysis of different optimization techniques for the K-means algorithm in the context of big data. K-means is a widely used clustering algorithm, but it can suffer from scalability issues when dealing with…

Machine Learning · Computer Science 2024-05-21 Ravil Mussabayev , Rustam Mussabayev

Query Workload-based RDF Graph Fragmentation and Allocation

As the volume of the RDF data becomes increasingly large, it is essential for us to design a distributed database system to manage it. For distributed RDF data design, it is quite common to partition the RDF data into some parts, called…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-02-23 Peng Peng , Lei Zou , Lei Chen , Dongyan Zhao

How to Use K-means for Big Data Clustering?

K-means plays a vital role in data mining and is the simplest and most widely used algorithm under the Euclidean Minimum Sum-of-Squares Clustering (MSSC) model. However, its performance drastically drops when applied to vast amounts of…

Machine Learning · Computer Science 2023-11-27 Rustam Mussabayev , Nenad Mladenovic , Bassem Jarboui , Ravil Mussabayev

Warehousing complex data from the Web

The data warehousing and OLAP technologies are now moving onto handling complex data that mostly originate from the Web. However, integrating such data into a decision-support process requires their representation under a form processable…

Databases · Computer Science 2017-01-03 Omar Boussaid , Jerome Darmont , Fadila Bentayeb , Sabine Loudcher