Related papers: A Simple and Efficient MapReduce Algorithm for Dat…
Data cubes are widely used as a powerful tool to provide multidimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube…
Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may…
Particle filtering is a numerical Bayesian technique that has great potential for solving sequential estimation problems involving non-linear and non-Gaussian models. Since the estimation accuracy achieved by particle filters improves as…
We consider the problem of computing the data-cube marginals of a fixed order $k$ (i.e., all marginals that aggregate over $k$ dimensions), using a single round of MapReduce. The focus is on the relationship between the reducer size (number…
The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…
Networks are widely used to model objects with interactions and have enabled various downstream applications. However, in the real world, network mining is often done on particular query sets of objects, which does not require the…
Large-scale systems, such as MapReduce and Hadoop, perform aggressive materialization of intermediate job results in order to support fault tolerance. When jobs correspond to exploratory queries submitted by data analysts, these…
Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of geospatial information technologies, the size of spatial information data is growing exponentially, which…
MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter…
MapReduce has emerged as a prominent programming model for data-intensive computation. In this work, we study power-aware MapReduce scheduling in the speed scaling setting first introduced by Yao et al. [FOCS 1995]. We focus on the…
This paper briefly reviews Cloud Computing and Big Data and discusses the portability of general data mining algorithms to the Cloud Computing platform. It shows that a Cloud Computing platform based on Map-Reduce cannot solve all the…
We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by explaining how it can be simulated on realistic distributed-memory parallel machine models like BSP. We first refine the…
This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…
MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is…
Integer factorization is a very hard computational problem. Currently no efficient algorithm for integer factorization is publicly known. However, this is an important problem on which the security of many real-world cryptographic…
In recent years, model quantization for face recognition has gained prominence. Traditionally, compressing models involved vast datasets like the 5.8 million-image MS1M dataset as well as extensive training times, raising the question of…
Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential and can make computation faster. Therefore, there are many scheduling algorithms to discuss…
This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20…
We introduce an algorithm to reduce large data sets using so-called digital nets, which are well distributed point sets in the unit cube. These point sets together with weights, which depend on the data set, are used to represent the data.…
Data extraction algorithms on data hypercubes, or datacubes, are traditionally only capable of cutting boxes of data along the datacube axes. For many use cases however, this is not a sufficient approach and returns more data than users…