Related papers: A Simple and Efficient MapReduce Algorithm for Dat…

Scalable Data Cube Analysis over Big Data

Data cubes are widely used as a powerful tool to provide multidimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, with increasing data sizes, it is becoming computationally expensive to perform data cube…

Databases · Computer Science 2013-11-25 Zhengkui Wang, Yan Chu, Kian-Lee Tan, Divyakant Agrawal, Amr El Abbadi, Xiaolong Xu

Pruning Attribute Values From Data Cubes with Diamond Dicing

Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may…

Databases · Computer Science 2008-05-07 Hazel Webb, Owen Kaser, Daniel Lemire

MapReduce Particle Filtering with Exact Resampling and Deterministic Runtime

Particle filtering is a numerical Bayesian technique that has great potential for solving sequential estimation problems involving non-linear and non-Gaussian models. Since the estimation accuracy achieved by particle filters improves as…

Computation · Statistics 2017-11-22 Jeyarajan Thiyagalingam, Lykourgos Kekempanos, Simon Maskell

Computing Marginals Using MapReduce

We consider the problem of computing the data-cube marginals of a fixed order $k$ (i.e., all marginals that aggregate over $k$ dimensions), using a single round of MapReduce. The focus is on the relationship between the reducer size (number…

Databases · Computer Science 2015-09-30 Foto Afrati, Shantanu Sharma, Jeffrey D. Ullman, Jonathan R. Ullman

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

cube2net: Efficient Query-Specific Network Construction with Data Cube Organization

Networks are widely used to model objects with interactions and have enabled various downstream applications. However, in the real world, network mining is often done on particular query sets of objects, which does not require the…

Social and Information Networks · Computer Science 2020-02-04 Carl Yang, Mengxiong Liu, Frank He, Jian Peng, Jiawei Han

Exploiting Opportunistic Physical Design in Large-scale Data Analytics

Large-scale systems, such as MapReduce and Hadoop, perform aggressive materialization of intermediate job results in order to support fault tolerance. When jobs correspond to exploratory queries submitted by data analysts, these…

Databases · Computer Science 2013-12-11 Jeff LeFevre, Jagan Sankaranarayanan, Hakan Hacigumus, Junichi Tatemura, Neoklis Polyzotis, Michael J. Carey

Parallel K-Medoids++ Spatial Clustering Algorithm Based on MapReduce

Clustering analysis has received considerable attention in spatial data mining for several years. With the rapid development of the geospatial information technologies, the size of spatial information data is growing exponentially which…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-08-25 Xia Yue, Wang Man, Jun Yue, Guangcao Liu

Muppet: MapReduce-Style Processing of Fast Data

MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter…

Databases · Computer Science 2012-08-22 Wang Lam, Lu Liu, STS Prasad, Anand Rajaraman, Zoheb Vacheri, AnHai Doan

Energy Efficient Scheduling of MapReduce Jobs

MapReduce has emerged as a prominent programming model for data-intensive computation. In this work, we study power-aware MapReduce scheduling in the speed scaling setting first introduced by Yao et al. [FOCS 1995]. We focus on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-02-13 Evripidis Bampis, Vincent Chau, Dimitrios Letsios, Giorgio Lucarelli, Ioannis Milis, Georgios Zois

Transplantation of Data Mining Algorithms to Cloud Computing Platform when Dealing Big Data

This paper gives a short review of Cloud Computing and Big Data and discusses the portability of general data mining algorithms to the Cloud Computing platform. It shows that the Cloud Computing platform based on Map-Reduce cannot solve all the…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-07 Yong Wang, Ya Wei Zhao

Connecting MapReduce Computations to Realistic Machine Models

We explain how the popular, highly abstract MapReduce model of parallel computation (MRC) can be rooted in reality by explaining how it can be simulated on realistic distributed-memory parallel machine models like BSP. We first refine the…

Data Structures and Algorithms · Computer Science 2020-02-19 Peter Sanders

Space-Round Tradeoffs for MapReduce Computations

This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by…

Data Structures and Algorithms · Computer Science 2013-06-13 Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal

Meta-MapReduce: A Technique for Reducing Communication in MapReduce Computations

MapReduce has proven to be one of the most useful paradigms in the revolution of distributed computing, where cloud services and cluster computing become the standard venue for computing. The federation of cloud and big data activities is…

Databases · Computer Science 2016-07-29 Foto Afrati, Shlomi Dolev, Shantanu Sharma, Jeffrey D. Ullman

MapReduce for Integer Factorization

Integer factorization is a very hard computational problem. Currently no efficient algorithm for integer factorization is publicly known. However, this is an important problem: the security of many real-world cryptographic…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-01-05 Javier Tordable

Ef-QuantFace: Streamlined Face Recognition with Small Data and Low-Bit Precision

In recent years, model quantization for face recognition has gained prominence. Traditionally, compressing models involved vast datasets like the 5.8 million-image MS1M dataset as well as extensive training times, raising the question of…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 William Gazali, Jocelyn Michelle Kho, Joshua Santoso, Williem

MapReduce Scheduler: A 360-degree view

Undoubtedly, MapReduce is the most powerful programming paradigm in distributed computing. Enhancing MapReduce is essential, as it can make computing faster. Therefore, there are many scheduling algorithms to discuss…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-11 Rajdeep Das, Rohit Pratap Singh, Ripon Patgiri

MATE: Multi-view Attention for Table Transformer Efficiency

This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20…

Computation and Language · Computer Science 2021-09-10 Julian Martin Eisenschlos, Maharshi Gor, Thomas Müller, William W. Cohen

A quasi-Monte Carlo data compression algorithm for machine learning

We introduce an algorithm to reduce large data sets using so-called digital nets, which are well distributed point sets in the unit cube. These point sets together with weights, which depend on the data set, are used to represent the data.…

Numerical Analysis · Mathematics 2021-05-31 Josef Dick, Michael Feischl

Polytope: An Algorithm for Efficient Feature Extraction on Hypercubes

Data extraction algorithms on data hypercubes, or datacubes, are traditionally only capable of cutting boxes of data along the datacube axes. For many use cases however, this is not a sufficient approach and returns more data than users…

Information Retrieval · Computer Science 2023-06-21 Mathilde Leuridan, James Hawkes, Simon Smart, Emanuele Danovaro, Tiago Quintino