Related papers: On the Complexity of Processing Massive, Unordered…

Analyzing Large-Scale, Distributed and Uncertain Data

The exponential growth of data in current times and the demand to gain information and knowledge from the data present new challenges for database researchers. Known database systems and algorithms are no longer capable of effectively…

Databases · Computer Science 2017-12-06 Yaron Gonen

Optimization and analysis of large scale data sorting algorithm based on Hadoop

When dealing with massive data sorting, we usually use Hadoop which is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach in implement of…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-06-02 Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang

Beyond Batch Processing: Towards Real-Time and Streaming Big Data

Today, big data is generated from many sources and there is a huge demand for storing, managing, processing, and querying on big data. The MapReduce model and its counterpart open source implementation Hadoop, has proven itself as the de…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-08-04 Saeed Shahrivari, Saeed Jalili

Streaming Big Data meets Backpressure in Distributed Network Computation

We study network response to queries that require computation of remotely located data and seek to characterize the performance limits in terms of maximum sustainable query rate that can be satisfied. The available resources include (i) a…

Networking and Internet Architecture · Computer Science 2016-11-17 Apostolos Destounis, Georgios S. Paschos, Iordanis Koutsopoulos

Online Machine Learning in Big Data Streams

The area of online machine learning in big data streams covers algorithms that are (1) distributed and (2) work from data streams with only a limited possibility to store past data. The first requirement mostly concerns software…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-19 András A. Benczúr, Levente Kocsis, Róbert Pálovics

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many…

Databases · Computer Science 2017-07-07 Shlomi Dolev, Patricia Florissi, Ehud Gudes, Shantanu Sharma, Ido Singer

Memory-Based Multi-Processing Method For Big Data Computation

The evolution of the Internet and computer applications have generated colossal amount of data. They are referred to as Big Data and they consist of huge volume, high velocity, and variable datasets that need to be managed at the right…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-08-13 Youssef Bassil

The Family of MapReduce and Large Scale Data Processing Systems

In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data which has called for a paradigm shift in the computing architecture and large scale data processing mechanisms. MapReduce is a…

Databases · Computer Science 2013-02-14 Sherif Sakr, Anna Liu, Ayman G. Fayoumi

A Distributed Approach to LARS Stream Reasoning (System paper)

Stream reasoning systems are designed for complex decision-making from possibly infinite, dynamic streams of data. Modern approaches to stream reasoning are usually performing their computations using stand-alone solvers, which…

Artificial Intelligence · Computer Science 2020-02-19 Thomas Eiter, Paul Ogris, Konstantin Schekotihin

Overview of streaming-data algorithms

Due to recent advances in data collection techniques, massive amounts of data are being collected at an extremely fast pace. Also, these data are potentially unbounded. Boundless streams of data collected from sensors, equipments, and other…

Databases · Computer Science 2012-03-12 T Soni Madhulatha

Scheduling Storms and Streams in the Cloud

Motivated by emerging big streaming data processing paradigms (e.g., Twitter Storm, Streaming MapReduce), we investigate the problem of scheduling graphs over a large cluster of servers. Each graph is a job, where nodes represent compute…

Networking and Internet Architecture · Computer Science 2015-02-23 Javad Ghaderi, Sanjay Shakkottai, R Srikant

Towards a decentralized algorithm for mapping network and computational resources for distributed data-flow computations

Several high-throughput distributed data-processing applications require multi-hop processing of streams of data. These applications include continual processing on data streams originating from a network of sensors, composing a multimedia…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-03-26 Shah Asaduzzaman, Muthucumaru Maheswaran

Distributed Log Analysis on the Cloud Using MapReduce

In this paper we describe our work on designing a web based, distributed data analysis system based on the popular MapReduce framework deployed on a small cloud; developed specifically for analyzing web server logs. The log analysis system…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-13 Galip Aydin, Ibrahim Riza Hallac

The Distributed Computing Paradigms: P2P, Grid, Cluster, Cloud, and Jungle

The distributed computing is done on many systems to solve a large scale problem. The growing of high-speed broadband networks in developed and developing countries, the continual increase in computing power, and the rapid growth of the…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-14 Dr. Brijender Kahanwal, Dr. T. P. Singh

Analysis of Distributed Algorithms for Big-data

The parallel and distributed processing are becoming de facto industry standard, and a large part of the current research is targeted on how to make computing scalable and distributed, dynamically, without allocating the resources on…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-10 Rajendra Purohit, K R Chowdhary, S D Purohit

Leveraging Coding Techniques for Speeding up Distributed Computing

Large scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data that are on the orders of petabytes or more. The sheer size of the data precludes the processing of the data on a single computer. The…

Information Theory · Computer Science 2018-02-12 Konstantinos Konstantinidis, Aditya Ramamoorthy

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

Machine Learning · Statistics 2015-05-18 Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

Experimental Evaluation of Multi-Round Matrix Multiplication on MapReduce

A common approach in the design of MapReduce algorithms is to minimize the number of rounds. Indeed, there are many examples in the literature of monolithic MapReduce algorithms, which are algorithms requiring just one or two rounds.…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-01-22 Matteo Ceccarello, Francesco Silvestri

Computation-Aware Data Aggregation

Data aggregation is a fundamental primitive in distributed computing wherein a network computes a function of every nodes' input. However, while compute time is non-negligible in modern systems, standard models of distributed computing do…

Data Structures and Algorithms · Computer Science 2019-11-14 Bernhard Haeupler, D Ellis Hershkowitz, Anson Kahng, Ariel D. Procaccia

Optimal Data Selection: An Online Distributed View

The blessing of ubiquitous data also comes with a curse: the communication, storage, and labeling of massive, mostly redundant datasets. We seek to solve this problem at its core, collecting only valuable data and throwing out the rest via…

Machine Learning · Computer Science 2023-12-18 Mariel Werner, Anastasios Angelopoulos, Stephen Bates, Michael I. Jordan