Related papers: Leveraging Parallel Data Processing Frameworks wit…

Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become…

Databases · Computer Science 2018-06-20 Maaz Bin Safeer Ahmad , Alvin Cheung

A Framework for Genetic Algorithms Based on Hadoop

Genetic Algorithms (GAs) are powerful metaheuristic techniques mostly used in many real-world applications. The sequential execution of GAs requires considerable computational power both in time and resources. Nevertheless, GAs are…

Neural and Evolutionary Computing · Computer Science 2013-12-17 Filomena Ferrucci , M-Tahar Kechadi , Pasquale Salza , Federica Sarro

Towards co-designed optimizations in parallel frameworks: A MapReduce case study

The explosion of Big Data was followed by the proliferation of numerous complex parallel software stacks whose aim is to tackle the challenges of data deluge. A drawback of a such multi-layered hierarchical deployment is the inability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-04-01 Colin Barrett , Christos Kotselidis , Mikel Luján

Automatic Parallelization: Executing Sequential Programs on a Task-Based Parallel Runtime

There are billions of lines of sequential code inside nowadays' software which do not benefit from the parallelism available in modern multicore architectures. Automatically parallelizing sequential code, to promote an efficient use of the…

Programming Languages · Computer Science 2016-04-13 Alcides Fonseca , Bruno Cabral , João Rafael , Ivo Correia

An Alternative C++ based HPC system for Hadoop MapReduce

MapReduce is a technique used to vastly improve distributed processing of data and can massively speed up computation. Hadoop and its MapReduce relies on JVM and Java which is expensive on memory. High Performance Computing based MapReduce…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-29 Vignesh S. , Muthumanikandan V. , Siddarth S. , Sainath G

Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations (Extended Version)

Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and…

Programming Languages · Computer Science 2024-12-17 Jie Qiu , Colin Cai , Sahil Bhatia , Niranjan Hasabnis , Sanjit A. Seshia , Alvin Cheung

Scalable Deductive Verification of Data-Level Parallel Programs

This paper introduces several techniques that improve the scalability of the deductive verification of data-level programs working on arrays and matrices. First of all, we introduce a technique to rewrite expressions with (nested)…

Software Engineering · Computer Science 2026-05-14 Lars B. van den Haak , Anton Wijs , Marieke Huisman

Parallel Mapper

The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, split, and joint…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Mustafa Hajij , Basem Assiri , Paul Rosen

An Experimental Evaluation of Performance of A Hadoop Cluster on Replica Management

Hadoop is an open source implementation of the MapReduce Framework in the realm of distributed processing. A Hadoop cluster is a unique type of computational cluster designed for storing and analyzing large data sets across cluster of…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-11-10 Muralikrishnan Ramane , Sharmila Krishnamoorthy , Sasikala Gowtham

An Easy-to-use Scalable Framework for Parallel Recursive Backtracking

Supercomputers are equipped with an increasingly large number of cores to use computational power as a way of solving problems that are otherwise intractable. Unfortunately, getting serial algorithms to run in parallel to take advantage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-12-31 Faisal N. Abu-Khzam , Khuzaima Daudjee , Amer E. Mouawad , Naomi Nishimura

High Performance Risk Aggregation: Addressing the Data Processing Challenge the Hadoop MapReduce Way

Monte Carlo simulations employed for the analysis of portfolios of catastrophic risk process large volumes of data. Often times these simulations are not performed in real-time scenarios as they are slow and consume large data. Such…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-11-25 Zhimin Yao , Blesson Varghese , Andrew Rau-Chaplin

FooPar: A Functional Object Oriented Parallel Framework in Scala

We present FooPar, an extension for highly efficient Parallel Computing in the multi-paradigm programming language Scala. Scala offers concise and clean syntax and integrates functional programming features. Our framework FooPar combines…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-06-14 Felix P. Hargreaves , Daniel Merkle

CAPER: Coarsen, Align, Project, Refine - A General Multilevel Framework for Network Alignment

Network alignment, or the task of finding corresponding nodes in different networks, is an important problem formulation in many application domains. We propose CAPER, a multilevel alignment framework that Coarsens the input graphs, Aligns…

Social and Information Networks · Computer Science 2022-08-24 Jing Zhu , Danai Koutra , Mark Heimann

Automatic Parallelization of Sequential Programs

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-21 Peter Kraft , Amos Waterland , Daniel Y Fu , Anitha Gollamudi , Shai Szulanski , Margo Seltzer

ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment

Applications in science and engineering often require huge computational resources for solving problems within a reasonable time frame. Parallel supercomputers provide the computational infrastructure for solving such problems. A…

Distributed, Parallel, and Cluster Computing · Computer Science 2007-05-23 Rajesh Sudarsan , Calvin J. Ribbens

GraphLab: A New Framework for Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2010-06-28 Yucheng Low , Joseph Gonzalez , Aapo Kyrola , Danny Bickson , Carlos Guestrin , Joseph M. Hellerstein

Hierarchical Learning of Cross-Language Mappings through Distributed Vector Representations for Code

Translating a program written in one programming language to another can be useful for software development tasks that need functionality implementations in different languages. Although past studies have considered this problem, they may…

Machine Learning · Computer Science 2018-03-14 Nghi D. Q. Bui , Lingxiao Jiang

GraphLab: A New Framework For Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2014-08-12 Yucheng Low , Joseph E. Gonzalez , Aapo Kyrola , Danny Bickson , Carlos E. Guestrin , Joseph Hellerstein

Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations

Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a…

Programming Languages · Computer Science 2021-07-02 Chaitanya Koparkar , Mike Rainey , Michael Vollmer , Milind Kulkarni , Ryan R. Newton

Distributed GraphLab: A Framework for Machine Learning in the Cloud

While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning…

Databases · Computer Science 2012-04-30 Yucheng Low , Joseph Gonzalez , Aapo Kyrola , Danny Bickson , Carlos Guestrin , Joseph M. Hellerstein