Related papers: Thrill: High-Performance Algorithmic Distributed B…

200 papers

MapReduce and its variants have significantly simplified and accelerated the process of developing parallel programs. However, most MapReduce implementations focus on data-intensive tasks while many real-world tasks are compute intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-07 Junhao Li, Hang Zhang

Non-volatile random access memory (NVRAM) offers byte-addressable persistence at speeds comparable to DRAM. However, with caches remaining volatile, automatic cache evictions can reorder updates to memory, potentially leaving persistent…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-08-20 Yuanhao Wei, Naama Ben-David, Michal Friedman, Guy E. Blelloch, Erez Petrank

Big data processing is a hot topic in today's computer science world. There is a significant demand for analysing big data to satisfy many requirements of many industries. Emergence of the Kappa architecture created a strong requirement for…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-17 Shelan Perera, Ashansa Perera, Kamal Hakimzadeh

As dataset sizes increase, data analysis tasks in high performance computing (HPC) are increasingly dependent on sophisticated dataflows and out-of-core methods for efficient system utilization. In addition, as HPC systems grow, memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-01 George K. Thiruvathukal, Cameron Christensen, Xiaoyong Jin, François Tessier, Venkatram Vishwanath

High Performance Computing is notorious for its long and expensive software development cycle. To address this challenge, we present Bind: a "partitioned global workflow" parallel programming model for C++ applications that enables quick…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-16 Alex Kosenkov, Matthias Troyer

Efficient code retrieval is critical for developer productivity, yet existing benchmarks largely focus on Python and rarely stress-test robustness beyond superficial lexical cues. To address the gap, we introduce an automated pipeline for…

Software Engineering · Computer Science 2026-03-06 Kaicheng Wang, Liyan Huang, Weike Fang, Weihang Wang

On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, portability of code and performance are increasingly difficult to achieve as…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-14 Thomas Heller, Hartmut Kaiser, Patrick Diehl, Dietmar Fey, Marc Alexander Schweitzer

Stochastic algorithms are efficient approaches to solving machine learning and optimization problems. In this paper, we propose a general framework called Splash for parallelizing stochastic algorithms on multi-node distributed systems.…

Machine Learning · Computer Science 2015-09-24 Yuchen Zhang, Michael I. Jordan

Distributed in-memory data processing engines accelerate iterative applications by caching substantial datasets in memory rather than recomputing them in each iteration. Selecting a suitable cluster size for caching these datasets plays an…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-07-07 Hani Al-Sayeh, Muhammad Attahir Jibril, Bunjamin Memishi, Kai-Uwe Sattler

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-03 Cheng-Hsiang Chiu, Tsung-Wei Huang, Zizheng Guo, Yibo Lin

The DataFlow is a sub-system of the ATLAS data acquisition responsible for the reception, buffering and subsequent movement of partial and full event data to the higher level triggers: Level 2 and Event Filter. The design of the software is…

Instrumentation and Detectors · Physics 2007-05-23 S. Gadomski

Spark is an in-memory analytics platform that targets commodity server environments today. It relies on the Hadoop Distributed File System (HDFS) to persist intermediate checkpoint states and final processing results. In Spark, immutable…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-08-22 Mijung Kim, Jun Li, Haris Volos, Manish Marwah, Alexander Ulanov, Kimberly Keeton, Joseph Tucek, Lucy Cherkasova, Le Xu, Pradeep Fernando

The objective of this work was to utilize BigBench [1] as a Big Data benchmark and evaluate and compare two processing engines: MapReduce [2] and Spark [3]. MapReduce is the established engine for processing data on Hadoop. Spark is a…

Databases · Computer Science 2016-01-14 Todor Ivanov, Max-Georg Beer

Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-12 Ruben Laso, Diego Krupitza, Sascha Hunold

Data preprocessing techniques are devoted to correct or alleviate errors in data. Discretization and feature selection are two of the most extended data preprocessing techniques. Although we can find many proposals for static Big Data…

Databases · Computer Science 2018-10-16 Alejandro Alcalde-Barros, Diego García-Gil, Salvador García, Francisco Herrera

Compressed bitmap indexes are used in systems such as Git or Oracle to accelerate queries. They represent sets and often support operations such as unions, intersections, differences, and symmetric differences. Several important systems…

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-30 Bilal Akil, Ying Zhou, Uwe Röhm

We present a modern C++17-compatible thread pool implementation, built from scratch with high-performance scientific computing in mind. The thread pool is implemented as a single lightweight and self-contained class, and does not have any…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-29 Barak Shoshany

The C/C++ memory model provides an interface and execution model for programmers of concurrent (shared-variable) code. It provides a range of mechanisms that abstract from underlying hardware memory models -- that govern how multicore…

Programming Languages · Computer Science 2022-04-08 Robert J. Colvin

The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented the MRA…

Computation · Statistics 2019-05-07 Huang Huang, Lewis R. Blake, Dorit M. Hammerling