
Related papers: TREES: A CPU/GPU Task-Parallel Runtime with Explic…

200 papers

Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, where the overall processing…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-08 Justin Deters, Jiaye Wu, Yifan Xu, I-Ting Angelina Lee

A merge tree is a topological descriptor of a real-valued function. Merge trees are used in visualization and topological data analysis, either directly or as a means to another end: computing a 0-dimensional persistence diagram,…

Computational Geometry · Computer Science 2023-01-31 Arnur Nigmetov, Dmitriy Morozov
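To make the link between merge trees and 0-dimensional persistence diagrams concrete, the following is a minimal textbook sketch, not the method of the listed paper. It assumes the function is sampled on a path graph (a 1-D array, where vertex i touches i-1 and i+1) and sweeps vertices in increasing order of value with a union-find structure, which implicitly traces the sublevel-set merge tree. The name `persistence_0d` and the input format are hypothetical.

```python
def persistence_0d(values):
    """0-dimensional sublevel-set persistence of a function on a path graph.

    Hypothetical helper: values[i] is the function value at vertex i, and
    vertex i is adjacent to i-1 and i+1. Each local minimum starts a
    component; when two components meet, the one with the higher minimum
    dies (the elder rule). The global minimum never dies (death = inf).
    """
    n = len(values)
    parent = [None] * n  # None means the vertex has not been swept in yet
    birth = {}           # root -> minimum value of its component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component (higher minimum) dies here.
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    pairs.append((min(values), float("inf")))
    return pairs
```

For example, `persistence_0d([3, 1, 4, 0, 2])` reports the local minimum 1 merging into the component of the global minimum 0 at height 4, i.e. the pair (1, 4), plus the essential pair (0, inf).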

This paper investigates the execution of tree-shaped task graphs using multiple processors. Each edge of such a tree represents some large data. A task can only be executed if all input and output data fit into memory, and a data item can only…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-10-02 Lionel Eyraud-Dubois, Loris Marchal, Oliver Sinnen, Frédéric Vivien

Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-03 Denis Los, Igor Petushkov

We present a GPU solution for exact maximal clique enumeration (MCE) that performs a search tree traversal following the Bron-Kerbosch algorithm. Prior works on parallelizing MCE on GPUs perform a breadth-first traversal of the tree, which…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-25 Mohammad Almasri, Yen-Hsiang Chang, Izzat El Hajj, Rakesh Nagi, Jinjun Xiong, Wen-mei Hwu
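As a rough illustration of what a search-tree traversal following the Bron-Kerbosch algorithm looks like, here is a minimal sequential, depth-first sketch of Bron-Kerbosch with pivoting. It is not the authors' GPU method; the adjacency-dict graph representation and the name `bron_kerbosch` are assumptions made for illustration. Each recursive call is one node of the search tree, and recursion visits it depth-first rather than breadth-first.

```python
def bron_kerbosch(graph):
    """Enumerate all maximal cliques of an undirected graph.

    Hypothetical helper: graph maps each vertex to the set of its neighbours.
    Sequential, depth-first Bron-Kerbosch with pivoting.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique; p: candidates that can extend r;
        # x: vertices already explored (used to avoid reporting duplicates).
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        u = max(p | x, key=lambda v: len(p & graph[v]))
        for v in list(p - graph[u]):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(graph), set())
    return cliques
```

For a triangle 1-2-3 with an extra edge 3-4, the sketch reports the two maximal cliques {1, 2, 3} and {3, 4}.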

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

Growing deployment of power and energy efficient throughput accelerators (GPU) in data centers demands enhancement of power-performance co-optimization capabilities of GPUs. Realization of exascale computing using accelerators requires…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-06 Nilanjan Goswami, Amer Qouneh, Chao Li, Tao Li

Monte Carlo Tree Search (MCTS) methods have achieved great success in many Artificial Intelligence (AI) benchmarks. The in-tree operations become a critical performance bottleneck in realizing parallel MCTS on CPUs. In this work, we develop…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-25 Yuan Meng, Rajgopal Kannan, Viktor Prasanna

This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…

Data Structures and Algorithms · Computer Science 2013-05-01 Guillaume Aupy, Manu Shantharam, Anne Benoit, Yves Robert, Padma Raghavan

When considering different hardware platforms, not just the time-to-solution can be of importance but also the energy necessary to reach it. This is not only the case with battery powered and mobile devices but also with high-performance…

Performance · Computer Science 2020-06-30 Philip Heinisch, Katharina Ostaszewski, Hendrik Ranocha

Parallel programming models can encourage performance portability by moving the responsibility for work assignment and data distribution from the programmer to a runtime system. However, analyzing the resulting implicit memory allocations,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-14 Fabian Knorr, Philip Salzmann, Peter Thoman, Thomas Fahringer

The rapid advancement of GPU technology has unlocked powerful parallel processing capabilities, creating new opportunities to enhance classic search algorithms. This hardware has been exploited in best-first search algorithms with neural…

Artificial Intelligence · Computer Science 2025-11-18 Ehsan Futuhi, Nathan R. Sturtevant

Scheduling real-time tasks that utilize GPUs with analyzable guarantees poses a significant challenge due to the intricate interaction between CPU and GPU resources, as well as the complex GPU hardware and software stack. While much…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-31 Yidi Wang, Cong Liu, Daniel Wong, Hyoseung Kim

The kd-tree is a fundamental tool in computer science. Among its many applications, kd-tree search (the tree method) for the fast evaluation of particle interactions and neighbor search is highly important, since the…

Instrumentation and Methods for Astrophysics · Physics 2011-12-21 Naohito Nakasato
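For readers unfamiliar with kd-tree neighbor search, here is a minimal, generic sketch of kd-tree construction and a nearest-neighbor query. It is not Nakasato's GPU implementation; the tuple-based node layout and the names `build` and `nearest` are hypothetical, and `math.dist` requires Python 3.8 or later. The key idea is the pruning test: the far side of a splitting plane is searched only if the plane is closer to the query than the best point found so far.

```python
import math


def build(points, depth=0):
    """Hypothetical helper: build a kd-tree as nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through coordinate axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point becomes the splitting node
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def nearest(node, query, best=None):
    """Hypothetical helper: return the tree point closest to query (Euclidean)."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or math.dist(query, point) < math.dist(query, best):
        best = point
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < math.dist(query, best):
        best = nearest(far, query, best)
    return best
```

For the points (2, 3), (5, 4), (9, 6), (4, 7), (8, 1) and (7, 2), the query (9, 2) returns (8, 1) without visiting the left half of the root split.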

We evaluate and compare four contemporary and emerging runtimes for high-performance computing (HPC) applications: Cilk, Charm++, ParalleX and AM++. We compare along three bases: programming model, execution model and the implementation on…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-02 Abhishek Kulkarni, Andrew Lumsdaine

Mixed-Criticality (MC) systems have recently been devised to address the requirements of real-time systems in industrial applications, where the system runs tasks with different criticality levels on a single platform. In some workloads, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-01 Behnaz Ranjbar, Ali Hosseinghorban, Mohammad Salehi, Alireza Ejlali, Akash Kumar

Utilizing GPUs is critical for high performance on heterogeneous systems. However, leveraging the full potential of GPUs for accelerating legacy CPU applications can be a challenging task for developers. The porting process requires…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-27 Shilei Tian, Tom Scogland, Barbara Chapman, Johannes Doerfert

There is growing interest in accelerating irregular data-parallel algorithms on GPUs. These algorithms are typically blocking, so they require fair scheduling. But GPU programming models (e.g. OpenCL) do not mandate fair scheduling, and…

Programming Languages · Computer Science 2017-07-10 Tyler Sorensen, Hugues Evrard, Alastair F. Donaldson

Parallelization is needed everywhere, from laptops and mobile phones to supercomputers. Among parallel programming models, task-based programming has demonstrated a powerful potential and is widely used in high-performance scientific…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-18 Paul Cardosi, Bérenger Bramas

Achieving efficient task parallelism on many-core architectures is an important challenge. The widely used GNU OpenMP implementation of the popular OpenMP parallel programming model incurs high overhead for fine-grained, short-running tasks…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-20 Wenyi Wang, Maxime Gonthier, Poornima Nookala, Haochen Pan, Ian Foster, Ioan Raicu, Kyle Chard