
Related papers: Accelerating sequential programs using FastFlow an…

200 papers

FastFlow is a structured parallel programming framework targeting shared memory multicores. Its layered design and the optimized implementation of the communication mechanisms used to implement the FastFlow streaming networks provided to…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-25 Marco Aldinucci, Marco Danelutto, Massimo Torquati

Shared memory multiprocessors have come back to popularity thanks to the rapid spread of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-10 Marco Aldinucci, Massimo Torquati, Massimiliano Meneghin

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang, Yibo Lin

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-03 Cheng-Hsiang Chiu, Tsung-Wei Huang, Zizheng Guo, Yibo Lin

TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML)…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-03 Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter

Fine-tuning large language models (LLMs) often exceeds GPU memory limits, prompting systems to offload model states to CPU memory. However, existing offloaded training frameworks like ZeRO-Offload treat all parameters equally and update the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-06 Tingfeng Lan, Yusen Wu, Bin Ma, Zhaoyuan Su, Rui Yang, Tekin Bicer, Masahiro Tanaka, Olatunji Ruwase, Dong Li, Yue Cheng

We present a unified programming model for heterogeneous computing systems. Such systems integrate multiple computing accelerators and memory units to deliver higher performance than CPU-centric systems. Although heterogeneous systems have…

Emerging Technologies · Computer Science 2024-04-18 Zixuan Wang, Jishen Zhao

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

FPGA programming is more complex than programming Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The coding languages to define the abstraction of Register Transfer Level (RTL) in High Level Synthesis (HLS) for FPGA…

Hardware Architecture · Computer Science 2024-10-04 Rourab Paul, Alberto Ottimo, Marco Danelutto

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…

Operating Systems · Computer Science 2026-01-13 Misun Park, Richi Dubey, Yifan Yuan, Nam Sung Kim, Ada Gavrilovska

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu, Marina Zapater, David Atienza

Serverless computing that runs functions with auto-scaling is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can achieve complex functionality. Prior research adopts the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-01 Zijun Li, Chuhao Xu, Quan Chen, Jieru Zhao, Chen Chen, Minyi Guo

The rise of serverless computing introduced a new class of scalable, elastic and widely available parallel workers in the cloud. Many systems and applications benefit from offloading computations and parallel tasks to dynamically allocated…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-23 Marcin Copik, Lukas Möller, Alexandru Calotoiu, Torsten Hoefler

In recent years, the use of heterogeneous hardware other than small-core CPUs, such as GPUs, FPGAs, or many-core CPUs, has been increasing. However, using heterogeneous hardware requires technical skills such as CUDA, which poses a high barrier. Based on…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-10 Yoji Yamato

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez, Mohammad Hosseinabady, Moslem Amiri, Andrés Rodríguez, Rafael Asenjo, Angeles Navarro, Rubén Gran-Tejero, Darío Suárez-Gracia

We present MadFlow, a first general multi-purpose framework for Monte Carlo (MC) event simulation of particle physics processes designed to take full advantage of hardware accelerators, in particular, graphics processing units (GPUs). The…

Computational Physics · Physics 2021-08-18 Stefano Carrazza, Juan Cruz-Martinez, Marco Rossi, Marco Zaro

Using heterogeneous hardware other than CPUs requires technical skills such as OpenCL, which poses a high barrier. Based on that, I have proposed environment adaptive software that enables automatic conversion, configuration, and high-performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-22 Yoji Yamato

Dataflow programming is a popular and convenient programming paradigm in systems modelling, optimisation, and machine learning. It has a number of advantages, for instance the lack of control flow allows computation to be carried out in…

Programming Languages · Computer Science 2021-03-03 Steven W. T. Cheung, Dan R. Ghica, Koko Muroya

The modern trend in High-Performance Computing (HPC) involves the use of accelerators such as Graphics Processing Units (GPUs) alongside Central Processing Units (CPUs) to speed up numerical operations in various applications. Leading…

Mathematical Software · Computer Science 2025-07-25 Giulio Malenza, Giovanni Stabile, Filippo Spiga, Robert Birke, Marco Aldinucci

In recent years, systems using FPGAs and GPUs have increased due to their advantages, such as power efficiency compared to CPUs. However, using FPGAs and GPUs in such systems requires understanding hardware-specific technical…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Yoji Yamato