Related papers: StreamBlocks: A compiler for heterogeneous dataflo…

StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye , Deming Chen

Fork is All You Need in Heterogeneous Systems

We present a unified programming model for heterogeneous computing systems. Such systems integrate multiple computing accelerators and memory units to deliver higher performance than CPU-centric systems. Although heterogeneous systems have…

Emerging Technologies · Computer Science 2024-04-18 Zixuan Wang , Jishen Zhao

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…

Programming Languages · Computer Science 2020-11-02 Michail Papadimitriou , Juan Fumero , Athanasios Stratikopoulos , Foivos S. Zakkak , Christos Kotselidis

High-level Stream Processing: A Complementary Analysis of Fault Recovery

Parallel computing is very important to accelerate the performance of software systems. Additionally, considering that a recurring challenge is to process high data volumes continuously, stream processing emerged as a paradigm and software…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-14 Adriano Vogel , Sören Henning , Esteban Perez-Wohlfeil , Otmar Ertl , Rick Rabiser

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient…

Hardware Architecture · Computer Science 2026-04-15 Weichuang Zhang , Yiquan Wang , Xinzhou Zhang , Chi Zhang , Yu Feng , Xiaofeng Hou , Chao Li , Jieru Zhao , Minyi Guo

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Shared memory multiprocessors come back to popularity thanks to rapid spreading of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-10 Marco Aldinucci , Massimo Torquati , Massimiliano Meneghin

FastFlow in FPGA Stacks of Data Centers

FPGA programming is more complex as compared to Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The coding languages to define the abstraction of Register Transfer Level (RTL) in High Level Synthesis (HLS) for FPGA…

Hardware Architecture · Computer Science 2024-10-04 Rourab Paul , Alberto Ottimo , Marco Danelutto

FLOWER: A comprehensive dataflow compiler for high-level synthesis

FPGAs have found their way into data centers as accelerator cards, making reconfigurable computing more accessible for high-performance applications. At the same time, new high-level synthesis compilers like Xilinx Vitis and runtime…

Hardware Architecture · Computer Science 2021-12-16 Puya Amiri , Arsène Pérard-Gayot , Richard Membarth , Philipp Slusallek , Roland Leißa , Sebastian Hack

Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators

As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far,…

Hardware Architecture · Computer Science 2025-10-08 Arne Symons , Linyan Mei , Steven Colleman , Pouya Houshmand , Sebastian Karl , Marian Verhelst

HSTREAM: A directive-based language extension for heterogeneous stream computing

Big data streaming applications require utilization of heterogeneous parallel computing systems, which may comprise multiple multi-core CPUs and many-core accelerating devices such as NVIDIA GPUs and Intel Xeon Phis. Programming such…

Programming Languages · Computer Science 2023-05-12 Suejb Memeti , Sabri Pllana

Stream Distributed Coded Computing

The emerging large-scale and data-hungry algorithms require the computations to be delegated from a central server to several worker nodes. One major challenge in the distributed computations is to tackle delays and failures caused by the…

Information Theory · Computer Science 2021-03-03 Alejandro Cohen , Guillaume Thiran , Homa Esfahanizadeh , Muriel Médard

A Scalable Pipelined Dataflow Accelerator for Object Region Proposals on FPGA Platform

Region proposal is critical for object detection while it usually poses a bottleneck in improving the computation efficiency on traditional control-flow architectures. We have observed region proposal tasks are potentially suitable for…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-30 Wenzhi Fu , Jianlei Yang , Pengcheng Dai , Yiran Chen , Weisheng Zhao

FastFlow tutorial

FastFlow is a structured parallel programming framework targeting shared memory multicores. Its layered design and the optimized implementation of the communication mechanisms used to implement the FastFlow streaming networks provided to…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-25 Marco Aldinucci , Marco Danelutto , Massimo Torquati

StreamFlow: cross-breeding cloud with HPC

Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-09 Iacopo Colonnelli , Barbara Cantalupo , Ivan Merelli , Marco Aldinucci

Stream Processor Generator for HPC to Embedded Applications on FPGA-based System Platform

This paper presents a stream processor generator, called SPGen, for FPGA-based system-on-chip platforms. In our research project, we use an FPGA as a common platform for applications ranging from HPC to embedded/robotics computing.…

Other Computer Science · Computer Science 2014-08-25 Kentaro Sano , Hayato Suzuki , Ryo Ito , Tomohiro Ueno , Satoru Yamamoto

ReGraph: Scaling Graph Processing on HBM-enabled FPGAs with Heterogeneous Pipelines

The use of FPGAs for efficient graph processing has attracted significant interest. Recent memory subsystem upgrades including the introduction of HBM in FPGAs promise to further alleviate memory bottlenecks. However, modern multi-channel…

Hardware Architecture · Computer Science 2022-03-08 Xinyu Chen , Yao Chen , Feng Cheng , Hongshi Tan , Bingsheng He , Weng-Fai Wong

Parallelizing Workload Execution in Embedded and High-Performance Heterogeneous Systems

In this paper, we introduce a software-defined framework that enables the parallel utilization of all the programmable processing resources available in heterogeneous system-on-chip (SoC) including FPGA-based hardware accelerators and…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-12 Jose Nunez-Yanez , Mohammad Hosseinabady , Moslem Amiri , Andrés Rodríguez , Rafael Asenjo , Angeles Navarro , Rubén Gran-Tejero , Darío Suárez-Gracia

Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing

We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial…

Machine Learning · Computer Science 2026-01-01 Giacinto Paolo Saggese , Paul Smith

Stream-HLS: Towards Automatic Dataflow Acceleration

High-level synthesis (HLS) has enabled the rapid development of custom hardware circuits for many software applications. However, developing high-performance hardware circuits using HLS is still a non-trivial task requiring expertise in…

Hardware Architecture · Computer Science 2025-01-17 Suhail Basalama , Jason Cong

ImageCL: An Image Processing Language for Performance Portability on Heterogeneous Systems

Modern computer systems typically conbine multicore CPUs with accelerators like GPUs for inproved performance and energy efficiency. However, these sys- tems suffer from poor performance portability, code tuned for one device must be…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-05-23 Thomas L. Falch , Anne C. Elster