Related papers: FastFlow tutorial

Accelerating sequential programs using FastFlow and self-offloading

FastFlow is a programming environment specifically targeting cache-coherent shared-memory multi-cores. FastFlow is implemented as a stack of C++ template libraries built on top of lock-free (fence-free) synchronization mechanisms. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-02-26 Marco Aldinucci , Marco Danelutto , Peter Kilpatrick , Massimiliano Meneghin , Massimo Torquati

FastFlow: Efficient Parallel Streaming Applications on Multi-core

Shared memory multiprocessors come back to popularity thanks to rapid spreading of commodity multi-core architectures. As ever, shared memory programs are fairly easy to write and quite hard to optimise; providing multi-core programmers…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-09-10 Marco Aldinucci , Massimo Torquati , Massimiliano Meneghin

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-03 Cheng-Hsiang Chiu , Tsung-Wei Huang , Zizheng Guo , Yibo Lin

FastFlow in FPGA Stacks of Data Centers

FPGA programming is more complex as compared to Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The coding languages to define the abstraction of Register Transfer Level (RTL) in High Level Synthesis (HLS) for FPGA…

Hardware Architecture · Computer Science 2024-10-04 Rourab Paul , Alberto Ottimo , Marco Danelutto

TensorFlow Doing HPC

TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow has been initially designed for developing Machine Learning (ML)…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-03 Steven W. D. Chien , Stefano Markidis , Vyacheslav Olshevsky , Yaroslav Bulatov , Erwin Laure , Jeffrey S. Vetter

Transparent Synchronous Dataflow

Dataflow programming is a popular and convenient programming paradigm in systems modelling, optimisation, and machine learning. It has a number of advantages, for instance the lacks of control flow allows computation to be carried out in…

Programming Languages · Computer Science 2021-03-03 Steven W. T. Cheung , Dan R. Ghica , Koko Muroya

StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science 2021-07-21 Endri Bezati , Mahyar Emami , Jörn Janneck , James Larus

TensorFlow: A system for large-scale machine learning

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-01 Martín Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , Xiaoqiang Zheng

Accelerator architectures specialize in executing SIMD (single instruction, multiple data) in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-25 Robert Lim , Boyana Norris , Allen Malony

Fork is All You Need in Heterogeneous Systems

We present a unified programming model for heterogeneous computing systems. Such systems integrate multiple computing accelerators and memory units to deliver higher performance than CPU-centric systems. Although heterogeneous systems have…

Emerging Technologies · Computer Science 2024-04-18 Zixuan Wang , Jishen Zhao

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang , Dian-Lun Lin , Chun-Xun Lin , Yibo Lin

A Generalized Streaming Model for Concurrent Computing

Multicore parallel programming has some very difficult problems such as deadlocks during synchronizations and race conditions brought by concurrency. Added to the difficulty is the lack of a simple, well-accepted computing model for…

Programming Languages · Computer Science 2010-12-09 Yibing Wang

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-17 Martín Abadi , Ashish Agarwal , Paul Barham , Eugene Brevdo , Zhifeng Chen , Craig Citro , Greg S. Corrado , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Ian Goodfellow , Andrew Harp , Geoffrey Irving , Michael Isard , Yangqing Jia , Rafal Jozefowicz , Lukasz Kaiser , Manjunath Kudlur , Josh Levenberg , Dan Mane , Rajat Monga , Sherry Moore , Derek Murray , Chris Olah , Mike Schuster , Jonathon Shlens , Benoit Steiner , Ilya Sutskever , Kunal Talwar , Paul Tucker , Vincent Vanhoucke , Vijay Vasudevan , Fernanda Viegas , Oriol Vinyals , Pete Warden , Martin Wattenberg , Martin Wicke , Yuan Yu , Xiaoqiang Zheng

Parallel Programming Models for Heterogeneous Many-Cores : A Survey

Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Jianbin Fang , Chun Huang , Tao Tang , Zheng Wang

Concurrent CPU-GPU Task Programming using Modern C++

In this paper, we introduce Heteroflow, a new C++ library to help developers quickly write parallel CPU-GPU programs using task dependency graphs. Heteroflow leverages the power of modern C++ and task-based approaches to enable efficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-17 Tsung-Wei Huang , Yibo Lin

vFlow: A GUI-Based Tool for Building Batch Applications for Cloud Computing

In this paper we introduce vFlow - A framework for rapid designing of batch processing applications for Cloud Computing environment. vFlow batch processing system extracts tasks from the vPlans diagrams, systematically captures the dynamics…

Software Engineering · Computer Science 2011-11-15 Hassan Gobjuka , Kamal Ahmat

Scikit-Multiflow: A Multi-output Streaming Framework

Scikit-multiflow is a multi-output/multi-label and stream data mining framework for the Python programming language. Conceived to serve as a platform to encourage democratization of stream learning research, it provides multiple state of…

Machine Learning · Computer Science 2020-05-18 Jacob Montiel , Jesse Read , Albert Bifet , Talel Abdessalem

Porting Decision Tree Algorithms to Multicore using FastFlow

The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-09-21 Marco aldinucci , Salvatore Ruggieri , Massimo Torquati

FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference

Flow-matching models deliver state-of-the-art fidelity in image and video generation, but the inherent sequential denoising process renders them slower. Existing acceleration methods like distillation, trajectory truncation, and consistency…

Computer Vision and Pattern Recognition · Computer Science 2026-02-12 Divya Jyoti Bajpai , Dhruv Bhardwaj , Soumya Roy , Tejas Duseja , Harsh Agarwal , Aashay Sandansing , Manjesh Kumar Hanawal

MatrixFlow: System-Accelerator co-design for high-performance transformer applications

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu , Marina Zapater , David Atienza