Related papers for "Pathways: Asynchronous Distributed Dataflow for ML"

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and…

Machine Learning · Computer Science · 2023-07-26 · Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski

Automap: Towards Ergonomic Automated Parallelism for ML Models

The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly…

Machine Learning · Computer Science · 2021-12-07 · Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, Georg Stefan Schmid, Tamara Norman, James Molloy, Jonathan Godwin, Norman Alexander Rink, Vinod Nair, Dan Belov

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

Machine Learning · Statistics · 2015-05-18 · Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization

Path planning is critical for autonomous driving, generating smooth, collision-free, feasible paths based on perception and localization inputs. However, its computationally intensive nature poses significant challenges for…

Hardware Architecture · Computer Science · 2025-07-23 · Yifan Zhang, Xiaoyu Niu, Hongzheng Tian, Yanjun Zhang, Bo Yu, Shaoshan Liu, Sitao Huang

Stream: Design Space Exploration of Layer-Fused DNNs on Heterogeneous Dataflow Accelerators

As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far,…

Hardware Architecture · Computer Science · 2025-10-08 · Arne Symons, Linyan Mei, Steven Colleman, Pouya Houshmand, Sebastian Karl, Marian Verhelst

MatrixFlow: System-Accelerator co-design for high-performance transformer applications

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science · 2025-03-10 · Qunyou Liu, Marina Zapater, David Atienza

Exploiting Path Diversity in Datacenters using MPTCP-aware SDN

Recently, Multipath TCP (MPTCP) has been proposed as an alternative transport approach for datacenter networks. MPTCP provides the ability to split a flow into multiple paths thus providing better performance and resilience to failures.…

Networking and Internet Architecture · Computer Science · 2016-08-31 · Savvas Zannettou, Michael Sirivianos, Fragkiskos Papadopoulos

Efficient data streaming multiway aggregation through concurrent algorithmic designs and new abstract data types

Data streaming relies on continuous queries to process unbounded streams of data in a real-time fashion. It is commonly demanding in computation capacity, given that the relevant applications involve very large volumes of data. Data…

Data Structures and Algorithms · Computer Science · 2016-06-16 · Vincenzo Gulisano, Yiannis Nikolakopoulos, Daniel Cederman, Marina Papatriantafilou, Philippas Tsigas

Memory-constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms

The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…

Signal Processing · Electrical Eng. & Systems · 2017-12-01 · Shuoxin Lin, Jiahao Wu, Shuvra S. Bhattacharyya

Multiprocessor Scheduling of a Multi-mode Dataflow Graph Considering Mode Transition Delay

The Synchronous Data Flow (SDF) model is widely used for specifying signal processing or streaming applications. Since modern embedded applications become more complex with dynamic behavior changes at run-time, several extensions of the SDF…

Other Computer Science · Computer Science · 2017-10-20 · Hanwoong Jung, Hyunok Oh, Soonhoi Ha

Spinning Fast Iterative Data Flows

Parallel dataflow systems are a central part of most analytic pipelines for big data. The iterative nature of many analysis and machine learning algorithms, however, is still a challenge for current systems. While certain types of bulk…

Databases · Computer Science · 2012-08-02 · Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, Volker Markl

Accelerated DC loadflow solver for topology optimization

We present a massively parallel solver that accelerates DC loadflow computations for power grid topology optimization tasks. Our approach leverages low-rank updates of the Power Transfer Distribution Factors (PTDFs) to represent substation…

Systems and Control · Electrical Eng. & Systems · 2025-01-30 · Nico Westerbeck, Joost van Dijk, Jan Viebahn, Christian Merz, Dirk Witthaut

Path-accelerated molecular dynamics: Parallel-in-time integration using path integrals

Massively parallel computer architectures create new opportunities for the performance of long-timescale molecular dynamics (MD) simulations. Here, we introduce the path-accelerated molecular dynamics (PAMD) method that takes advantage of…

Computational Physics · Physics · 2021-01-11 · Jorge L. Rosa-Raíces, Bin Zhang, Thomas F. Miller

DiFS: Distributed Flow Scheduling for Data Center Networks

Data center networks leverage multiple parallel paths connecting end host pairs to offer high bisection bandwidth for cluster computing applications. However, state of the art distributed multi-pathing protocols such as Equal Cost Multipath…

Networking and Internet Architecture · Computer Science · 2013-07-30 · Wenzhi Cui, Chen Qian

High-Quality Hierarchical Process Mapping

Partitioning graphs into blocks of roughly equal size such that few edges run between blocks is a frequently needed operation when processing graphs on a parallel computer. When a topology of a distributed system is known an important task…

Data Structures and Algorithms · Computer Science · 2020-01-23 · Marcelo Fonseca Faraj, Alexander van der Grinten, Henning Meyerhenke, Jesper Larsson Träff, Christian Schulz

Streaming Task Graph Scheduling for Dataflow Architectures

Dataflow devices represent an avenue towards saving the control and data movement overhead of Load-Store Architectures. Various dataflow accelerators have been proposed, but how to efficiently schedule applications on such devices remains…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-06 · Tiziano De Matteis, Lukas Gianinazzi, Johannes de Fine Licht, Torsten Hoefler

Opportunities to Parallelize Path Planning Algorithms for Autonomous Underwater Vehicles

This paper discusses opportunities to parallelize graph-based path planning algorithms in a time-varying environment. Parallel architectures have become commonplace, requiring algorithms to be parallelized for efficient execution. An…

Robotics · Computer Science · 2020-08-07 · Mike Eichhorn, Ulrich Kremer

StreamBlocks: A compiler for heterogeneous dataflow computing (technical report)

To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…

Hardware Architecture · Computer Science · 2021-07-21 · Endri Bezati, Mahyar Emami, Jörn Janneck, James Larus

An Alternating Direction Method Approach to Cloud Traffic Management

In this paper, we introduce a unified framework for studying various cloud traffic management problems, ranging from geographical load balancing to backbone traffic engineering. We first abstract these real-world problems as a…

Networking and Internet Architecture · Computer Science · 2016-02-04 · Chen Feng, Hong Xu, Baochun Li

TAPA-CS: Enabling Scalable Accelerator Design on Distributed HBM-FPGAs

Despite the increasing adoption of Field-Programmable Gate Arrays (FPGAs) in compute clouds, there remains a significant gap in programming tools and abstractions which can leverage network-connected, cloud-scale, multi-die FPGAs to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-02-05 · Neha Prakriya, Yuze Chi, Suhail Basalama, Linghao Song, Jason Cong