Related papers: Spinning Fast Iterative Data Flows

Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows

Parallel dataflow systems have become a standard technology for large-scale data analytics. Complex data analysis programs in areas such as machine learning and graph analytics often involve control flow, i.e., iterations and branching.…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-16 Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Loránd Madai-Tahy, Volker Markl

Scaling Inter-procedural Dataflow Analysis on the Cloud

Apart from forming the backbone of compiler optimization, static dataflow analysis has been widely applied in a vast variety of applications, such as bug detection, privacy analysis, program comprehension, etc. Despite its importance,…

Programming Languages · Computer Science 2024-12-18 Zewen Sun, Yujin Zhang, Duanchen Xu, Yiyu Zhang, Yun Qi, Yueyang Wang, Yi Li, Zhaokang Wang, Yue Li, Xuandong Li, Zhiqiang Zuo, Qingda Lu, Wenwen Peng, Shengjian Guo

An Abstract View of Big Data Processing Programs

This paper proposes a model for specifying data flow based parallel data processing programs agnostic of target Big Data processing frameworks. The paper focuses on the formal abstract specification of non-iterative and iterative programs,…

Software Engineering · Computer Science 2021-08-06 Joao Batista de Souza Neto, Anamaria Martins Moreira, Genoveva Vargas-Solar, Martin A. Musicante

Cost optimization of data flows based on task re-ordering

Analyzing big data in a highly dynamic environment becomes more and more critical because of the increasing need for end-to-end processing of this data. Modern data flows are quite complex and there are not efficient, cost-based,…

Databases · Computer Science 2015-07-31 Georgia Kougka, Anastasios Gounaris

Iterative MapReduce for Large Scale Machine Learning

Large datasets ("Big Data") are becoming ubiquitous because the potential value in deriving insights from data, across a wide range of business and scientific applications, is increasingly recognized. In particular, machine learning - one…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-03-15 Joshua Rosen, Neoklis Polyzotis, Vinayak Borkar, Yingyi Bu, Michael J. Carey, Markus Weimer, Tyson Condie, Raghu Ramakrishnan

FlowLog: Efficient and Extensible Datalog via Incrementality

Datalog-based languages are regaining popularity as a powerful abstraction for expressing recursive computations in domains such as program analysis and graph processing. However, existing systems often face a trade-off between efficiency…

Databases · Computer Science 2025-11-18 Hangdong Zhao, Zhenghong Yu, Srinag Rao, Simon Frisk, Zhiwei Fan, Paraschos Koutris

A Comparison of Big Data Frameworks on a Layered Dataflow Model

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-17 Claudia Misale, Maurizio Drocco, Marco Aldinucci, Guy Tremblay

Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing

We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflows assume finite datasets and require substantial…

Machine Learning · Computer Science 2026-01-01 Giacinto Paolo Saggese, Paul Smith

Pipeflow: An Efficient Task-Parallel Pipeline Programming Framework using Modern C++

Pipeline is a fundamental parallel programming pattern. Mainstream pipeline programming frameworks count on data abstractions to perform pipeline scheduling. This design is convenient for data-centric pipeline applications but inefficient…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-03 Cheng-Hsiang Chiu, Tsung-Wei Huang, Zizheng Guo, Yibo Lin

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-03 Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin

Petuum: A New Platform for Distributed Machine Learning on Big Data

What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization…

Machine Learning · Statistics 2015-05-18 Eric P. Xing, Qirong Ho, Wei Dai, Jin Kyu Kim, Jinliang Wei, Seunghak Lee, Xun Zheng, Pengtao Xie, Abhimanu Kumar, Yaoliang Yu

Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

With the explosive growth of big data, workloads tend to get more complex and computationally demanding. Such applications are processed on distributed interconnected resources that are becoming larger in scale and computational capacity.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-30 Georgios L. Stavrinides, Helen D. Karatza

Improving Performance of Iterative Methods by Lossy Checkponting

Iterative methods are commonly used approaches to solve large, sparse linear systems, which are fundamental operations for many modern scientific simulations. When the large-scale iterative methods are running with a large number of ranks…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-30 Dingwen Tao, Sheng Di, Xin Liang, Zizhong Chen, Franck Cappello

Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System

Taskflow aims to streamline the building of parallel and heterogeneous applications using a lightweight task graph-based approach. Taskflow introduces an expressive task graph programming model to assist developers in the implementation of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-08 Tsung-Wei Huang, Dian-Lun Lin, Chun-Xun Lin, Yibo Lin

Iterative Alignment Flows

The unsupervised task of aligning two or more distributions in a shared latent space has many applications including fair representations, batch effect mitigation, and unsupervised domain adaptation. Existing flow-based approaches estimate…

Machine Learning · Computer Science 2022-03-17 Zeyu Zhou, Ziyu Gong, Pradeep Ravikumar, David I. Inouye

Transparent Synchronous Dataflow

Dataflow programming is a popular and convenient programming paradigm in systems modelling, optimisation, and machine learning. It has a number of advantages, for instance, the lack of control flow allows computation to be carried out in…

Programming Languages · Computer Science 2021-03-03 Steven W. T. Cheung, Dan R. Ghica, Koko Muroya

An iterative method for classification of binary data

In today's data driven world, storing, processing, and gleaning insights from large-scale data are major challenges. Data compression is often required in order to store large amounts of high-dimensional data, and thus, efficient inference…

Machine Learning · Statistics 2018-09-11 Denali Molitor, Deanna Needell

Optimizing ETL Dataflow Using Shared Caching and Parallelization Methods

Extract-Transform-Load (ETL) handles large amounts of data and manages workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. In order to minimize the time and…

Databases · Computer Science 2014-09-08 Xiufeng Liu

An Approach for Accelerating Incompressible Turbulent Flow Simulations Based on Simultaneous Modelling of Multiple Ensembles

The present paper deals with the problem of improving the efficiency of large scale turbulent flow simulations. The high-fidelity methods for modelling turbulent flows become available for a wider range of applications thanks to the…

Computational Physics · Physics 2018-04-10 Boris Krasnopolsky

Uniform-in-Phase-Space Data Selection with Iterative Normalizing Flows

Improvements in computational and experimental capabilities are rapidly increasing the amount of scientific data that is routinely generated. In applications that are constrained by memory and computational intensity, excessively large…

Machine Learning · Computer Science 2023-02-28 Malik Hassanaly, Bruce A. Perry, Michael E. Mueller, Shashank Yellapantula