Related papers: DeepFlow: A Cross-Stack Pathfinding Framework for …

VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware

State-of-the-art deep learning systems such as TensorFlow and PyTorch tightly couple the model with the underlying hardware. This coupling requires the user to modify application logic in order to run the same job across a different set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-13 Andrew Or , Haoyu Zhang , Michael J. Freedman

MatrixFlow: System-Accelerator co-design for high-performance transformer applications

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu , Marina Zapater , David Atienza

TensorFlow: A system for large-scale machine learning

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-01 Martín Abadi , Paul Barham , Jianmin Chen , Zhifeng Chen , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Geoffrey Irving , Michael Isard , Manjunath Kudlur , Josh Levenberg , Rajat Monga , Sherry Moore , Derek G. Murray , Benoit Steiner , Paul Tucker , Vijay Vasudevan , Pete Warden , Martin Wicke , Yuan Yu , Xiaoqiang Zheng

Scalable, Distributed AI Frameworks: Leveraging Cloud Computing for Enhanced Deep Learning Performance and Efficiency

In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of…

Machine Learning · Computer Science 2023-04-28 Neelesh Mungoli

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-03-17 Martín Abadi , Ashish Agarwal , Paul Barham , Eugene Brevdo , Zhifeng Chen , Craig Citro , Greg S. Corrado , Andy Davis , Jeffrey Dean , Matthieu Devin , Sanjay Ghemawat , Ian Goodfellow , Andrew Harp , Geoffrey Irving , Michael Isard , Yangqing Jia , Rafal Jozefowicz , Lukasz Kaiser , Manjunath Kudlur , Josh Levenberg , Dan Mane , Rajat Monga , Sherry Moore , Derek Murray , Chris Olah , Mike Schuster , Jonathon Shlens , Benoit Steiner , Ilya Sutskever , Kunal Talwar , Paul Tucker , Vincent Vanhoucke , Vijay Vasudevan , Fernanda Viegas , Oriol Vinyals , Pete Warden , Martin Wattenberg , Martin Wicke , Yuan Yu , Xiaoqiang Zheng

DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training

Reinforcement learning (RL) has become the pivotal post-training technique for large language model (LLM). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning capabilities and ensuring safe,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-10 Zhixin Wang , Tianyi Zhou , Liming Liu , Ao Li , Jiarui Hu , Dian Yang , Yinhui Lu , Jinlong Hou , Siyuan Feng , Yuan Cheng , Yuan Qi

DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across…

Hardware Architecture · Computer Science 2026-04-10 Zhiwen Mo , Guoyu Li , Hao Mark Chen , Yu Cheng , Zhengju Tang , Qianzhou Wang , Lei Wang , Shuang Liang , Lingxiao Ma , Xianqi Zhou , Yuxiao Guo , Wayne Luk , Jilong Xue , Hongxiang Fan

SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation

SmartFlow is a multi-layered framework that integrates Reinforcement Learning and Agentic AI to address the dynamic rebalancing problem in urban bike-sharing services. Its architecture separates strategic, tactical, and communication…

Machine Learning · Computer Science 2026-01-06 Aditya Sreevatsa K , Arun Kumar Raveendran , Jesrael K Mani , Prakash G Shigli , Rajkumar Rangadore , Narayana Darapaneni , Anwesh Reddy Paduri

AAFLOW: Scalable Patterns for Agentic AI Workflows

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Arup Kumar Sarker , Mills Staylor , Aymen Alsaadi , Gregor von Laszewski , Shantenu Jha , Geoffrey Fox

MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era

The rapid development of interactive and autonomous AI systems signals our entry into the agentic era. Training and evaluating agents on complex agentic tasks such as software engineering and computer use requires not only efficient model…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-14 Lei Zhang , Mouxiang Chen , Ruisheng Cao , Jiawei Chen , Fan Zhou , Yiheng Xu , Jiaxi Yang , Zeyao Ma , Liang Chen , Changwei Luo , Kai Zhang , Fan Yan , KaShun Shum , Jiajun Zhang , Zeyu Cui , Feng Hu , Junyang Lin , Binyuan Hui , Min Yang

Flow State: Humans Enabling AI Systems to Program Themselves

Compound AI systems, orchestrating multiple AI components and external APIs, are increasingly vital but face challenges in managing complexity, handling ambiguity, and enabling effective development workflows. Existing frameworks often…

Artificial Intelligence · Computer Science 2025-04-08 Helena Zhang , Jakobi Haskell , Yosef Frost

AI Flow at the Network Edge

Recent advancements in large language models (LLMs) and their multimodal variants have led to remarkable progress across various domains, demonstrating impressive capabilities and unprecedented potential. In the era of ubiquitous…

Signal Processing · Electrical Eng. & Systems 2025-02-14 Jiawei Shao , Xuelong Li

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of…

Cosmology and Nongalactic Astrophysics · Physics 2018-11-13 Amrita Mathuriya , Deborah Bard , Peter Mendygral , Lawrence Meadows , James Arnemann , Lei Shao , Siyu He , Tuomas Karna , Daina Moise , Simon J. Pennycook , Kristyn Maschoff , Jason Sewall , Nalini Kumar , Shirley Ho , Mike Ringenburg , Prabhat , Victor Lee

Cross-Stack Workload Characterization of Deep Recommendation Systems

Deep learning based recommendation systems form the backbone of most personalized cloud services. Though the computer architecture community has recently started to take notice of deep recommendation inference, the resulting solutions have…

Hardware Architecture · Computer Science 2020-10-13 Samuel Hsia , Udit Gupta , Mark Wilkening , Carole-Jean Wu , Gu-Yeon Wei , David Brooks

Hyper: Distributed Cloud Processing for Large-Scale Deep Learning Tasks

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-17 Davit Buniatyan

MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que , Jose G. F. Coutinho , Ce Guo , Hongxiang Fan , Wayne Luk

XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification

In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for…

Machine Learning · Statistics 2023-09-06 Cătălina Cangea , Petar Veličković , Pietro Liò

Dynamic Control Flow in Large-Scale Machine Learning

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-09 Yuan Yu , Martín Abadi , Paul Barham , Eugene Brevdo , Mike Burrows , Andy Davis , Jeff Dean , Sanjay Ghemawat , Tim Harley , Peter Hawkins , Michael Isard , Manjunath Kudlur , Rajat Monga , Derek Murray , Xiaoqiang Zheng

A Comparison of Big Data Frameworks on a Layered Dataflow Model

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-17 Claudia Misale , Maurizio Drocco , Marco Aldinucci , Guy Tremblay

Continuous Deep Learning: A Workflow to Bring Models into Production

Researchers have been highly active to investigate the classical machine learning workflow and integrate best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…

Software Engineering · Computer Science 2022-08-30 Janosch Baltensperger , Pasquale Salza , Harald C. Gall