English
Related papers

Related papers: DeepFlow: A Cross-Stack Pathfinding Framework for …

200 papers

State-of-the-art deep learning systems such as TensorFlow and PyTorch tightly couple the model with the underlying hardware. This coupling requires the user to modify application logic in order to run the same job across a different set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-13 Andrew Or , Haoyu Zhang , Michael J. Freedman

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu , Marina Zapater , David Atienza

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

In recent years, the integration of artificial intelligence (AI) and cloud computing has emerged as a promising avenue for addressing the growing computational demands of AI applications. This paper presents a comprehensive study of…

Machine Learning · Computer Science 2023-04-28 Neelesh Mungoli

TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous…

Reinforcement learning (RL) has become the pivotal post-training technique for large language model (LLM). Effectively scaling reinforcement learning is now the key to unlocking advanced reasoning capabilities and ensuring safe,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-10 Zhixin Wang , Tianyi Zhou , Liming Liu , Ao Li , Jiarui Hu , Dian Yang , Yinhui Lu , Jinlong Hou , Siyuan Feng , Yuan Cheng , Yuan Qi

Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across…

SmartFlow is a multi-layered framework that integrates Reinforcement Learning and Agentic AI to address the dynamic rebalancing problem in urban bike-sharing services. Its architecture separates strategic, tactical, and communication…

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-05 Arup Kumar Sarker , Mills Staylor , Aymen Alsaadi , Gregor von Laszewski , Shantenu Jha , Geoffrey Fox

The rapid development of interactive and autonomous AI systems signals our entry into the agentic era. Training and evaluating agents on complex agentic tasks such as software engineering and computer use requires not only efficient model…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-14 Lei Zhang , Mouxiang Chen , Ruisheng Cao , Jiawei Chen , Fan Zhou , Yiheng Xu , Jiaxi Yang , Zeyao Ma , Liang Chen , Changwei Luo , Kai Zhang , Fan Yan , KaShun Shum , Jiajun Zhang , Zeyu Cui , Feng Hu , Junyang Lin , Binyuan Hui , Min Yang

Compound AI systems, orchestrating multiple AI components and external APIs, are increasingly vital but face challenges in managing complexity, handling ambiguity, and enabling effective development workflows. Existing frameworks often…

Artificial Intelligence · Computer Science 2025-04-08 Helena Zhang , Jakobi Haskell , Yosef Frost

Recent advancements in large language models (LLMs) and their multimodal variants have led to remarkable progress across various domains, demonstrating impressive capabilities and unprecedented potential. In the era of ubiquitous…

Signal Processing · Electrical Eng. & Systems 2025-02-14 Jiawei Shao , Xuelong Li

Deep learning is a promising tool to determine the physical model that describes our universe. To handle the considerable computational cost of this problem, we present CosmoFlow: a highly scalable deep learning application built on top of…

Deep learning based recommendation systems form the backbone of most personalized cloud services. Though the computer architecture community has recently started to take notice of deep recommendation inference, the resulting solutions have…

Hardware Architecture · Computer Science 2020-10-13 Samuel Hsia , Udit Gupta , Mark Wilkening , Carole-Jean Wu , Gu-Yeon Wei , David Brooks

Training and deploying deep learning models in real-world applications require processing large amounts of data. This is a challenging task when the amount of data grows to a hundred terabytes, or even, petabyte-scale. We introduce a hybrid…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-17 Davit Buniatyan

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que , Jose G. F. Coutinho , Ce Guo , Hongxiang Fan , Wayne Luk

In recent years, there have been numerous developments towards solving multimodal tasks, aiming to learn a stronger representation than through a single modality. Certain aspects of the data can be particularly useful in this case - for…

Machine Learning · Statistics 2023-09-06 Cătălina Cangea , Petar Veličković , Pietro Liò

Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-09 Yuan Yu , Martín Abadi , Paul Barham , Eugene Brevdo , Mike Burrows , Andy Davis , Jeff Dean , Sanjay Ghemawat , Tim Harley , Peter Hawkins , Michael Isard , Manjunath Kudlur , Rajat Monga , Derek Murray , Xiaoqiang Zheng

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-17 Claudia Misale , Maurizio Drocco , Marco Aldinucci , Guy Tremblay

Researchers have been highly active to investigate the classical machine learning workflow and integrate best practices from the software engineering lifecycle. However, deep learning exhibits deviations that are not yet covered in this…

Software Engineering · Computer Science 2022-08-30 Janosch Baltensperger , Pasquale Salza , Harald C. Gall
‹ Prev 1 2 3 10 Next ›