Related papers: tf.data service: A Case for Disaggregating ML Inpu…

200 papers

Training machine learning models requires feeding input data for models to ingest. Input pipelines for machine learning jobs are often challenging to implement efficiently as they require reading large volumes of data, applying complex…

Machine Learning · Computer Science 2021-02-25 Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors…

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity, Peyton Chandarana, Ramtin Zand

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

We describe TensorFlow-Serving, a system to serve machine learning models inside Google which is also available in the cloud and via open-source. It is extremely flexible in terms of the types of ML platforms it supports, and ways to…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-12-29 Christopher Olston, Noah Fiedel, Kiril Gorovoy, Jeremiah Harmsen, Li Lao, Fangwei Li, Vinu Rajashekhar, Sukriti Ramesh, Jordan Soyke

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable, e.g., for rapidly evaluating…

Performance · Computer Science 2019-05-07 Shijian Li, Robert J. Walls, Lijie Xu, Tian Guo

Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-23 Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-05 Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis

Large-scale deep learning benefits from an emerging class of AI accelerators. Some of these accelerators' designs are general enough for compute-intensive applications beyond AI and Cloud TPU is one such example. In this paper, we…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-19 Kun Yang, Yi-Fan Chen, Georgios Roumpos, Chris Colby, John Anderson

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye, Deming Chen

The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-10 Steven W. D. Chien, Stefano Markidis, Chaitanya Prasad Sishtla, Luis Santos, Pawel Herman, Sai Narasimhamurthy, Erwin Laure

Machine learning (ML) is successful in achieving human-level performance in various fields. However, it lacks the ability to explain an outcome due to its black-box nature. While existing explainable ML is promising, almost all of these…

Machine Learning · Computer Science 2021-03-23 Zhixin Pan, Prabhat Mishra

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal…

Machine Learning · Computer Science 2026-04-15 Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

While accelerators such as GPUs have limited memory, deep neural networks are becoming larger and will not fit within the memory limits of accelerators for training. We propose an approach to tackle this problem by rewriting the…

Machine Learning · Computer Science 2019-10-03 Tung D. Le, Haruki Imai, Yasushi Negishi, Kiyokuni Kawachiya

Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but also introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators.…

Machine Learning · Computer Science 2025-04-11 Shaoyuan Chen, Wencong Xiao, Yutong Lin, Mingxing Zhang, Yingdi Shan, Jinlei Jiang, Kang Chen, Yongwei Wu

Data preprocessing is a fundamental part of any machine learning application and frequently the most time-consuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that…

Machine Learning · Computer Science 2018-01-11 S. Maetschke, R. Tennakoon, C. Vecchiola, R. Garnavi

DistServe improves the performance of large language model (LLM) serving by disaggregating the prefill and decoding computation. Existing LLM serving systems colocate the two phases and batch the computation of prefill and decoding across…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-07 Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang

This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the…

Databases · Computer Science 2019-06-21 Maximilian Schleich, Dan Olteanu, Mahmoud Abo Khamis, Hung Q. Ngo, XuanLong Nguyen

TensorFlow is a popular, emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow was initially designed for developing Machine Learning (ML)…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-03 Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter

Training deep learning models is a repetitive and resource-intensive process. Data scientists often train several models before landing on a set of parameters (e.g., hyper-parameter tuning) and model architecture (e.g., neural architecture…

Machine Learning · Computer Science 2025-08-04 Ties Robroek, Neil Kim Nielsen, Pınar Tözün