English
Related papers

Related papers: A Learned Performance Model for Tensor Processing …

200 papers

Precise hardware performance models play a crucial role in code optimizations. They can assist compilers in making heuristic decisions or aid autotuners in identifying the optimal configuration for a given program. For example, the…

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen , Lianmin Zheng , Eddie Yan , Ziheng Jiang , Thierry Moreau , Luis Ceze , Carlos Guestrin , Arvind Krishnamurthy

GPUs have been favored for training deep learning models due to their highly parallelized architecture. As a result, most studies on training optimization focus on GPUs. There is often a trade-off, however, between cost and efficiency when…

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science 2022-10-25 xiangyang Ju , Yunsong Wang , Daniel Murnane , Nicholas Choma , Steven Farrell , Paolo Calafiura

We propose a generic algorithmic building block to accelerate training of machine learning models on heterogeneous compute systems. Our scheme allows to efficiently employ compute accelerators such as GPUs and FPGAs for the training of…

Machine Learning · Computer Science 2017-11-08 Celestine Dünner , Thomas Parnell , Martin Jaggi

Tensor processing units (TPUs) are one of the most well-known machine learning (ML) accelerators utilized at large scale in data centers as well as in tiny ML applications. TPUs offer several improvements and advantages over conventional ML…

Hardware Architecture · Computer Science 2024-07-12 Mohammed Elbtity , Peyton Chandarana , Ramtin Zand

Tensor Processing Units (TPUs) are specialized hardware accelerators for deep learning developed by Google. This paper aims to explore TPUs in cloud and edge computing focusing on its applications in AI. We provide an overview of TPUs,…

Hardware Architecture · Computer Science 2023-11-15 Diego Sanmartín Carrión , Vera Prohaska

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

The use of GPUs has proliferated for machine learning workflows and is now considered mainstream for many deep learning models. Meanwhile, when training state-of-the-art personal recommendation models, which consume the highest number of…

Hardware Architecture · Computer Science 2020-11-12 Bilge Acun , Matthew Murphy , Xiaodong Wang , Jade Nie , Carole-Jean Wu , Kim Hazelwood

Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions. Traditionally, these models assume that data distribution does not change as more samples are…

Machine Learning · Computer Science 2023-10-27 Ray A. O. Sinurat , Anurag Daram , Haryadi S. Gunawi , Robert B. Ross , Sandeep Madireddy

To respond to the need of efficient training and inference of deep neural networks, a plethora of domain-specific hardware architectures have been introduced, such as Google Tensor Processing Units and NVIDIA Tensor Cores. A common feature…

Data Structures and Algorithms · Computer Science 2020-07-10 Rezaul Chowdhury , Francesco Silvestri , Flavio Vella

The deployment of convolutional neural networks is often hindered by high computational and storage requirements. Structured model pruning is a promising approach to alleviate these requirements. Using the VGG-16 model as an example, we…

Machine Learning · Computer Science 2021-07-22 Kongtao Chen , Ken Franko , Ruoxin Sang

While GPUs are responsible for training the vast majority of state-of-the-art deep learning models, the implications of their architecture are often overlooked when designing new deep learning (DL) models. As a consequence, modifying a DL…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-02-01 Quentin Anthony , Jacob Hatef , Deepak Narayanan , Stella Biderman , Stas Bekman , Junqi Yin , Aamir Shafi , Hari Subramoni , Dhabaleswar Panda

Optimizing the performance of computational fluid dynamics (CFD) applications accelerated by graphics processing units (GPUs) is crucial for efficient simulations. In this study, we employed a machine learning-based autotuning technique to…

Performance · Computer Science 2024-02-21 Weicheng Xue , Christohper John Roy

Implementing Deep Neural Networks (DNNs) on resource-constrained edge devices is a challenging task that requires tailored hardware accelerator architectures and a clear understanding of their performance characteristics when executing the…

The increasing complexity and scale of Deep Neural Networks (DNNs) necessitate specialized tensor accelerators, such as Tensor Processing Units (TPUs), to meet various computational and energy efficiency requirements. Nevertheless,…

Hardware Architecture · Computer Science 2025-03-11 Deepak Vungarala , Mohammed E. Elbtity , Sumiya Syed , Sakila Alam , Kartik Pandit , Arnob Ghosh , Ramtin Zand , Shaahin Angizi

Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-27 Zhongyi Lin , Ning Sun , Pallab Bhattacharya , Xizhou Feng , Louis Feng , John D. Owens

Deep learning models are yielding increasingly better performances thanks to multiple factors. To be successful, model may have large number of parameters or complex architectures and be trained on large dataset. This leads to large…

Machine Learning · Computer Science 2022-12-20 Jean-Roch Vlimant , Junqi Yin

Performance models can be very useful for understanding the behavior of applications and hence can help guide design and optimization decisions. Unfortunately, performance modeling of nontrivial computations typically requires significant…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-18 Kewen Meng , Boyana Norris

Language models are now prevalent in software engineering with many developers using them to automate tasks and accelerate their development. While language models have been tremendous at accomplishing complex software engineering tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Daniel Nichols , Konstantinos Parasyris , Charles Jekel , Abhinav Bhatele , Harshitha Menon
‹ Prev 1 2 3 10 Next ›