English
Related papers

Related papers: GPU Activity Prediction using Representation Learn…

200 papers

Graphics processing units (GPUs) are the de facto standard for processing deep learning (DL) tasks. Meanwhile, GPU failures, which are inevitable, cause severe consequences in DL tasks: they disrupt distributed trainings, crash inference…

Machine Learning · Computer Science 2022-01-31 Heting Liu , Zhichao Li , Cheng Tan , Rongqiu Yang , Guohong Cao , Zherui Liu , Chuanxiong Guo

Deep learning researchers and practitioners usually leverage GPUs to help train their deep neural networks (DNNs) faster. However, choosing which GPU to use is challenging both because (i) there are many options, and (ii) users grapple with…

Machine Learning · Computer Science 2021-06-09 Geoffrey X. Yu , Yubo Gao , Pavel Golikov , Gennady Pekhimenko

A key property of neural networks (both biological and artificial) is how they learn to represent and manipulate input information in order to solve a task. Different types of representations may be suited to different types of tasks,…

Machine Learning · Computer Science 2023-07-18 Ryan Pyle , Sebastian Musslick , Jonathan D. Cohen , Ankit B. Patel

Collocating deep learning training tasks improves GPU utilization but risks resource contention, severe slowdowns, and out-of-memory (OOM) failures. Accurate memory estimation is essential for robust collocation, and GPU utilization…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-29 Ehsan Yousefzadeh-Asl-Miandoab , Reza Karimzadeh , Danyal Yorulmaz , Bulat Ibragimov , Pınar Tözün

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

In this work, we survey the role of GPUs in real-time systems. Originally designed for parallel graphics workloads, GPUs are now widely used in time-critical applications such as machine learning, autonomous vehicles, and robotics due to…

GPUs are vastly underutilized, even when running resource-intensive AI applications, as GPU kernels within each job have diverse resource profiles that may saturate some parts of a device while often leaving other parts idle. Colocating…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-17 Paul Elvinger , Foteini Strati , Natalie Enright Jerger , Ana Klimovic

The use of GPUs has proliferated for machine learning workflows and is now considered mainstream for many deep learning models. Meanwhile, when training state-of-the-art personal recommendation models, which consume the highest number of…

Hardware Architecture · Computer Science 2020-11-12 Bilge Acun , Matthew Murphy , Xiaodong Wang , Jade Nie , Carole-Jean Wu , Kim Hazelwood

Heterogeneous graph neural networks (HGNNs) deliver powerful capacity in heterogeneous graph representation learning. The execution of HGNNs is usually accelerated by GPUs. Therefore, characterizing and understanding the execution pattern…

Hardware Architecture · Computer Science 2022-08-10 Mingyu Yan , Mo Zou , Xiaocheng Yang , Wenming Li , Xiaochun Ye , Dongrui Fan , Yuan Xie

We devise a performance model for GPU training of Deep Learning Recommendation Models (DLRM), whose GPU utilization is low compared to other well-optimized CV and NLP models. We show that both the device active time (the sum of kernel…

Machine Learning · Computer Science 2022-11-18 Zhongyi Lin , Louis Feng , Ehsan K. Ardestani , Jaewon Lee , John Lundell , Changkyu Kim , Arun Kejariwal , John D. Owens

For efficient and safe autonomous driving, it is essential that autonomous vehicles can predict the motion of other traffic agents. While highly accurate, current motion prediction models often impose significant challenges in terms of…

Robotics · Computer Science 2024-09-26 Alexander Prutsch , Horst Bischof , Horst Possegger

Real time processing for teamwork action recognition is a challenge, due to complex computational models to achieve high system performance. Hence, this paper proposes a framework based on Graphical Processing Units (GPUs) to achieve a…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-10-15 Mohamed Elhoseiny , Hossam Faheem , Taymour Nazmy , Eman Shaaban

In this paper, we propose a method for activity recognition from videos based on sparse local features and hypergraph matching. We benefit from special properties of the temporal domain in the data to derive a sequential and fast graph…

Computer Vision and Pattern Recognition · Computer Science 2015-05-05 Eric Lombardi , Christian Wolf , Oya Celiktutan , Bülent Sankur

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we…

Performance · Computer Science 2020-11-25 Keren Zhou , Xiaozhu Meng , Ryuichi Sai , John Mellor-Crummey

Graphics Processing Units (GPUs) have become an integral part of High-Performance Computing to achieve an Exascale performance. The main goal of application developers of GPU is to tune their code extensively to obtain optimal performance,…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-04 Gargi Alavani , Santonu Sarkar

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science 2022-10-25 xiangyang Ju , Yunsong Wang , Daniel Murnane , Nicholas Choma , Steven Farrell , Paolo Calafiura

We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the…

Machine Learning · Computer Science 2023-08-22 Nechba Mohammed , Mouhajir Mohamed , Sedjari Yassine

To leverage the power of big data from source tasks and overcome the scarcity of the target task samples, representation learning based on multi-task pretraining has become a standard approach in many applications. However, up until now,…

Machine Learning · Computer Science 2022-02-03 Yifang Chen , Simon S. Du , Kevin Jamieson

Graphics Processing Unit, or GPUs, have been successfully adopted both for graphic computation in 3D applications, and for general purpose application (GP-GPUs), thank to their tremendous performance-per-watt. Recently, there is a big…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-03 Paolo Burgio

GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet…

Operating Systems · Computer Science 2025-08-13 Jiarong Xing , Yifan Qiao , Simon Mo , Xingqi Cui , Gur-Eyal Sela , Yang Zhou , Joseph Gonzalez , Ion Stoica
‹ Prev 1 2 3 10 Next ›