English
Related papers

Related papers: A Tool for Automatically Suggesting Source-Code Op…

200 papers

We describe GPU implementations of the matrix recommender algorithms CCD++ and ALS. We compare the processing time and predictive ability of the GPU implementations with existing multi-core versions of the same algorithms. Results on the…

Information Retrieval · Computer Science 2015-11-10 André Valente Rodrigues , Alípio Jorge , Inês Dutra

This work deals with the optimization of computer programs targeting Graphics Processing Units (GPUs). The goal is to lift, from programmers to optimizing compilers, the heavy burden of determining program details that are dependent on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-01-16 Xiaohui Chen , Marc Moreno-Maza , Jeeva Paudel , Ning Xie

Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we…

Performance · Computer Science 2020-11-25 Keren Zhou , Xiaozhu Meng , Ryuichi Sai , John Mellor-Crummey

Graphics processing units (GPU) had evolved from a specialized hardware capable to render high quality graphics in games to a commodity hardware for effective processing blocks of data in a parallel schema. This evolution is particularly…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-03-26 Luis Cabellos

Hardware accelerators, such as those based on GPUs and FPGAs, offer an excellent opportunity to efficiently parallelize functionalities. Recently, modern embedded platforms started being equipped with such accelerators, resulting in a…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-16 Daniel Casini , Paolo Pazzaglia , Alessandro Biondi , Marco Di Natale

In this survey paper, we review recent work on frameworks for the high-level, portable programming of heterogeneous multi-/manycore systems (especially, GPU-based systems) using high-level constructs such as annotated user-level software…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-05-14 Christoph Kessler , Usman Dastgeer , Lu Li

The use of GPUs has proliferated for machine learning workflows and is now considered mainstream for many deep learning models. Meanwhile, when training state-of-the-art personal recommendation models, which consume the highest number of…

Hardware Architecture · Computer Science 2020-11-12 Bilge Acun , Matthew Murphy , Xiaodong Wang , Jade Nie , Carole-Jean Wu , Kim Hazelwood

Heterogeneous computing systems, which combine general-purpose processors with specialized accelerators, are increasingly important for optimizing the performance of modern applications. A central challenge is to decide which parts of an…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-15 Martin Wilhelm , Franz Freitag , Max Tzschoppe , Thilo Pionteck

Language models are now prevalent in software engineering with many developers using them to automate tasks and accelerate their development. While language models have been tremendous at accomplishing complex software engineering tasks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-21 Daniel Nichols , Konstantinos Parasyris , Charles Jekel , Abhinav Bhatele , Harshitha Menon

The growing complexity of computational workloads has amplified the need for efficient and specialized hardware accelerators. Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have emerged as prominent solutions,…

Hardware Architecture · Computer Science 2025-11-11 Arnab A Purkayastha , Jay Tharwani , Shobhit Aggarwal

In this work, we survey the role of GPUs in real-time systems. Originally designed for parallel graphics workloads, GPUs are now widely used in time-critical applications such as machine learning, autonomous vehicles, and robotics due to…

Due to their highly parallel multi-cores architecture, GPUs are being increasingly used in a wide range of computationally intensive applications. Compared to CPUs, GPUs can achieve higher performances at accelerating the programs'…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-05 Frédéric Magoulès , Abal-Kassim Cheik Ahamed , Alban Desmaison , Jean-Christophe Léchenet , François Mayer , Haifa Ben Salem , Thomas Zhu

Program synthesis is an umbrella term for generating programs and logical formulae from specifications. With the remarkable performance improvements that GPUs enable for deep learning, a natural question arose: can we also implement a…

Programming Languages · Computer Science 2025-04-29 Martin Berger , Nathanaël Fijalkow , Mojtaba Valizadeh

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-07-20 Chao Chen , Chris Porter , Santosh Pande

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…

Computation · Statistics 2015-03-13 Hua Zhou , Kenneth Lange , Marc A. Suchard

The most important way to achieve higher performance in computer systems is through heterogeneous computing, i.e., by adopting hardware platforms containing more than one type of processor, such as CPUs, GPUs, and FPGAs. Several types of…

Software Engineering · Computer Science 2020-05-19 Hugo Andrade , Ivica Crnkovic , Jan Bosch

We provide a preliminary study on utilizing GPU (Graphics Processing Unit) to accelerate computation for three simulation optimization tasks with either first-order or second-order algorithms. Compared to the implementation using only CPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-19 Jinghai He , Haoyu Liu , Yuhang Wu , Zeyu Zheng , Tingyu Zhu

Many emerging cyber-physical systems, such as autonomous vehicles and robots, rely heavily on artificial intelligence and machine learning algorithms to perform important system operations. Since these highly parallel applications are…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-07 An Zou , Jing Li , Christopher D. Gill , Xuan Zhang

Efficient parallelization of algorithms on general-purpose GPUs is essential in many areas today. However, it is a non-trivial task for software engineers to utilize GPUs to improve the performance of high-level programs in general.…

Programming Languages · Computer Science 2024-07-09 Lars Hummelgren , John Wikman , Oscar Eriksson , Philipp Haller , David Broman

Graphics Processing Units (GPUs) have revolutionized the computing landscape over the past decade. However, the growing energy demands of data centres and computing facilities equipped with GPUs come with significant capital and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-11-15 Richard Schoonhoven , Bram Veenboer , Ben van Werkhoven , Kees Joost Batenburg
‹ Prev 1 2 3 10 Next ›