English
Related papers

Related papers: Cache Bypassing for Machine Learning Algorithms

200 papers

General Purpose Graphic Processing Unit(GPGPU) is used widely for achieving high performance or high throughput in parallel programming. This capability of GPGPUs is very famous in the new era and mostly used for scientific computing which…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-10 Vajira Thambawita , Roshan G. Ragel , Dhammike Elkaduwe

Graph neural networks (GNNs) have extended the success of deep neural networks (DNNs) to non-Euclidean graph data, achieving ground-breaking performance on various tasks such as node classification and graph property prediction.…

Machine Learning · Computer Science 2021-12-17 Tianfeng Liu , Yangrui Chen , Dan Li , Chuan Wu , Yibo Zhu , Jun He , Yanghua Peng , Hongzheng Chen , Hongzhi Chen , Chuanxiong Guo

Modeling data sharing in GPU programs is a challenging task because of the massive parallelism and complex data sharing patterns provided by GPU architectures. Better GPU caching efficiency can be achieved through careful task scheduling…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-04 Lingda Li , Ari B. Hayes , Stephen A. Hackler , Eddy Z. Zhang , Mario Szegedy , Shuaiwen Leon Song

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

Cache replacement algorithms are used to optimize the time taken by processor to process the information by storing the information needed by processor at that time and possibly in future so that if processor needs that information, it can…

Data Structures and Algorithms · Computer Science 2021-08-02 Sarwan Ali

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. Due to the substantial compute and memory operations, however, they require significant execution time. The massive…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-10-13 Chao Li , Yi Yang , Min Feng , Srimat Chakradhar , Huiyang Zhou

Graph processing is typically considered to be a memory-bound rather than compute-bound problem. One common line of thought is that more available memory bandwidth corresponds to better graph processing performance. However, in this work we…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-10-10 Oded Green , James Fox , Jeffrey Young , Jun Shirako , David Bader

There has been significant progress in developing neural network architectures that both achieve high predictive performance and that also achieve high application-level inference throughput (e.g., frames per second). Another metric of…

Machine Learning · Computer Science 2022-12-16 Jack Kosaian , Amar Phanishayee

Caching and prefetching techniques are fundamental to modern computing, serving to bridge the growing performance gap between processors and memory. Traditional prefetching strategies are often limited by their reliance on predefined…

Performance · Computer Science 2025-10-28 F. I. Qowy

Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models…

Machine Learning · Computer Science 2020-02-21 Valentin Radu , Kuba Kaszyk , Yuan Wen , Jack Turner , Jose Cano , Elliot J. Crowley , Bjorn Franke , Amos Storkey , Michael O'Boyle

In recent years, machine intelligence (MI) applications have emerged as a major driver for the computing industry. Optimizing these workloads is important but complicated. As memory demands grow and data movement overheads increasingly…

Many artificial intelligence (AI) devices have been developed to accelerate the training and inference of neural networks models. The most common ones are the Graphics Processing Unit (GPU) and Tensor Processing Unit (TPU). They are highly…

Machine Learning · Computer Science 2022-10-25 xiangyang Ju , Yunsong Wang , Daniel Murnane , Nicholas Choma , Steven Farrell , Paolo Calafiura

Video and image streaming on edge devices requires low latency. To address this, Neural Networks (NNs) are widely used, and prior work mainly focuses on accelerating them with single hardware units such as Graphics Processing Units (GPUs),…

Hardware Architecture · Computer Science 2026-05-04 Ali Emre Oztas , Mahir Demir , James Garside , Mikel Luj'an

Graph Neural Networks (GNNs) are a powerful tool for handling structured graph data and addressing tasks such as node classification, graph classification, and clustering. However, the sparse nature of GNN computation poses new challenges…

Machine Learning · Computer Science 2023-08-24 Julia Bazinska , Andrei Ivanov , Tal Ben-Nun , Nikoli Dryden , Maciej Besta , Siyuan Shen , Torsten Hoefler

Memory access efficiency is a key factor in fully utilizing the computational power of graphics processing units (GPUs). However, many details of the GPU memory hierarchy are not released by GPU vendors. In this paper, we propose a novel…

Hardware Architecture · Computer Science 2016-03-15 Xinxin Mei , Xiaowen Chu

Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-31 Jeongmin Brian Park , Kun Wu , Vikram Sharma Mailthody , Zaid Quresh , Scott Mahlke , Wen-mei Hwu

Recurrent Neural Networks (RNNs) are powerful tools for solving sequence-based problems, but their efficacy and execution time are dependent on the size of the network. Following recent work in simplifying these networks with model pruning…

Neural and Evolutionary Computing · Computer Science 2018-04-30 Feiwen Zhu , Jeff Pool , Michael Andersch , Jeremy Appleyard , Fung Xie

Learning an algorithm from examples is a fundamental problem that has been widely studied. Recently it has been addressed using neural networks, in particular by Neural Turing Machines (NTMs). These are fully differentiable computers that…

Machine Learning · Computer Science 2016-03-16 Łukasz Kaiser , Ilya Sutskever

Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in various applications such as social network recommendation, fraud detection, and graph search. The graphs in these applications are…

Machine Learning · Computer Science 2021-06-14 Jialin Dong , Da Zheng , Lin F. Yang , Geroge Karypis

Graph Neural Networks (GNNs) have shown success in learning from graph-structured data, with applications to fraud detection, recommendation, and knowledge graph reasoning. However, training GNN efficiently is challenging because: 1) GPU…

Machine Learning · Computer Science 2021-11-12 Seung Won Min , Kun Wu , Mert Hidayetoğlu , Jinjun Xiong , Xiang Song , Wen-mei Hwu
‹ Prev 1 2 3 10 Next ›