
Related papers: Characterizing Optimizations to Memory Access Patt…

200 papers

Measuring performance-critical characteristics of application workloads is important both for developers, who must understand and optimize the performance of codes, as well as designers and integrators of HPC systems, who must ensure that…

Software Engineering · Computer Science 2018-11-01 Beau Johnston, Josh Milthorpe

OpenCL is an attractive model for heterogeneous high-performance computing systems, with wide support from hardware vendors and significant performance portability. To support efficient scheduling on HPC systems it is necessary to perform…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-04 Beau Johnston, Greg Falzon, Josh Milthorpe

Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-19 Stefano Corda, Gagandeep Singh, Ahsan Javed Awan, Roel Jordans, Henk Corporaal

An increasingly large number of HPC systems rely on heterogeneous architectures combining traditional multi-core CPUs with power efficient accelerators. Designing efficient applications for these systems has been troublesome in the past as…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-03-02 E. Calore, A. Gabbana, J. Kraus, S. F. Schifano, R. Tripiccione

For reasons of both performance and energy efficiency, high-performance computing (HPC) hardware is becoming increasingly heterogeneous. The OpenCL framework supports portable programming across a wide range of computing devices and is…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-11-01 Beau Johnston, Josh Milthorpe

This paper presents the design and analysis of parallel approximation algorithms for facility-location problems, including $\NC$ and $\RNC$ algorithms for (metric) facility location, $k$-center, $k$-median, and $k$-means. These problems…

Data Structures and Algorithms · Computer Science 2010-06-11 Guy E. Blelloch, Kanat Tangwongsan

Software-hardware co-design is essential for optimizing in-memory computing (IMC) hardware accelerators for neural networks. However, most existing optimization frameworks target a single workload, leading to highly specialized hardware…

Hardware Architecture · Computer Science 2026-03-05 Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil, Khaled N. Salama

Near-memory Computing (NMC) promises improved performance for the applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, it is not trivial to find such applications and specialized…

Performance · Computer Science 2019-06-26 Stefano Corda, Gagandeep Singh, Ahsan Javed Awan, Roel Jordans, Henk Corporaal

In-Memory Computing (IMC) has emerged as a promising paradigm for energy-efficient, throughput-efficient and area-efficient machine learning at the edge. However, the differences in hardware architectures, array dimensions, and fabrication…

Signal Processing · Electrical Eng. & Systems 2024-05-27 Jiacong Sun, Pouya Houshmand, Marian Verhelst

High-performance computing (HPC) applications are increasingly executed in heterogeneous environments, introducing new challenges for programming and software portability. SYCL has emerged as a leading model designed to simplify…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-20 Ami Marowka

Spatially-Coupled (SC)-LDPC codes are known to have outstanding error-correction performance and low decoding latency. Whereas previous works on LDPC and SC-LDPC codes mostly take either an asymptotic or a finite-length design approach, in…

Information Theory · Computer Science 2022-09-02 Homa Esfahanizadeh, Eshed Ram, Yuval Cassuto, Lara Dolecek

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-11-23 Pekka Jääskeläinen, Carlos Sánchez de La Lama, Erik Schnetter, Kalle Raiskila, Jarmo Takala, Heikki Berg

Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics, and the non-ideal…

The rapid growth of large-language models (LLMs) is driving a new wave of specialized hardware for inference. This paper presents the first workload-centric, cross-architectural performance study of commercial AI accelerators, spanning…

Hardware Architecture · Computer Science 2025-06-10 Amit Sharma

Large language models (LLMs) are increasingly deployed locally for privacy and accessibility, yet users lack tools to measure their resource usage, environmental impact, and efficiency metrics. This paper presents EnviroLLM, an open-source…

Machine Learning · Computer Science 2025-12-16 Troy Allen

The plethora of complex artificial intelligence (AI) algorithms and available high performance computing (HPC) power stimulates the expeditious development of AI components with heterogeneous designs. Consequently, the need for cross-stack…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-16 Zhixiang Ren, Yongheng Liu, Tianhui Shi, Lei Xie, Yue Zhou, Jidong Zhai, Youhui Zhang, Yunquan Zhang, Wenguang Chen

Many modern parallel computing systems are heterogeneous at their node level. Such nodes may comprise general purpose CPUs and accelerators (such as, GPU, or Intel Xeon Phi) that provide high performance with suitable energy-consumption…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-04-19 Suejb Memeti, Lu Li, Sabri Pllana, Joanna Kolodziej, Christoph Kessler

Analog in-memory computing (AIMC) is a promising compute paradigm to improve speed and power efficiency of neural network inference beyond the limits of conventional von Neumann-based architectures. However, AIMC introduces fundamental…

Designing generalized in-memory computing (IMC) hardware that efficiently supports a variety of workloads requires extensive design space exploration, which is infeasible to perform manually. Optimizing hardware individually for each…

Hardware Architecture · Computer Science 2025-02-04 Olga Krestinskaya, Mohammed E. Fouda, Ahmed Eltawil, Khaled N. Salama

In-memory-computing is emerging as an efficient hardware paradigm for deep neural network accelerators at the edge, enabling to break the memory wall and exploit massive computational parallelism. Two design models have surged: analog…

Hardware Architecture · Computer Science 2023-05-31 Pouya Houshmand, Jiacong Sun, Marian Verhelst