Related papers: DGEMM performance is data-dependent

200 papers

GPUs are known to be power-hungry, and due to the boom in artificial intelligence, they are currently the major contributors to the high power demands of upcoming datacenters. Most GPU usage in these popular workloads consists of large…

Artificial Intelligence · Computer Science 2024-09-30 Theo Gregersen, Pratyush Patel, Esha Choukse

The importance of low power consumption is widely acknowledged due to the increasing use of portable devices, which require minimizing the consumption of energy. The energy in a computational system depends heavily on the software being…

Adaptation and Self-Organizing Systems · Physics 2024-04-15 Kostas Zotos, Andreas Litke, Alexander Chatzigeorgiou, Spyros Nikolaidis, George Stephanides

Power is increasingly becoming a limiting resource in high-performance, GPU-accelerated computing systems. Understanding the range and sources of power variation is essential in setting realistic bounds on rack and system peak power, and…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-20 Sridutt Bhalachandra, Brian Austin, Samuel Williams, Nicholas J. Wright

The devices designed for the Internet-of-Things encompass a large variety of distinct processor architectures, forming a highly heterogeneous zoo. In order to tackle this, we employ a simulator to estimate the performance of the…

Hardware Architecture · Computer Science 2024-03-13 Cristian Ramírez, Adrián Castelló, Héctor Martínez, Enrique S. Quintana-Ortí

Recent architectures integrate high-performance and power-efficient matrix engines. These engines demonstrate remarkable performance in low-precision matrix multiplication, which is crucial in deep learning. Several techniques have been…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-13 Yuki Uchino, Katsuhisa Ozaki, Toshiyuki Imamura

One of the most important and commonly used operations in many linear algebra functions is matrix-matrix multiplication (GEMM), which is also a key component in obtaining high performance of many scientific codes. It is a computationally…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-18 Nenad Mijić, Davor Davidović

The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power…

Machine Learning · Computer Science 2024-08-21 Ruiqi Sun, Siwei Ye, Jie Zhao, Xin He, Jianzhe Lin, Yiran Li, An Zou

The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based…

Mathematical Software · Computer Science 2015-05-30 Davide Anastasia, Yiannis Andreopoulos

Machine learning inference is increasingly being executed locally on mobile and embedded platforms, due to the clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in…

Computer Vision and Pattern Recognition · Computer Science 2021-05-11 Lei Xun, Long Tran-Thanh, Bashir M Al-Hashimi, Geoff V. Merrett

DL inference queries play an important role in diverse internet services and a large fraction of datacenter cycles are spent on processing DL inference queries. Specifically, the matrix-matrix multiplication (GEMM) operations of…

Hardware Architecture · Computer Science 2020-12-02 Benjamin Y. Cho, Jeageun Jung, Mattan Erez

General matrix multiplication (GEMM) is a ubiquitous computing kernel/algorithm for data processing in diverse applications, including artificial intelligence (AI) and deep learning (DL). Recent shift towards edge computing has inspired…

Hardware Architecture · Computer Science 2024-12-25 Harideep Nair, Prabhu Vellaisamy, Albert Chen, Joseph Finn, Anna Li, Manav Trivedi, John Paul Shen

General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-21 Qiao Zhang, Rabab Alomairy, Dali Wang, Zhuowei Gu, Qinglei Cao

The Matrix Element Method (MEM) is a powerful method to extract information from measured events at collider experiments. Compared to multivariate techniques built on large sets of experimental data, the MEM does not rely on an…

High Energy Physics - Experiment · Physics 2021-04-07 Florian Bury, Christophe Delaere

General Matrix Multiplication (GEMM) has a wide range of applications in scientific simulation and artificial intelligence. Although traditional libraries can achieve high performance on large regular-shaped GEMMs, they often behave not…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-12 Shangfei Yin, Qinglin Wang, Ruochen Hao, Tianyang Zhou, Songzhu Mei, Jie Liu

GEneral Matrix Multiply (GEMM) is a central operation in deep learning and corresponds to the largest chunk of the compute footprint. Therefore, improving its efficiency is an active topic of ongoing research. A popular strategy is the use…

Machine Learning · Computer Science 2024-03-13 Zhanpeng Zeng, Karthikeyan Sankaralingam, Vikas Singh

Emerging deep learning workloads urgently need fast general matrix multiplication (GEMM). To meet such demand, one of the critical features of machine-learning-specific accelerators such as NVIDIA Tensor Cores, AMD Matrix Cores, and Google…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-13 Bo Fang, Xinyi Li, Harvey Dam, Cheng Tan, Siva Kumar Sastry Hari, Timothy Tsai, Ignacio Laguna, Dingwen Tao, Ganesh Gopalakrishnan, Prashant Nair, Kevin Barker, Ang Li

Digital MemComputing machines (DMMs), which employ nonlinear dynamical systems with memory (time non-locality), have proven to be a robust and scalable unconventional computing approach for solving a wide variety of combinatorial…

Emerging Technologies · Computer Science 2024-07-16 Yuan-Hang Zhang, Massimiliano Di Ventra

The remarkable positive impact of Deep Neural Networks on many Artificial Intelligence (AI) tasks has led to the development of various high performance algorithms as well as specialized processors and accelerators. In this paper we address…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-16 Jie Lei, José Flich, Enrique S. Quintana-Ortí

General Matrix Multiplication (GEMM) is a fundamental operation widely used in scientific computations. Its performance and accuracy significantly impact the performance and accuracy of applications that depend on it. One such application…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-12 Fumiya Kono, Naohito Nakasato, Maho Nakata

General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as machine learning and scientific computing, and an efficient GEMM implementation is essential for the performance of these systems. While…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-03 Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Bryan M. Wong, Zizhong Chen