English
Related papers

Related papers: Mixed precision in Graphics Processing Unit

200 papers

Matrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphical Processing Units (GPUs) provide it implemented in hardware. Due to the increased throughput…

Mathematical Software · Computer Science 2026-04-07 Faizan A. Khattak , Mantas Mikaitis

The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called "Tensor Core" that performs one matrix-multiply-and-accumulate on 4x4 matrices per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-18 Stefano Markidis , Steven Wei Der Chien , Erwin Laure , Ivy Bo Peng , Jeffrey S. Vetter

Within the past years, hardware vendors have started designing low precision special function units in response to the demand of the Machine Learning community and their demand for high compute power in low precision formats. Also the…

The use of reduced and mixed precision computing has gained increasing attention in high-performance computing (HPC) as a means to improve computational efficiency, particularly on modern hardware architectures like GPUs. In this work, we…

Computational Engineering, Finance, and Science · Computer Science 2025-05-28 Bálint Siklósi , Pushpender K. Sharma , David J. Lusher , István Z. Reguly , Neil D. Sandham

GPU has a significantly higher performance in single-precision computing than that of double precision. Hence, it is important to take a maximal advantage of the single precision in the CG inverter, using the mixed precision method. We have…

Computational Physics · Physics 2011-11-02 Yong-Chull Jang , Hyung-Jin Kim , Weonjong Lee

Driven by the insatiable needs to process ever larger amount of data with more complex models, modern computer processors and accelerators are beginning to offer half precision floating point arithmetic support, and extremely optimized…

Mathematical Software · Computer Science 2019-12-12 Shaoshuai Zhang , Panruo Wu

General Purpose Graphics Processing Unit (GPGPU) computing plays a transformative role in deep learning and machine learning by leveraging the computational advantages of parallel processing. Through the power of Compute Unified Device…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-20 Ming Li , Ziqian Bi , Tianyang Wang , Yizhu Wen , Qian Niu , Xinyuan Song , Zekun Jiang , Junyu Liu , Benji Peng , Sen Zhang , Xuanhe Pan , Jiawei Xu , Jinlang Wang , Keyu Chen , Caitlyn Heqi Yin , Pohsun Feng , Ming Liu

As CMOS scaling reaches its technological limits, a radical departure from traditional von Neumann systems, which involve separate processing and memory units, is needed in order to significantly extend the performance of today's computers.…

Matrix multiplication is a foundational operation in scientific computing and machine learning, yet its computational complexity makes it a significant bottleneck for large-scale applications. The shift to parallel architectures, primarily…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-30 Mufakir Qamar Ansari , Mudabir Qamar Ansari

In recent years, a new kind of accelerated hardware has gained popularity in the Artificial Intelligence (AI) and Machine Learning (ML) communities which enables extremely high-performance tensor contractions in reduced precision for deep…

Computational Physics · Physics 2024-05-01 Adela Habib , Joshua Finkelstein , Anders M. N. Niklasson

Graphics Processing Unit, or GPUs, have been successfully adopted both for graphic computation in 3D applications, and for general purpose application (GP-GPUs), thank to their tremendous performance-per-watt. Recently, there is a big…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-03 Paolo Burgio

Strong gravitational lensing is a powerful probe of cosmology and the dark matter distribution. Efficient lensing software is already a necessity to fully use its potential and the performance demands will only increase with the upcoming…

Instrumentation and Methods for Astrophysics · Physics 2019-02-12 Markus Rexroth , Christoph Schäfer , Gilles Fourestey , Jean-Paul Kneib

Modern computer architectures support low-precision arithmetic, which present opportunities for the adoption of mixed-precision algorithms to achieve high computational throughput and reduce energy consumption. As a growing number of…

Computation · Statistics 2024-12-02 Sahil Bhola , Karthik Duraisamy

General Purpose Graphic Processing Unit(GPGPU) is used widely for achieving high performance or high throughput in parallel programming. This capability of GPGPUs is very famous in the new era and mostly used for scientific computing which…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-10-10 Vajira Thambawita , Roshan G. Ragel , Dhammike Elkaduwe

The vision of super computer at every desk can be realized by powerful and highly parallel CPUs or GPUs or APUs. Graphics processors once specialized for the graphics applications only, are now used for the highly computational intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-04-16 Chittampally Vasanth Raja , Srinivas Balasubramanian , Prakash S Raghavendra

Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. Unfortunately, the properties of these accelerators are not widely…

Hardware Architecture · Computer Science 2025-02-25 Benjamin Valpey , Xinyi Li , Sreepathi Pai , Ganesh Gopalakrishnan

Support for lower precision computation is becoming more common in accelerator hardware due to lower power usage, reduced data movement and increased computational performance. However, computational science and engineering (CSE) problems…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-09-06 Jennifer A. Loe , Christian A. Glusa , Ichitaro Yamazaki , Erik G. Boman , Sivasankaran Rajamanickam

Tensor Core is a mixed-precision matrix-matrix multiplication unit on NVIDIA GPUs with a theoretical peak performance of more than 300 TFlop/s on Ampere architectures. Tensor Cores were developed in response to the high demand of dense…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-19 Hiroyuki Ootomo , Rio Yokota

The Nvidia GPU architecture has introduced new computing elements such as the \textit{tensor cores}, which are special processing units dedicated to perform fast matrix-multiply-accumulate (MMA) operations and accelerate \textit{Deep…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-03-12 Roberto Carrasco , Raimundo Vega , Cristóbal A. Navarro

This paper discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many…

Computation · Statistics 2015-03-13 Hua Zhou , Kenneth Lange , Marc A. Suchard
‹ Prev 1 2 3 10 Next ›