English
Related papers

Related papers: Revec: Program Rejuvenation through Revectorizatio…

200 papers

A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Marc Blancafort , Roger Ferrer , Guillaume Houzeaux , Marta Garcia-Gasulla , Filippo Mantovani

Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the…

Distributed, Parallel, and Cluster Computing · Computer Science 2014-01-30 Johannes Hofmann , Jan Treibig , Georg Hager , Gerhard Wellein

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Pablo Vizcaino , Georgios Ieronymakis , Nikolaos Dimou , Vassilis Papaefstathiou , Jesus Labarta , Filippo Mantovani

Leveraging the SIMD capability of modern CPU architectures is mandatory to take full benefit of their increasing performance. To exploit this feature, binary executables must be explicitly vectorized by the developers or an automatic…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-03 Hayfa Tayeb , Ludovic Paillat , Berenger Bramas

RISC-V CPUs leverage the RVV (RISC-V Vector) extension to accelerate data-parallel workloads. In addition to arithmetic operations, RVV includes powerful permutation instructions that enable flexible element rearrangement within vector…

Hardware Architecture · Computer Science 2025-06-02 Vasileios Titopoulos , George Alexakis , Chrysostomos Nicopoulos , Giorgos Dimitrakopoulos

This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…

Hardware Architecture · Computer Science 2021-06-15 Philippos Papaphilippou , Paul H. J. Kelly , Wayne Luk

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel…

High Energy Physics - Lattice · Physics 2013-11-05 Hwancheol Jeong , Sunghoon Kim , Weonjong Lee , Seok-Ho Myung

Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…

Hardware Architecture · Computer Science 2024-10-15 Maximilian Kirschner , Konstantin Dudzik , Jürgen Becker

Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-performance computing. To fully exploit hardware potential, developers often resort to explicit vectorization using intrinsics, as…

Computation and Language · Computer Science 2026-05-19 Shangzhan Li , Xinyu Yin , Xuanyu Jin , Ye He , Yuxin Zhou , Yuxuan Li , Xu Han , Wanxiang Che , Qi Shi , Ting Liu , Maosong Sun

Intrinsic functions are specialized functions provided by the compiler that efficiently operate on architecture-specific hardware, allowing programmers to write optimized code in a high-level language that fully exploits hardware features.…

Software Engineering · Computer Science 2025-11-25 Liutong Han , Chu Kang , Mingjie Xing , Yanjun Wu

Modern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers like LLVM and GCC often fail to fully exploit…

Programming Languages · Computer Science 2025-10-07 Shihan Fang , Wenxin Zheng

Auto-vectorization is a fundamental optimization for modern compilers to exploit SIMD parallelism. However, state-of-the-art approaches still struggle to handle intricate code patterns, often requiring manual hints or domain-specific…

Software Engineering · Computer Science 2025-06-05 Zhongchun Zheng , Kan Wu , Long Cheng , Lu Li , Rodrigo C. O. Rocha , Tianyi Liu , Wei Wei , Jianjiang Zeng , Xianwei Zhang , Yaoqing Gao

Spatial dataflow architectures such as reconfigurable dataflow accelerators (RDA) can provide much higher performance and efficiency than CPUs and GPUs. In particular, vectorized reconfigurable dataflow accelerators (vRDA) in recent…

Hardware Architecture · Computer Science 2024-02-01 Alexander Rucker , Shiv Sundram , Coleman Smith , Matthew Vilim , Raghu Prabhakar , Fredrik Kjolstad , Kunle Olukotun

The way developers implement their algorithms and how these implementations behave on modern CPUs are governed by the design and organization of these. The vectorization units (SIMD) are among the few CPUs' parts that can and must be…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-22 Bérenger Bramas

Real-time systems, particularly those used in domains like automated driving, are increasingly adopting neural networks. From this trend arises the need for high-performance hardware exhibiting predictable timing behavior. While…

Hardware Architecture · Computer Science 2026-02-26 Maximilian Kirschner , Konstantin Dudzik , Ben Krusekamp , Jürgen Becker

RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. Particularly, its RISC-V Vector Extension (RVV) becomes of interest for the acceleration of AI…

Machine Learning · Computer Science 2025-08-20 Federico Nicolas Peccia , Frederik Haxel , Oliver Bringmann

The RISC-V "V" extension introduces vector processing to the RISC-V architecture. Unlike most SIMD extensions, it supports long vectors which can result in significant improvement of multiple applications. In this paper, we present our…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-10 Sonia Rani Gupta , Nikela Papadopoulou , Miquel Pericàs

The development of an open and free RISC-V architecture is of great interest for a wide range of areas, including high-performance computing and numerical simulation in mathematics, physics, chemistry and other problem domains. In this…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-05-21 V. D. Volokitin , E. P. Vasiliev , E. A. Kozinov , V. D. Kustikova , A. V. Liniov , Y. A. Rodimkov , A. V. Sysoyev , I. B. Meyerov

Leveraging vectorisation, the ability for a CPU to apply operations to multiple elements of data concurrently, is critical for high performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-21 Joseph K. L. Lee , Maurice Jamieson , Nick Brown

Handling vast amounts of data is crucial in today's world. The growth of high-performance computing has created a need for parallelization, particularly in the area of machine learning algorithms such as ANN (Approximate Nearest Neighbors).…

Machine Learning · Computer Science 2024-07-19 Konstantin Rumyantsev , Pavel Yakovlev , Andrey Gorshkov , Andrey P. Sokolov
‹ Prev 1 2 3 10 Next ›