English
Related papers

Related papers: Autovesk: Automatic vectorized code generation fro…

200 papers

Vectorization via Single Instruction, Multiple Data (SIMD) architectures is a cornerstone of high-performance computing. To fully exploit hardware potential, developers often resort to explicit vectorization using intrinsics, as…

Computation and Language · Computer Science 2026-05-19 Shangzhan Li , Xinyu Yin , Xuanyu Jin , Ye He , Yuxin Zhou , Yuxuan Li , Xu Han , Wanxiang Che , Qi Shi , Ting Liu , Maosong Sun

We present a technique for automatically transforming kernel-based computations in disparate, nested loops into a fused, vectorized form that can reduce intermediate storage needs and lead to improved performance on contemporary hardware.…

Performance · Computer Science 2017-10-25 Jason Sewall , Simon J. Pennycook

SIMD vectorization has lately become a key challenge in high-performance computing. However, hand-written explicitly vectorized code often poses a threat to the software's sustainability. In this publication we solve this sustainability and…

Numerical Analysis · Mathematics 2018-12-20 Dominic Kempf , René Heß , Steffen Müthing , Peter Bastian

Modern processors increasingly rely on SIMD instruction sets, such as AVX and RVV, to significantly enhance parallelism and computational performance. However, production-ready compilers like LLVM and GCC often fail to fully exploit…

Programming Languages · Computer Science 2025-10-07 Shihan Fang , Wenxin Zheng

Modern microprocessors are equipped with Single Instruction Multiple Data (SIMD) or vector instructions which expose data level parallelism at a fine granularity. Programmers exploit this parallelism by using low-level vector intrinsics in…

Programming Languages · Computer Science 2019-02-11 Charith Mendis , Ajay Jain , Paras Jain , Saman Amarasinghe

A current trend in HPC systems is the utilization of architectures with SIMD or vector extensions to exploit data parallelism. There are several ways to take advantage of such modern vector architectures, each with a different impact on the…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-05 Marc Blancafort , Roger Ferrer , Guillaume Houzeaux , Marta Garcia-Gasulla , Filippo Mantovani

The way developers implement their algorithms and how these implementations behave on modern CPUs are governed by the design and organization of these. The vectorization units (SIMD) are among the few CPUs' parts that can and must be…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-22 Bérenger Bramas

Vectorization is a powerful optimization technique that significantly boosts the performance of high performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently…

Software Engineering · Computer Science 2024-06-10 Jubi Taneja , Avery Laird , Cong Yan , Madan Musuvathi , Shuvendu K. Lahiri

Vectorization is increasingly important to achieve high performance on modern hardware with SIMD instructions. Assembly of matrices and vectors in the finite element method, which is characterized by iterating a local assembly kernel over…

Mathematical Software · Computer Science 2020-08-26 Tianjiao Sun , Lawrence Mitchell , Kaushik Kulkarni , Andreas Klöckner , David A. Ham , Paul H. J. Kelly

Automatic code optimization is a complex process that typically involves the application of multiple discrete algorithms that modify the program structure irreversibly. However, the design of these algorithms is often monolithic, and they…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Kazuaki Matsumura , Simon Garcia De Gonzalo , Antonio J. Peña

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel…

High Energy Physics - Lattice · Physics 2013-11-05 Hwancheol Jeong , Sunghoon Kim , Weonjong Lee , Seok-Ho Myung

Scalable vector instruction sets such as Arm SVE enable vector-length-agnostic (VLA) execution, allowing a single implementation to adapt across hardware with different vector lengths. However, they complicate compiler code generation, as…

Performance · Computer Science 2026-05-19 Ege Beysel , Maximilian Bartel , Jan Moritz Joseph

Vector graphics are widely used to represent fonts, logos, digital artworks, and graphic designs. But, while a vast body of work has focused on generative algorithms for raster images, only a handful of options exists for vector graphics.…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Pradyumna Reddy , Michael Gharbi , Michal Lukac , Niloy J. Mitra

To fully exploit the performance potential of modern multi-core processors, machine learning and data mining algorithms for big data must be parallelized in multiple ways. Today's CPUs consist of multiple cores, each following an…

Machine Learning · Computer Science 2020-11-09 Christian Böhm , Claudia Plant

Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-19 Kun Li , Liang Yuan , Yunquan Zhang , Yue Yue , Hang Cao , Pengqi Lu

This article describes the ARM Scalable Vector Extension (SVE). Several goals guided the design of the architecture. First was the need to extend the vector processing capability associated with the ARM AArch64 execution state to better…

Modular arithmetic is widely used in crytography and symbolic computation. This paper presents a vectorized Montgomery algorithm for modular multiplication, the key to fast modular arithmetic, that fully utilizes the SIMD instructions. We…

Mathematical Software · Computer Science 2016-09-06 Lingchuan Meng

Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and gcc, vectorization is not common in the…

Modern optimizing compilers are able to exploit memory access or computation patterns to generate vectorization codes. However, such patterns in irregular applications are unknown until runtime due to the input dependence. Thus, either…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-26 Changxi Liu , Hailong Yang , Xu Liu , Zhongzhi Luan , Depei Qian

We introduce a code generator that converts unoptimized C++ code operating on sparse data into vectorized and parallel CPU or GPU kernels. Our approach unrolls the computation into a massive expression graph, performs redundant expression…

Programming Languages · Computer Science 2022-03-15 Philipp Herholz , Xuan Tang , Teseo Schneider , Shoaib Kamil , Daniele Panozzo , Olga Sorkine-Hornung
‹ Prev 1 2 3 10 Next ›