English
Related papers

Related papers: Multi-Dimensional Vector ISA Extension for Mobile …

200 papers

In-Memory Acceleration (IMA) promises major efficiency improvements in deep neural network (DNN) inference, but challenges remain in the integration of IMA within a digital system. We propose a heterogeneous architecture coupling 8 RISC-V…

Hardware Architecture · Computer Science 2021-09-06 Gianmarco Ottavi , Geethan Karunaratne , Francesco Conti , Irem Boybat , Luca Benini , Davide Rossi

GEneral Matrix Multiplications (GEMMs) are recurrent in high-performance computing and deep learning workloads. Typically, high-end CPUs accelerate GEMM workloads with Single-Instruction Multiple Data (SIMD) or vector Instruction Set…

Hardware Architecture · Computer Science 2025-07-08 Alexandre de Limas Santana , Adrià Armejach , Francesc Martinez , Erich Focht , Marc Casas

Deployment of modern TinyML tasks on small battery-constrained IoT devices requires high computational energy efficiency. Analog In-Memory Computing (IMC) using non-volatile memory (NVM) promises major efficiency improvements in deep neural…

Hardware Architecture · Computer Science 2022-01-05 Angelo Garofalo , Gianmarco Ottavi , Francesco Conti , Geethan Karunaratne , Irem Boybat , Luca Benini , Davide Rossi

Recent advancements in quantization and mixed-precision approaches offers substantial opportunities to improve the speed and energy efficiency of Neural Networks (NN). Research has shown that individual parameters with varying low…

Hardware Architecture · Computer Science 2024-08-14 Giorgos Armeniakos , Alexis Maras , Sotirios Xydis , Dimitrios Soudris

This paper presents a novel, non-standard set of vector instruction types for exploring custom SIMD instructions in a softcore. The new types allow simultaneous access to a relatively high number of operands, reducing the instruction count…

Hardware Architecture · Computer Science 2021-06-15 Philippos Papaphilippou , Paul H. J. Kelly , Wayne Luk

For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-14 Pablo Vizcaino , Georgios Ieronymakis , Nikolaos Dimou , Vassilis Papaefstathiou , Jesus Labarta , Filippo Mantovani

The evolution of quantization and mixed-precision techniques has unlocked new possibilities for enhancing the speed and energy efficiency of NNs. Several recent studies indicate that adapting precision levels across different parameters can…

Machine Learning · Computer Science 2025-09-19 Giorgos Armeniakos , Alexis Maras , Sotirios Xydis , Dimitrios Soudris

Optimization of applications for supercomputers of the highest performance class requires parallelization at multiple levels using different techniques. In this contribution we focus on parallelization of particle physics simulations…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-01-23 Nils Meyer , Peter Georg , Dirk Pleiter , Stefan Solbrig , Tilo Wettig

Data movement is one of the main challenges of contemporary system architectures. Near-Data Processing (NDP) mitigates this issue by moving computation closer to the memory, avoiding excessive data movement. Our proposal, Vector-In-Memory…

Hardware Architecture · Computer Science 2022-03-29 Marco Antonio Zanata Alves , Sairo Santos , Aline S. Cordeiro , Francis B. Moreira , Paulo C. Santos , Luigi Carro

This article describes the ARM Scalable Vector Extension (SVE). Several goals guided the design of the architecture. First was the need to extend the vector processing capability associated with the ARM AArch64 execution state to better…

Hardware/Software (HW/SW) co-designed processors provide a promising solution to the power and complexity problems of the modern microprocessors by keeping their hardware simple. Moreover, they employ several runtime optimizations to…

Hardware Architecture · Computer Science 2021-03-01 Rakesh Kumar , Alejandro Martinez , Antonio Gonzalez

In memory computing (IMC) architectures for deep learning (DL) accelerators leverage energy-efficient and highly parallel matrix vector multiplication (MVM) operations, implemented directly in memory arrays. Such IMC designs have been…

Emerging Technologies · Computer Science 2024-08-14 Arkapravo Ghosh , Hemkar Reddy Sadana , Mukut Debnath , Panthadip Maji , Shubham Negi , Sumeet Gupta , Mrigank Sharad , Kaushik Roy

Modern central processing units (CPUs) feature single-instruction, multiple-data pipelines to accelerate compute-intensive floating-point and fixed-point workloads. Traditionally, these pipelines and corresponding instruction set…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-30 Stefan Remke , Alexander Breuer

Autoregressive sequence modeling stands as the cornerstone of modern Generative AI, powering results across diverse modalities ranging from text generation to image generation. However, a fundamental limitation of this paradigm is the rigid…

Machine Learning · Computer Science 2026-02-02 Yangyan Li

Edge AI deployment faces critical challenges balancing computational performance, energy efficiency, and resource constraints. This paper presents FPGA-accelerated RISC-V instruction set architecture (ISA) extensions for efficient neural…

Hardware Architecture · Computer Science 2025-11-11 Arya Parameshwara , Santosh Hanamappa Mokashi

Compared to the first generation of deep neural networks, dominated by regular, compute-intensive kernels such as matrix multiplications (MatMuls) and convolutions, modern decoder-based transformers interleave attention, normalization, and…

Hardware Architecture · Computer Science 2026-03-06 Max Wipfli , Gamze İslamoğlu , Navaneeth Kunhi Purayil , Angelo Garofalo , Luca Benini

Computation intensive kernels, such as convolutions, matrix multiplication and Fourier transform, are fundamental to edge-computing AI, signal processing and cryptographic applications. Interleaved-Multi-Threading (IMT) processor cores are…

Hardware Architecture · Computer Science 2021-02-09 Abdallah Cheikh , Stefano Sordillo , Antonio Mastrandrea , Francesco Menichelli , Giuseppe Scotti , Mauro Olivieri

SSE (streaming SIMD extensions) and AVX (advanced vector extensions) are SIMD (single instruction multiple data streams) instruction sets supported by recent CPUs manufactured in Intel and AMD. This SIMD programming allows parallel…

High Energy Physics - Lattice · Physics 2013-11-05 Hwancheol Jeong , Sunghoon Kim , Weonjong Lee , Seok-Ho Myung

Large language models (LLMs) have emerged as a powerful foundation for intelligent reasoning and decision-making, demonstrating substantial impact across a wide range of domains and applications. However, their massive parameter scales and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-29 Mingyu Sun , Xiao Zhang , Shen Qu , Yan Li , Mengbai Xiao , Yuan Yuan , Dongxiao Yu

Expanding Deep Learning applications toward edge computing demands architectures capable of delivering high computational performance and efficiency while adhering to tight power and memory constraints. Digital In-Memory Computing (DIMC)…

Hardware Architecture · Computer Science 2026-02-03 Tommaso Spagnolo , Cristina Silvano , Riccardo Massa , Filippo Grillotti , Thomas Boesch , Giuseppe Desoli
‹ Prev 1 2 3 10 Next ›