Related papers: RASA: Efficient Register-Aware Systolic Array Matr…

ONE-SA: Enabling Nonlinear Operations in Systolic Arrays for Efficient and Flexible Neural Network Inference

The computation- and memory-intensive nature of DNNs limits their use in many mobile and embedded contexts. Application-specific integrated circuit (ASIC) hardware accelerators employ matrix multiplication units (such as systolic arrays)…

Hardware Architecture · Computer Science 2024-02-02 Ruiqi Sun, Yinchen Ni, Xin He, Jie Zhao, An Zou

VUSA: Virtually Upscaled Systolic Array Architecture to Exploit Unstructured Sparsity in AI Acceleration

Leveraging high degrees of unstructured sparsity is a promising approach to enhance the efficiency of deep neural network (DNN) accelerators, which is particularly important for emerging Edge-AI applications. We introduce VUSA, a systolic-array…

Hardware Architecture · Computer Science 2025-06-03 Shereef Helal, Alberto Garcia-Ortiz, Lennart Bamberg

SARA: A Stall-Aware Memory Allocation Strategy for Mixed-Criticality Systems

The memory capacity in edge devices is often limited due to constraints on cost, size, and power. Consequently, memory competition leads to inevitable page swapping in memory-constrained mixed-criticality edge devices, causing slow storage…

Operating Systems · Computer Science 2025-11-26 Meng-Chia Lee, Wen Sheng Lim, Yuan-Hao Chang, Tei-Wei Kuo

SystolicAttention: Fusing FlashAttention within a Single Systolic Array

Transformer models rely heavily on the scaled dot-product attention (SDPA) operation, typically implemented as FlashAttention. Characterized by its frequent interleaving of matrix multiplications and softmax operations, FlashAttention fails…

Hardware Architecture · Computer Science 2025-12-09 Jiawei Lin, Yuanlong Li, Guokai Chen, Thomas Bourgeat

Search for Optimal Systolic Arrays: A Comprehensive Automated Exploration Framework and Lessons Learned

Systolic arrays have been widely used for accelerating HPC and deep learning applications. There is a plethora of previous works on the performance tuning of systolic arrays, but usually based on a number of oversimplified assumptions…

Hardware Architecture · Computer Science 2021-11-30 Jie Wang, Jason Cong

SISA: A Scale-In Systolic Array for GEMM Acceleration

The currently dominant AI/ML workloads, such as Large Language Models (LLMs), rely on the efficient execution of General Matrix-Matrix Multiplication (GEMM) operations. Thus, most systems are equipped with dedicated matrix hardware…

Hardware Architecture · Computer Science 2026-04-01 Luigi Altamura, Alessio Cicero, Mateo Vázquez Maceiras, Mohammad Ali Maleki, Pedro Trancoso

Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks

The paper discusses how Systolic Arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models like OpenAI's GPT now containing trillions of parameters, the need for efficient matrix multiplication is more…

Hardware Architecture · Computer Science 2024-10-31 Tejas Raja

ASA -- The Adaptive Scheduling Algorithm

In High Performance Computing (HPC) infrastructures, the control of resources by batch systems can lead to prolonged queue waiting times and adverse effects on the overall execution times of applications, particularly in data-intensive and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-19 Abel Souza, Kristiaan Pelckmans, Devarshi Ghoshal, Lavanya Ramakrishnan, Johan Tordsson

Strassen Multisystolic Array Hardware Architectures

While Strassen's matrix multiplication algorithm reduces the complexity of naive matrix multiplication, general-purpose hardware is not suitable for achieving the algorithm's promised theoretical speedups. This leaves the question of if it…

Hardware Architecture · Computer Science 2025-02-17 Trevor E. Pogue, Nicola Nicolici

RDMAvisor: Toward Deploying Scalable and Simple RDMA as a Service in Datacenters

RDMA is increasingly adopted by cloud computing platforms to provide low CPU overhead, low latency, high throughput network services. On the other hand, however, it is still challenging for developers to realize fast deployment of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-02-07 Zhi Wang, Xiaoliang Wang, Zhuzhong Qian, Baoliu Ye, Sanglu Lu

Sparse Active Rectangular Array with Few Closely Spaced Elements

Sparse sensor arrays offer a cost effective alternative to uniform arrays. By utilizing the co-array, a sparse array can match the performance of a filled array, despite having significantly fewer sensors. However, even sparse arrays can…

Signal Processing · Electrical Eng. & Systems 2018-11-14 Robin Rajamäki, Visa Koivunen

Split-Aperture Phased Array Radar Resource Management for Tracking Tasks

The next generation of radar systems will include advanced digital front-end technology in the apertures allowing for spatially subdividing radar tasks over the array, the so-called split-aperture phased array (SAPA) concept. The goal of…

Signal Processing · Electrical Eng. & Systems 2025-01-31 Pepijn B. Cox, Wim L. van Rossum

Redundant Array Computation Elimination

Redundancy elimination is a key optimization direction, and loop nests are the main optimization target in modern compilers. Previous work on redundancy elimination of array computations in loop nests lacks universality. These approaches…

Performance · Computer Science 2025-06-30 Zixuan Wang, Liang Yuan, Xianmeng Jiang, Kun Li, Junmin Xiao, Yunquan Zhang

ROSA: R Optimizations with Static Analysis

R is a popular language and programming environment for data scientists. It is increasingly co-packaged with both relational and Hadoop-based data platforms and can often be the most dominant computational component in data analytics…

Programming Languages · Computer Science 2017-07-04 Rathijit Sen, Jianqiao Zhu, Jignesh M. Patel, Somesh Jha

Dynamic Adaptive Resource Scheduling for Phased Array Radar: Enhancing Efficiency through Synthesis Priorities and Pulse Interleaving

To enhance the resource scheduling performance of phased array radar, we propose a dynamic adaptive resource scheduling algorithm based on synthesis priorities and pulse interleaving. This approach addresses the challenges of low…

Signal Processing · Electrical Eng. & Systems 2024-10-01 Mingguang Han

Accelerating Algorithms using a Dataflow Graph in a Reconfigurable System

In this paper, the acceleration of algorithms using a design of a field programmable gate array (FPGA) as a prototype of a static dataflow architecture is discussed. The static dataflow architecture using operators interconnected by…

Hardware Architecture · Computer Science 2015-03-13 Jorge Luiz e Silva, Joelmir Jose Lopes, Bruno de Abreu Silva, Antonio Carlos Fernandes da Silva

Mapping and Execution of Nested Loops on Processor Arrays: CGRAs vs. TCPAs

Increasing demands for computing power also propel the need for energy-efficient SoC accelerator architectures. One class of such accelerators are so-called processor arrays, which typically integrate a two-dimensional mesh of…

Hardware Architecture · Computer Science 2025-02-18 Dominik Walter, Marita Halm, Daniel Seidel, Indrayudh Ghosh, Christian Heidorn, Frank Hannig, Jürgen Teich

Radar Resource Management for Active Tracking Using Split-Aperture Phased Arrays

Flexible front-end technology will become available in future multifunction radar systems to improve adaptability to the operational theatre. A potential concept to utilize this flexibility is to subdivide radar tasks spatially over the…

Signal Processing · Electrical Eng. & Systems 2024-02-28 Pepijn B. Cox, Wim L. van Rossum

Memory-Efficient FPGA Implementation of Stochastic Simulated Annealing

Simulated annealing (SA) is a well-known algorithm for solving combinatorial optimization problems. However, the computation time of SA increases rapidly, as the size of the problem grows. Recently, a stochastic simulated annealing (SSA)…

Hardware Architecture · Computer Science 2026-01-27 Duckgyu Shin, Naoya Onizawa, Warren J. Gross, Takahiro Hanyu

FPIA: Field-Programmable Ising Arrays with In-Memory Computing

Ising Machine is a promising computing approach for solving combinatorial optimization problems. It is naturally suited for energy-saving and compact in-memory computing implementations with emerging memories. A naïve in-memory computing…

Hardware Architecture · Computer Science 2024-01-30 George Higgins Hutchinson, Ethan Sifferman, Tinish Bhattacharya, Dmitri B. Strukov