Related papers: Optimized Spatial Architecture Mapping Flow for Tr…

200 papers

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu, Marina Zapater, David Atienza

Graph Neural Networks (GNNs) have garnered a lot of recent interest because of their success in learning representations from graph-structured data across several critical applications in cloud and HPC. Owing to their unique compute and…

A low-latency and energy-efficient tensor algebra accelerator design must optimize how data movement and operations are scheduled (i.e., mapped) in the accelerator architecture. A key mapping optimization is fusion, meaning holding data…

Hardware Architecture · Computer Science 2026-05-05 Tanner Andrulis, Michael Gilbert, Vivienne Sze, Joel S. Emer

The current state of the art of Simultaneous Localisation and Mapping, or SLAM, on low power embedded systems is about sparse localisation and mapping with low resolution results in the name of efficiency. Meanwhile, research in this field…

Robotics · Computer Science 2019-02-14 Konstantinos Boikos, Christos-Savvas Bouganis

As the landscape of deep neural networks evolves, heterogeneous dataflow accelerators, in the form of multi-core architectures or chiplet-based designs, promise more flexibility and higher inference performance through scalability. So far,…

Hardware Architecture · Computer Science 2025-10-08 Arne Symons, Linyan Mei, Steven Colleman, Pouya Houshmand, Sebastian Karl, Marian Verhelst

Efficiently supporting long context length is crucial for Transformer models. The quadratic complexity of the self-attention computation plagues traditional Transformers. Sliding window-based static sparse attention mitigates the problem by…

Hardware Architecture · Computer Science 2024-05-28 Zhenyu Bai, Pranav Dangi, Huize Li, Tulika Mitra

Recent advancements in large language models (LLMs) boasting billions of parameters have generated a significant demand for efficient deployment in inference workloads. The majority of existing approaches rely on temporal architectures that…

Machine Learning · Computer Science 2024-04-09 Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang

Modern machine learning accelerators are designed to efficiently execute deep neural networks (DNNs) by optimizing data movement, memory hierarchy, and compute throughput. However, emerging DNN models such as large language models, state…

Hardware Architecture · Computer Science 2025-09-03 Shubham Negi, Manik Singhal, Aayush Ankit, Sudeep Bhoja, Kaushik Roy

There is a growing interest in custom spatial accelerators for machine learning applications. These accelerators employ a spatial array of processing elements (PEs) interacting via custom buffer hierarchies and networks-on-chip. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-06-22 Gordon E. Moon, Hyoukjun Kwon, Geonhwa Jeong, Prasanth Chatarasi, Sivasankaran Rajamanickam, Tushar Krishna

In recent years, there have been tremendous advances in hardware acceleration of deep neural networks. However, most of the research has focused on optimizing accelerator microarchitecture for higher performance and energy efficiency on a…

Machine Learning · Computer Science 2019-12-12 Sam Likun Xi, Yuan Yao, Kshitij Bhardwaj, Paul Whatmough, Gu-Yeon Wei, David Brooks

The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios,…

Machine Learning · Computer Science 2025-08-19 Boran Zhao, Haiming Zhai, Zihang Yuan, Hetian Liu, Tian Xia, Wenzhe Zhao, Pengju Ren

State Space Models (SSMs) offer a promising alternative to transformers for long-sequence processing. However, their efficiency remains hindered by memory-bound operations, particularly in the prefill stage. While MARCA, a recent first…

Hardware Architecture · Computer Science 2026-04-10 Robin Geens, Arne Symons, Marian Verhelst

The energy and latency of an accelerator running a deep neural network (DNN) depend on how the computation and data movement are scheduled in the accelerator (i.e., mapping), and picking an optimal mapping is essential to achieve…

Hardware Architecture · Computer Science 2026-05-05 Michael Gilbert, Tanner Andrulis, Vivienne Sze, Joel S. Emer

Intelligent interaction with the real world requires robotic agents to jointly reason over high-level plans and low-level controls. Task and motion planning (TAMP) addresses this by combining symbolic planning and continuous trajectory…

Robotics · Computer Science 2025-09-18 Denis Shcherba, Eckart Cobo-Briesewitz, Cornelius V. Braun, Marc Toussaint

In this paper, we demonstrate the design of efficient and high-performance AI/Deep Learning accelerators with customized STT-MRAM and a reconfigurable core. Based on model-driven detailed design space exploration, we present the design…

Hardware Architecture · Computer Science 2021-04-07 Kaniz Mishty, Mehdi Sadi

Recent deep learning workloads increasingly push computational demand beyond what current memory systems can sustain, with many kernels stalling on data movement rather than computation. While modern dataflow accelerators incorporate…

Programming Languages · Computer Science 2025-09-09 Shihan Fang, Hongzheng Chen, Niansong Zhang, Jiajie Li, Han Meng, Adrian Liu, Zhiru Zhang

Transformer-based models have achieved state-of-the-art results in many tasks in natural language processing. However, such models are usually slow at inference time, making deployment difficult. In this paper, we develop an efficient…

Machine Learning · Computer Science 2020-08-18 Henry Tsai, Jayden Ooi, Chun-Sung Ferng, Hyung Won Chung, Jason Riesa

Transformers have revolutionized AI in natural language processing and computer vision, but their large computation and memory demands pose major challenges for hardware acceleration. In practice, end-to-end throughput is often limited by…

Hardware Architecture · Computer Science 2026-03-20 Qunyou Liu, Marina Zapater, David Atienza

Significant effort has been placed on the development of toolflows that map Convolutional Neural Network (CNN) models to Field Programmable Gate Arrays (FPGAs) with the aim of automating the production of high performing designs for a…

Hardware Architecture · Computer Science 2022-08-10 Alexander Montgomerie-Corcoran, Zhewen Yu, Christos-Savvas Bouganis

Large Language Models (LLMs) impose massive computational demands, driving the need for scalable multi-chiplet accelerators. However, existing mapping space exploration efforts for such accelerators primarily focus on traditional…

Hardware Architecture · Computer Science 2026-04-02 Boyu Li, Zongwei Zhu, Yi Xiong, Qianyue Cao, Jiawei Geng, Xiaonan Zhang, Xi Li