Hardware Architecture

CompPow: A Case for Component-level GPU Power Management

The ever-increasing demand for ML-driven intelligence in a wide spectrum of domains has led to the ubiquity of GPUs. At the same time, GPUs are notorious for their power consumption needs and often dominate power allocation in a typical ML…

Hardware Architecture · Computer Science · 2026-05-22 · Shaizeen Aga, Mohamed Assem Ibrahim

Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

Hardware verification is one of the most challenging stages of the hardware design process, requiring significant time and resources to ensure a design is fully validated and production-ready. Verification teams aim to maximize design…

Hardware Architecture · Computer Science · 2026-05-22 · Sean Lowe, Elias Hilaneh, Alma Babbit, Nakul Gopalan, Vidya Chhabria, Aman Arora

FASE: FPGA-Assisted Syscall Emulation for Rapid End-to-End Processor Performance Validation

The rapid advancement of AI workloads and domain-specific architectures has led to increasingly diverse processor microarchitectures, whose design exploration requires fast and accurate performance validation. However, traditional workflows…

Hardware Architecture · Computer Science · 2026-05-22 · Chengzhen Meng, Xiuzhuang Chen, Bingcai Sui, Zhenyu Zhao, Tun Li, Hongjun Dai

Supporting Dynamic Control-Flow Execution for Runtime Reconfigurable Processors

As the need for more computing power grows, traditional methods are hitting limits. To boost performance, we're expanding Central Processing Unit (CPU) capabilities and using specialized hardware accelerators. For example, mobile devices…

Hardware Architecture · Computer Science · 2026-05-21 · Hassan Nassar, Rafik Youssef, Lars Bauer, Jörg Henkel

ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

Spiking neural networks (SNNs) exploit event-driven and addition-only computation to substantially improve efficiency for intelligent computation. A key temporal property of SNNs, elastic inference, allows outputs to emerge progressively,…

Hardware Architecture · Computer Science · 2026-05-21 · Kang You, Chen Nie, Lee Jun Yan, Ziling Wei, Cheng Zou, Zekai Xu, Yu Feng, Honglan Jiang, Zhezhi He

HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

The system-level cache is a critical resource shared by processor cores and domain-specific accelerators in heterogeneous systems on chips (SoCs). The strict QoS requirements of accelerators, such as deadlines, can lead to severe…

Hardware Architecture · Computer Science · 2026-05-21 · Ayushi Agarwal, Anannya Mathur, Preeti Ranjan Panda

A Hardware-Based Multi-Stage Dynamic Power Management Architecture for Autonomous Low-Light Operation

The advance of autonomous Smart Sensor Networks and embedded systems for the Internet of Things, powered by photovoltaic energy harvesting, is severely limited by energy efficiency, especially in low-light environments. While Dynamic Power…

Hardware Architecture · Computer Science · 2026-05-20 · Charalampos S. Kouzinopoulos, Marcel L. Meli, Martin Schellenberg, Philip J. Poole, Mathieu Bellanger, Matthias Kauer, Julien De Vos, Dimosthenis Ioannidis, Dimitrios Tzovaras

HSCO-Bench: An Agent-Driven End-to-End Hardware-Software Co-design Benchmark for Systems-on-Chip

Large language models (LLMs) are being adopted for software and hardware design, yet these domains are still evaluated separately. Software benchmarks typically assume fixed hardware targets, while hardware benchmarks focus on component-level…

Hardware Architecture · Computer Science · 2026-05-20 · Pei-Huan Tsai, Kuan-Lin Chiu, William Baisi, Pin-Yu Chen, Luca P. Carloni

Building Reliable Arithmetic Multipliers Under NBTI Aging and Process Variations

Hardware aging poses a significant challenge for integrated circuits (ICs), leading to performance degradation and eventual failure. In this work, we focus on the aging of arithmetic multipliers, which are a cornerstone of modern computing…

Hardware Architecture · Computer Science · 2026-05-19 · Masoud Heidary, Biresh Kumar Joardar

CPPL: A Circuit Prompt Programming Language

Large language models (LLMs) have shown promise in register-transfer level (RTL) design automation, but direct RTL generation remains difficult to validate, optimize, and integrate with compiler-based hardware design flows. Hardware…

Hardware Architecture · Computer Science · 2026-05-19 · Shuo Yin, Yihe Wang, Lancheng Zou, Xufeng Yao, Tinghuan Chen, Chen Bai, Zhengrong Wang, Tsung-Yi Ho, Bei Yu

VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

The large size of the KV cache has become a major bottleneck for serving LLMs with increasing context lengths. In response, many KV cache compression methods, such as token dropping and quantization, have been proposed. However, almost all…

Hardware Architecture · Computer Science · 2026-05-19 · Jiayi Yao, Samuel Shen, Kuntai Du, Shaoting Feng, Dongjoo Seo, Rui Zhang, Yuyang Huang, Yuhan Liu, Shan Lu, Junchen Jiang

Workload-Aware Early-Stage Power Delivery Network Optimization via Architectural Power Traces

Power Delivery Networks (PDNs) are critical for maintaining voltage integrity in modern multiprocessor systems. Conventional early-stage PDN planning relies on static or worst-case power assumptions, often leading to over-provisioned…

Hardware Architecture · Computer Science · 2026-05-19 · Oran Hayes, Maria Pantazi-Kypraiou, Athanasios Tziouvaras, George Stamoulis, Anuj Pathania, Shreejith Shanker, George Floros

VeriHGN: Heterogeneous Graph-Based Congestion Prediction for Chip Layout Verification

As Very Large Scale Integration (VLSI) designs continue to scale in size and complexity, layout verification has become a central challenge in modern Electronic Design Automation (EDA) workflows. In practice, congestion can only be…

Hardware Architecture · Computer Science · 2026-05-19 · Runbang Hu, Bo Fang, Bingzhe Li, Yuede Ji

Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction

FP8 low-precision formats have gained significant adoption in Transformer inference and training. However, existing digital compute-in-memory (DCIM) architectures face challenges in supporting variable FP8 aligned-mantissa bitwidths, as…

Hardware Architecture · Computer Science · 2026-05-19 · Liang Zhao, Kunming Shao, Zhipeng Liao, Xijie Huang, Tim Kwang-Ting Cheng, Chi-Ying Tsui, Yi Zou

Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA

Real-time, energy-efficient inference on edge devices is essential for graph classification across a range of applications. Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that encodes input features into…

Hardware Architecture · Computer Science · 2026-05-19 · Jebacyril Arockiaraj, Dhruv Parikh, Viktor Prasanna

Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

Research on lane change prediction has gained a lot of momentum in the last couple of years. However, most research is confined to simulation or results obtained from datasets, leaving a gap between algorithmic advances and on-road…

Hardware Architecture · Computer Science · 2026-05-19 · M. Manzour, Catherine M. Elias, Omar M. Shehata, R. Izquierdo, M. A. Sotelo

A Quarter of a Century of Neuromorphic Architectures on FPGAs -- an Overview

Neuromorphic computing is a relatively new discipline of computer science, where the principles of the biological brain's computation and memory are used to create a new way of processing information, based on networks of spiking neurons. Those…

Hardware Architecture · Computer Science · 2026-05-19 · Wiktor J. Szczerek, Artur Podobas

TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing

Ray tracing (RT) is a 3D graphics technique that offers highly realistic visuals. It is becoming prominent and accessible as GPU vendors have integrated dedicated ray tracing acceleration hardware. However, tracing millions of rays through…

Hardware Architecture · Computer Science · 2026-05-18 · Yavuz Selim Tozlu, Anshul Naithani, Huiyang Zhou

ADS-IMC: Accelerating Data Sorting with In-Memory Computation

Sorting is a fundamental operation across numerous computational domains. Traditionally, this process involves transferring data from main memory to a processing unit for sorting, followed by writing the sorted data back to memory. This…

Hardware Architecture · Computer Science · 2026-05-18 · Narendra Singh Dhakad, Santosh Kumar Vishvakarma

SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware

This paper presents a novel architecture utilizing a 10T SRAM cell for XNOR-based in-memory computing, aimed at mitigating the extensive routing challenges typically encountered in conventional in-memory computing systems. By integrating a…

Hardware Architecture · Computer Science · 2026-05-18 · Narendra Singh Dhakad, Santosh Kumar Vishvakarma