Related papers: Allo: A Programming Model for Composable Accelerat…

ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads

Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly…

Machine Learning · Computer Science · 2026-04-13 · Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang

Predictable Accelerator Design with Time-Sensitive Affine Types

Field-programmable gate arrays (FPGAs) provide an opportunity to co-design applications with hardware accelerators, yet they remain difficult to program. High-level synthesis (HLS) tools promise to raise the level of abstraction by…

Programming Languages · Computer Science · 2021-11-17 · Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, Yuwei Ye, Apurva Koti, Adrian Sampson, Zhiru Zhang

DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning

High-Level Synthesis (HLS) tools are widely adopted in FPGA-based domain-specific accelerator design. However, existing tools rely on fixed optimization strategies inherited from software compilations, limiting their effectiveness.…

Machine Learning · Computer Science · 2025-12-15 · Jinming Ge, Linfeng Du, Likith Anaparty, Shangkun Li, Tingyuan Liang, Afzal Ahmad, Vivek Chaturvedi, Sharad Sinha, Zhiyao Xie, Jiang Xu, Wei Zhang

A3D: Agentic AI flow for autonomous Accelerator Design

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications…

Hardware Architecture · Computer Science · 2026-05-18 · Abinand Nallathambi, Christopher Knight, Shantanu Ganguly, Wilfried Haensch, Anand Raghunathan

COSMOS: Coordination of High-Level Synthesis and Memory Optimization for Hardware Accelerators

Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-12-24 · Luca Piccolboni, Paolo Mantovani, Giuseppe Di Guglielmo, Luca P. Carloni

Automatic Optimization of Hardware Accelerators for Image Processing

In the domain of image processing, often real-time constraints are required. In particular, in safety-critical applications, such as X-ray computed tomography in medical imaging or advanced driver assistance systems in the automotive…

Programming Languages · Computer Science · 2015-02-27 · Oliver Reiche, Konrad Häublein, Marc Reichenbach, Frank Hannig, Jürgen Teich, Dietmar Fey

ALTO: Adaptive Linearized Storage of Sparse Tensors

The analysis of high-dimensional sparse data is becoming increasingly popular in many important domains. However, real-world sparse tensors are challenging to process due to their irregular shapes and data distributions. We propose the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-04-28 · Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa Ranadive, Fabrizio Petrini, Jeewhan Choi

Scalable Deep-Learning-Accelerated Topology Optimization for Additively Manufactured Materials

Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices. Two computational challenges have limited the applicability of TO to a variety of industrial applications.…

Computational Engineering, Finance, and Science · Computer Science · 2020-12-01 · Sirui Bi, Jiaxin Zhang, Guannan Zhang

HALO: Memory-Centric Heterogeneous Accelerator with 2.5D Integration for Low-Batch LLM Inference

The rapid adoption of Large Language Models (LLMs) has driven a growing demand for efficient inference, particularly in latency-sensitive applications such as chatbots and personalized assistants. Unlike traditional deep neural networks,…

Hardware Architecture · Computer Science · 2025-10-06 · Shubham Negi, Kaushik Roy

Using the Abstract Computer Architecture Description Language to Model AI Hardware Accelerators

Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across various domains. However, to harness their…

Hardware Architecture · Computer Science · 2024-02-02 · Mika Markus Müller, Alexander Richard Manfred Borst, Konstantin Lübeck, Alexander Louis-Ferdinand Jung, Oliver Bringmann

A Comparison of High-Level Design Tools for SoC-FPGA on Disparity Map Calculation Example

Modern SoC-FPGA that consists of FPGA with embedded ARM cores is being popularized as an embedded vision system platform. However, the design approach of SoC-FPGA applications still follows traditional hardware-software separate workflow,…

Other Computer Science · Computer Science · 2015-09-02 · Shaodong Qin, Mladen Berekovic

A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this…

Hardware Architecture · Computer Science · 2026-04-10 · Cong Li, Chenhao Xue, Yi Ren, Xiping Dong, Yu Cheng, Yinbo Hu, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Zhi Yang, Zhe Cheng, Yuan Xie, Guangyu Sun

AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs

High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, templates, etc., to prototype hardware designs rapidly. However, exploring various design…

Hardware Architecture · Computer Science · 2024-03-19 · Md Rubel Ahmed, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient…

Hardware Architecture · Computer Science · 2026-04-15 · Weichuang Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Yu Feng, Xiaofeng Hou, Chao Li, Jieru Zhao, Minyi Guo

Optimizing High-Level Synthesis Designs with Retrieval-Augmented Large Language Models

High-level synthesis (HLS) allows hardware designers to create hardware designs with high-level programming languages like C/C++/OpenCL, which greatly improves hardware design productivity. However, existing HLS flows require programmers'…

Hardware Architecture · Computer Science · 2024-10-11 · Haocheng Xu, Haotian Hu, Sitao Huang

AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-09-24 · Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

HALO: Hardware-aware quantization with low critical-path-delay weights for LLM acceleration

Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic circuit characteristics such as the…

Hardware Architecture · Computer Science · 2025-11-18 · Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, Li-Shiuan Peh

DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science · 2025-12-09 · Zhongchun Zhou, Chengtao Lai, Yuhang Gu, Wei Zhang

AGON: Automated Design Framework for Customizing Processors from ISA Documents

Customized processors are attractive solutions for vast domain-specific applications due to their high energy efficiency. However, designing a processor in traditional flows is time-consuming and expensive. To address this, researchers have…

Hardware Architecture · Computer Science · 2025-01-22 · Chongxiao Li, Di Huang, Pengwei Jin, Tianyun Ma, Husheng Han, Shuyao Cheng, Yifan Hao, Yongwei Zhao, Guanglin Xu, Zidong Du, Rui Zhang, Xiaqing Li, Yuanbo Wen, Xing Hu, Qi Guo

AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators

Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis…

Hardware Architecture · Computer Science · 2021-09-01 · Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong