Related papers: Allo: A Programming Model for Composable Accelerat…

200 papers

Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly…

Machine Learning · Computer Science 2026-04-13 Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang

Field-programmable gate arrays (FPGAs) provide an opportunity to co-design applications with hardware accelerators, yet they remain difficult to program. High-level synthesis (HLS) tools promise to raise the level of abstraction by…

Programming Languages · Computer Science 2021-11-17 Rachit Nigam, Sachille Atapattu, Samuel Thomas, Zhijing Li, Theodore Bauer, Yuwei Ye, Apurva Koti, Adrian Sampson, Zhiru Zhang

High-Level Synthesis (HLS) tools are widely adopted in FPGA-based domain-specific accelerator design. However, existing tools rely on fixed optimization strategies inherited from software compilations, limiting their effectiveness.…

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications…

Hardware Architecture · Computer Science 2026-05-18 Abinand Nallathambi, Christopher Knight, Shantanu Ganguly, Wilfried Haensch, Anand Raghunathan

Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-12-24 Luca Piccolboni, Paolo Mantovani, Giuseppe Di Guglielmo, Luca P. Carloni

Image processing often has to meet real-time constraints. In particular, in safety-critical applications, such as X-ray computed tomography in medical imaging or advanced driver assistance systems in the automotive…

Programming Languages · Computer Science 2015-02-27 Oliver Reiche, Konrad Häublein, Marc Reichenbach, Frank Hannig, Jürgen Teich, Dietmar Fey

The analysis of high-dimensional sparse data is becoming increasingly popular in many important domains. However, real-world sparse tensors are challenging to process due to their irregular shapes and data distributions. We propose the…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-04-28 Ahmed E. Helal, Jan Laukemann, Fabio Checconi, Jesmin Jahan Tithi, Teresa Ranadive, Fabrizio Petrini, Jeewhan Choi

Topology optimization (TO) is a popular and powerful computational approach for designing novel structures, materials, and devices. Two computational challenges have limited the applicability of TO to a variety of industrial applications.…

Computational Engineering, Finance, and Science · Computer Science 2020-12-01 Sirui Bi, Jiaxin Zhang, Guannan Zhang

The rapid adoption of Large Language Models (LLMs) has driven a growing demand for efficient inference, particularly in latency-sensitive applications such as chatbots and personalized assistants. Unlike traditional deep neural networks,…

Hardware Architecture · Computer Science 2025-10-06 Shubham Negi, Kaushik Roy

Artificial Intelligence (AI) has witnessed remarkable growth, particularly through the proliferation of Deep Neural Networks (DNNs). These powerful models drive technological advancements across various domains. However, to harness their…

Modern SoC-FPGAs, which combine an FPGA with embedded ARM cores, are becoming popular as embedded vision system platforms. However, the design approach for SoC-FPGA applications still follows the traditional separate hardware-software workflow,…

Other Computer Science · Computer Science 2015-09-02 Shaodong Qin, Mladen Berekovic

Large language models (LLMs) exhibit memory-intensive behavior during decoding, making it a key bottleneck in LLM inference. To accelerate decoding execution, hybrid-bonding-based 3D-DRAM has been adopted in LLM accelerators. While this…

Hardware Architecture · Computer Science 2026-04-10 Cong Li, Chenhao Xue, Yi Ren, Xiping Dong, Yu Cheng, Yinbo Hu, Fujun Bai, Yixin Guo, Xiping Jiang, Qiang Wu, Zhi Yang, Zhe Cheng, Yuan Xie, Guangyu Sun

High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, templates, etc., to prototype hardware designs rapidly. However, exploring various design…

Hardware Architecture · Computer Science 2024-03-19 Md Rubel Ahmed, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

FPGAs are well-suited for dataflow architectures that process data in a streaming or pipelined manner, thus satisfying the high computational and communication demands of emerging applications. However, manually implementing an efficient…

Hardware Architecture · Computer Science 2026-04-15 Weichuang Zhang, Yiquan Wang, Xinzhou Zhang, Chi Zhang, Yu Feng, Xiaofeng Hou, Chao Li, Jieru Zhao, Minyi Guo

High-level synthesis (HLS) allows hardware designers to create hardware designs with high-level programming languages like C/C++/OpenCL, which greatly improves hardware design productivity. However, existing HLS flows require programmers'…

Hardware Architecture · Computer Science 2024-10-11 Haocheng Xu, Haotian Hu, Sitao Huang

CPU-FPGA heterogeneous architectures are attracting ever-increasing attention in an attempt to advance computational capabilities and energy efficiency in today's datacenters. These architectures provide programmers with the ability to…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-09-24 Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang

Quantization is critical for efficiently deploying large language models (LLMs). Yet conventional methods remain hardware-agnostic, limited to bit-width constraints, and do not account for intrinsic circuit characteristics such as the…

Hardware Architecture · Computer Science 2025-11-18 Rohan Juneja, Shivam Aggarwal, Safeen Huda, Tulika Mitra, Li-Shiuan Peh

The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with deeply hierarchical scratchpad memories (SPMs) and…

Hardware Architecture · Computer Science 2025-12-09 Zhongchun Zhou, Chengtao Lai, Yuhang Gu, Wei Zhang

Customized processors are attractive solutions for vast domain-specific applications due to their high energy efficiency. However, designing a processor in traditional flows is time-consuming and expensive. To address this, researchers have…

Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers. Even with the help of high-level synthesis…

Hardware Architecture · Computer Science 2021-09-01 Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong