English
Related papers

Related papers: A Full-Stack Performance Evaluation Infrastructure…

200 papers

Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across…

The rapid growth of large-language models (LLMs) is driving a new wave of specialized hardware for inference. This paper presents the first workload-centric, cross-architectural performance study of commercial AI accelerators, spanning…

Hardware Architecture · Computer Science 2025-06-10 Amit Sharma

Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present…

Machine Learning · Computer Science 2024-06-18 Yingbing Huang , Lily Jiaxin Wan , Hanchen Ye , Manvi Jha , Jinghua Wang , Yuhong Li , Xiaofan Zhang , Deming Chen

Analyzing non-compilable C/C++ submodules without a resolved build environment remains a critical bottleneck for industrial software evolution. Traditional static analysis tools often fail in these scenarios due to their reliance on…

Software Engineering · Computer Science 2026-02-20 Jaid Monwar Chowdhury , Ahmad Farhan Shahriar Chowdhury , Humayra Binte Monwar , Mahmuda Naznin

While Large Language Models (LLMs) have shown remarkable advancements in reasoning and tool use, they often fail to generate optimal, grounded solutions under complex constraints. Real-world travel planning exemplifies these challenges,…

Artificial Intelligence · Computer Science 2025-10-01 Jihye Choi , Jinsung Yoon , Jiefeng Chen , Somesh Jha , Tomas Pfister

The success of large language models LLMs amplifies the need for highthroughput energyefficient inference at scale. 3DDRAMbased accelerators provide high memory bandwidth and therefore an opportunity to accelerate the bandwidthbound decode…

Systems and Control · Electrical Eng. & Systems 2025-12-10 Qipan Wang , Zhe Zhang , Shuangchen Li , Hongzhong Zheng , Zheng Liang , Yibo Lin , Runsheng Wang , Ru Huang

RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operator-level Chakra execution traces from an…

Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. However, the compute and memory requirements introduced by…

Hardware Architecture · Computer Science 2025-01-17 Pratyush Dhingra , Janardhan Rao Doppa , Partha Pratim Pande

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications…

Hardware Architecture · Computer Science 2026-05-18 Abinand Nallathambi , Christopher Knight , Shantanu Ganguly , Wilfried Haensch , Anand Raghunathan

The ATLAS experiment at the Large Hadron Collider has a broad physics programme ranging from precision measurements to direct searches for new particles and new interactions, requiring ever larger and ever more accurate datasets of…

High Energy Physics - Experiment · Physics 2024-11-25 ATLAS Collaboration

The past year has witnessed the increasing popularity of Large Language Models (LLMs). Their unprecedented scale and associated high hardware cost have impeded their broader adoption, calling for efficient hardware designs. With the large…

Hardware Architecture · Computer Science 2023-12-07 Hengrui Zhang , August Ning , Rohan Prabhakar , David Wentzlaff

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a promising solution. The 3D-stacked AI chip…

Hardware Architecture · Computer Science 2026-04-30 Yiqi Liu , Noelle Crawford , Michael Wang , Jilong Xue , Jian Huang

ATLAS is a constraint-guided generation framework for structured engineering artifacts whose outputs must satisfy explicit schemas, domain rules, and audit requirements. Rather than treating a large language model as a standalone generator,…

Software Engineering · Computer Science 2026-04-07 Tong Ma , Hui Lai , Hui Wang , Zhenhu Tian , Chaochao Li , Fengjie Xu , Ling Fang

Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-07 Jiahao Fang , Huizheng Wang , Qize Yang , Dehao Kong , Xu Dai , Jinyi Deng , Yang Hu , Shouyi Yin

Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without additional training. However, most…

Machine Learning · Computer Science 2026-01-07 Tuc Nguyen , Thai Le

Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging…

Large language models (LLMs) have demonstrated exceptional proficiency in understanding and generating human language, but efficient inference on resource-constrained embedded devices remains challenging due to large model sizes and…

Hardware Architecture · Computer Science 2025-07-15 Weihong Xu , Haein Choi , Po-kai Hsu , Shimeng Yu , Tajana Rosing

The substantial memory bandwidth and computational demands of large language models (LLMs) present critical challenges for efficient inference. To tackle this, the literature has explored heterogeneous systems that combine neural processing…

Hardware Architecture · Computer Science 2026-05-05 Yuzong Chen , Chao Fang , Xilai Dai , Yuheng Wu , Thierry Tambe , Marian Verhelst , Mohamed S. Abdelfattah

Recently, large language models (LLMs) have achieved huge success in the natural language processing (NLP) field, driving a growing demand to extend their deployment from the cloud to edge devices. However, deploying LLMs on…

Hardware Architecture · Computer Science 2025-05-08 Yanbiao Liang , Huihong Shi , Haikuo Shao , Zhongfeng Wang

Long-context language models now advertise context windows up to millions of tokens, yet evaluations typically report a single length or a narrow task family, masking two failure modes: performance can collapse as length grows, and strong…

‹ Prev 1 2 3 10 Next ›