Related papers: A Full-Stack Performance Evaluation Infrastructure…

DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Advances in hybrid bonding and packaging have driven growing interest in 3D DRAM-stacked accelerators with higher memory bandwidth and capacity. As LLMs scale to hundreds of billions or trillions of parameters, distributed inference across…

Hardware Architecture · Computer Science 2026-04-10 Zhiwen Mo, Guoyu Li, Hao Mark Chen, Yu Cheng, Zhengju Tang, Qianzhou Wang, Lei Wang, Shuang Liang, Lingxiao Ma, Xianqi Zhou, Yuxiao Guo, Wayne Luk, Jilong Xue, Hongxiang Fan

AI Accelerators for Large Language Model Inference: Architecture Analysis and Scaling Strategies

The rapid growth of large-language models (LLMs) is driving a new wave of specialized hardware for inference. This paper presents the first workload-centric, cross-architectural performance study of commercial AI accelerators, spanning…

Hardware Architecture · Computer Science 2025-06-10 Amit Sharma

New Solutions on LLM Acceleration, Optimization, and Application

Large Language Models (LLMs) have become extremely potent instruments with exceptional capacities for comprehending and producing human-like text in a wide range of applications. However, the increasing size and complexity of LLMs present…

Machine Learning · Computer Science 2024-06-18 Yingbing Huang, Lily Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen

ATLAS: Automated Tree-based Language Analysis System for C and C++ source programs

Analyzing non-compilable C/C++ submodules without a resolved build environment remains a critical bottleneck for industrial software evolution. Traditional static analysis tools often fail in these scenarios due to their reliance on…

Software Engineering · Computer Science 2026-02-20 Jaid Monwar Chowdhury, Ahmad Farhan Shahriar Chowdhury, Humayra Binte Monwar, Mahmuda Naznin

ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning

While Large Language Models (LLMs) have shown remarkable advancements in reasoning and tool use, they often fail to generate optimal, grounded solutions under complex constraints. Real-world travel planning exemplifies these challenges,…

Artificial Intelligence · Computer Science 2025-10-01 Jihye Choi, Jinsung Yoon, Jiefeng Chen, Somesh Jha, Tomas Pfister

LaMoSys3.5D: Enabling 3.5D-IC-Based Large Language Model Inference Serving Systems via Hardware/Software Co-Design

The success of large language models (LLMs) amplifies the need for high-throughput, energy-efficient inference at scale. 3D-DRAM-based accelerators provide high memory bandwidth and therefore an opportunity to accelerate the bandwidth-bound decode…

Systems and Control · Electrical Eng. & Systems 2025-12-10 Qipan Wang, Zhe Zhang, Shuangchen Li, Hongzhong Zheng, Zheng Liang, Yibo Lin, Runsheng Wang, Ru Huang

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference

RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operator-level Chakra execution traces from an…

Performance · Computer Science 2025-12-23 George Karfakis, Faraz Tahmasebi, Binglu Chen, Lime Yao, Saptarshi Mitra, Tianyue Pan, Hyoukjun Kwon, Puneet Gupta

Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures

Transformer architectures have become the standard neural network model for various machine learning applications including natural language processing and computer vision. However, the compute and memory requirements introduced by…

Hardware Architecture · Computer Science 2025-01-17 Pratyush Dhingra, Janardhan Rao Doppa, Partha Pratim Pande

A3D: Agentic AI flow for autonomous Accelerator Design

Accelerating applications through the design of hardware accelerators can significantly enhance system performance and energy efficiency. Despite advances, such as high-level synthesis (HLS), designing accelerators for complex applications…

Hardware Architecture · Computer Science 2026-05-18 Abinand Nallathambi, Christopher Knight, Shantanu Ganguly, Wilfried Haensch, Anand Raghunathan

AtlFast3: the next generation of fast simulation in ATLAS

The ATLAS experiment at the Large Hadron Collider has a broad physics programme ranging from precision measurements to direct searches for new particles and new interactions, requiring ever larger and ever more accurate datasets of…

High Energy Physics - Experiment · Physics 2024-11-25 ATLAS Collaboration

A Hardware Evaluation Framework for Large Language Model Inference

The past year has witnessed the increasing popularity of Large Language Models (LLMs). Their unprecedented scale and associated high hardware cost have impeded their broader adoption, calling for efficient hardware designs. With the large…

Hardware Architecture · Computer Science 2023-12-07 Hengrui Zhang, August Ning, Rohan Prabhakar, David Wentzlaff

Exploring the Efficiency of 3D-Stacked AI Chip Architecture for LLM Inference with Voxel

To overcome the well-known memory bottleneck of AI chips, 3D stacked architectures that employ advanced packaging technology with high-density through-silicon vias (TSVs) pins have proven to be a promising solution. The 3D-stacked AI chip…

Hardware Architecture · Computer Science 2026-04-30 Yiqi Liu, Noelle Crawford, Michael Wang, Jilong Xue, Jian Huang

ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted MDE

ATLAS is a constraint-guided generation framework for structured engineering artifacts whose outputs must satisfy explicit schemas, domain rules, and audit requirements. Rather than treating a large language model as a standalone generator,…

Software Engineering · Computer Science 2026-04-07 Tong Ma, Hui Lai, Hui Wang, Zhenhu Tian, Chaochao Li, Fengjie Xu, Ling Fang

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-07 Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

ATLAS: Adaptive Test-Time Latent Steering with External Verifiers for Enhancing LLMs Reasoning

Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without additional training. However, most…

Machine Learning · Computer Science 2026-01-07 Tuc Nguyen, Thai Le

A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators

Artificial intelligence (AI) methods have become critical in scientific applications to help accelerate scientific discovery. Large language models (LLMs) are being considered as a promising approach to address some of the challenging…

Performance · Computer Science 2023-10-10 Murali Emani, Sam Foreman, Varuni Sastry, Zhen Xie, Siddhisanket Raskar, William Arnold, Rajeev Thakur, Venkatram Vishwanath, Michael E. Papka

SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding

Large language models (LLMs) have demonstrated exceptional proficiency in understanding and generating human language, but efficient inference on resource-constrained embedded devices remains challenging due to large model sizes and…

Hardware Architecture · Computer Science 2025-07-15 Weihong Xu, Haein Choi, Po-kai Hsu, Shimeng Yu, Tajana Rosing

P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats

The substantial memory bandwidth and computational demands of large language models (LLMs) present critical challenges for efficient inference. To tackle this, the literature has explored heterogeneous systems that combine neural processing…

Hardware Architecture · Computer Science 2026-05-05 Yuzong Chen, Chao Fang, Xilai Dai, Yuheng Wu, Thierry Tambe, Marian Verhelst, Mohamed S. Abdelfattah

AccLLM: Accelerating Long-Context LLM Inference Via Algorithm-Hardware Co-Design

Recently, large language models (LLMs) have achieved huge success in the natural language processing (NLP) field, driving a growing demand to extend their deployment from the cloud to edge devices. However, deploying LLMs on…

Hardware Architecture · Computer Science 2025-05-08 Yanbiao Liang, Huihong Shi, Haikuo Shao, Zhongfeng Wang

ATLAS: All-round Testing of Long-context Abilities across Scales

Long-context language models now advertise context windows up to millions of tokens, yet evaluations typically report a single length or a narrow task family, masking two failure modes: performance can collapse as length grows, and strong…

Computation and Language · Computer Science 2026-05-28 Deli Huang, Cunguang Wang, Hongyin Tang, Zhe Tang, Linsen Guo, Dongyu Ru, Ruoshi Yuan, Ziyue Zhu, Xiaoyu Li, Ziwen Wang, Chen Zhang, Anchun Gui, Wen Zan, Jiaqi Zhang, Xuezhi Cao, Jingang Wang, Xunliang Cai, Yixin Cao