Related papers: READ: Reliability-Enhanced Accelerator Dataflow Op…

READ: Recurrent Adapter with Partial Video-Language Alignment for Parameter-Efficient Transfer Learning in Low-Resource Video-Language Modeling

Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Thong Nguyen, Xiaobao Wu, Xinshuai Dong, Khoi Le, Zhiyuan Hu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

READ: Recurrent Adaptation of Large Transformers

Fine-tuning large-scale Transformers has led to the explosion of many AI applications across Natural Language Processing and Computer Vision tasks. However, fine-tuning all pre-trained model parameters becomes impractical as the model size…

Machine Learning · Computer Science 2024-10-07 John Nguyen, Sid Wang, Ke Li, Carole-Jean Wu

DiscoRD: An Experimental Methodology for Quickly Discovering the Reliable Read Disturbance Threshold of Real DRAM Chips

State-of-the-art DRAM read disturbance mitigations rely on the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first read disturbance bitflip) to securely and performance- and…

Hardware Architecture · Computer Science 2026-03-16 Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Minesh Patel, A. Giray Yaglikci, Onur Mutlu

ViTAD: Timing Violation-Aware Debugging of RTL Code using Large Language Models

In modern Very Large Scale Integrated (VLSI) circuit design flow, the Register-Transfer Level (RTL) stage presents a critical opportunity for timing optimization. Addressing timing violations at this early stage is essential, as modern…

Hardware Architecture · Computer Science 2025-08-20 Wenhao Lv, Yingjie Xia, Xiyuan Chen, Li Kuang

READ: Real-time and Efficient Asynchronous Diffusion for Audio-driven Talking Head Generation

The introduction of diffusion models has brought significant advances to the field of audio-driven talking head generation. However, the extremely slow inference speed severely limits the practical implementation of diffusion-based talking…

Graphics · Computer Science 2025-11-18 Haotian Wang, Yuzhe Weng, Jun Du, Haoran Xu, Xiaoyan Wu, Shan He, Bing Yin, Cong Liu, Jianqing Gao, Qingfeng Liu

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

The inference of ML models composed of diverse structures, types, and sizes boils down to the execution of different dataflows (i.e. different tiling, ordering, parallelism, and shapes). Using the optimal dataflow for every layer of…

Hardware Architecture · Computer Science 2026-04-07 Jianming Tong, Anirudh Itagi, Prasanth Chatarasi, Tushar Krishna

Reducing Solid-State Drive Read Latency by Optimizing Read-Retry

3D NAND flash memory with advanced multi-level cell techniques provides high storage density, but suffers from significant performance degradation due to a large number of read-retry operations. Although the read-retry mechanism is…

Hardware Architecture · Computer Science 2021-04-21 Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, Onur Mutlu

Reducing Solid-State Drive Read Latency by Optimizing Read-Retry (Extended Abstract)

3D NAND flash memory with advanced multi-level cell techniques provides high storage density, but suffers from significant performance degradation due to a large number of read-retry operations. Although the read-retry mechanism is…

Hardware Architecture · Computer Science 2021-03-15 Jisung Park, Myungsuk Kim, Myoungjun Chun, Lois Orosa, Jihong Kim, Onur Mutlu

Efficient Instruction Scheduling using Real-time Load Delay Tracking

Many hardware structures in today's high-performance out-of-order processors do not scale in an efficient way. To address this, different solutions have been proposed that build execution schedules in an energy-efficient manner. Issue time…

Hardware Architecture · Computer Science 2021-09-08 Andreas Diavastos, Trevor E. Carlson

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance

The demand for efficient large language model (LLM) inference has propelled the development of dedicated accelerators. As accelerators are vulnerable to hardware faults due to aging, variation, etc., existing accelerator designs often…

Hardware Architecture · Computer Science 2025-04-08 Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li

A Full-Stack Search Technique for Domain Optimized Deep Learning Accelerators

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware…

Machine Learning · Computer Science 2022-02-02 Dan Zhang, Safeen Huda, Ebrahim Songhori, Kartik Prabhu, Quoc Le, Anna Goldie, Azalia Mirhoseini

READER: Retrieval-Assisted Drafter for Efficient LLM Inference

Autoregressive Language Models instantiate a factorized likelihood over token sequences, yet their strictly sequential decoding process imposes an intrinsic lower bound on inference latency. This bottleneck has emerged as a central obstacle…

Computation and Language · Computer Science 2025-09-30 Maxim Divilkovskiy, Vitaly Malygin, Sergey Zlobin, Stanislav Ilyushin, Sultan Isali, Vasily Kalugin, Nuriza Aitassova, Fei Yi, Weidi Zeng

Cross-Layer Optimization for Fault-Tolerant Deep Learning

A fault-tolerant deep learning accelerator is the basis for highly reliable deep learning processing and is critical to deploying deep learning in safety-critical applications such as avionics and robotics. Since deep learning is known to be…

Hardware Architecture · Computer Science 2023-12-22 Qing Zhang, Cheng Liu, Bo Liu, Haitong Huang, Ying Wang, Huawei Li, Xiaowei Li

RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning

The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage of critical failure modes. This paper introduces…

Artificial Intelligence · Computer Science 2025-12-11 Khurram Khalil, Muhammad Mahad Khaliq, Khaza Anuarul Hoque

Look-Ahead AC Optimal Power Flow: A Model-Informed Reinforcement Learning Approach

With the increasing proportion of renewable energy on the generation side, it becomes more difficult to accurately predict the power generation and adapt to the large deviations between the optimal dispatch scheme and the day-ahead…

Systems and Control · Electrical Eng. & Systems 2023-03-07 Xinyue Wang, Haiwang Zhong, Guanglun Zhang, Guangchun Ruan, Yiliu He, Zekuan Yu

Accelerator-Aware Training for Transducer-Based Speech Recognition

Machine learning model weights and activations are represented in full-precision during training. This leads to performance degradation in runtime when deployed on neural network accelerator (NNA) chips, which leverage highly parallelized…

Machine Learning · Computer Science 2023-05-16 Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow

Optimizing Logical Execution Time Model for Both Determinism and Low Latency

The Logical Execution Time (LET) programming model has recently received considerable attention, particularly because of its timing and dataflow determinism. In LET, task computation appears always to take the same amount of time (called…

Systems and Control · Electrical Eng. & Systems 2024-03-11 Sen Wang, Dong Li, Ashrarul H. Sifat, Shao-Yu Huang, Xuanliang Deng, Changhee Jung, Ryan Williams, Haibo Zeng

Leveraging Error Resilience of Iterative Algorithms for Energy Efficiency: from Concept to Implementation

Iterative algorithms are widely used in digital signal processing applications. With the case study of radio astronomy calibration processing, this work contributes towards revealing and exploiting the intrinsic error resilience of…

Signal Processing · Electrical Eng. & Systems 2025-02-21 G. A. Gillani, A. Krapukhin, A. B. J. Kokkeler

MatrixFlow: System-Accelerator co-design for high-performance transformer applications

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu, Marina Zapater, David Atienza

DeepOHeat-v1: Efficient Operator Learning for Fast and Trustworthy Thermal Simulation and Optimization in 3D-IC Design

Thermal analysis is crucial in 3D-IC design due to increased power density and complex heat dissipation paths. Although operator learning frameworks such as DeepOHeat have demonstrated promising preliminary results…

Machine Learning · Computer Science 2025-10-13 Xinling Yu, Ziyue Liu, Hai Li, Yixing Li, Xin Ai, Zhiyu Zeng, Ian Young, Zheng Zhang