Related papers: READ: Reliability-Enhanced Accelerator Dataflow Op…

200 papers

Fully fine-tuning pretrained large-scale transformer models has become a popular paradigm for video-language modeling tasks, such as temporal language grounding and video-language summarization. With a growing number of tasks and limited…

Computer Vision and Pattern Recognition · Computer Science 2026-05-13 Thong Nguyen , Xiaobao Wu , Xinshuai Dong , Khoi Le , Zhiyuan Hu , Cong-Duy Nguyen , See-Kiong Ng , Luu Anh Tuan

Fine-tuning large-scale Transformers has led to the explosion of many AI applications across Natural Language Processing and Computer Vision tasks. However, fine-tuning all pre-trained model parameters becomes impractical as the model size…

Machine Learning · Computer Science 2024-10-07 John Nguyen , Sid Wang , Ke Li , Carole-Jean Wu

State-of-the-art DRAM read disturbance mitigations rely on the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first read disturbance bitflip) to securely and performance- and…

Hardware Architecture · Computer Science 2026-03-16 Ataberk Olgun , F. Nisa Bostanci , Ismail Emir Yuksel , Haocong Luo , Minesh Patel , A. Giray Yaglikci , Onur Mutlu

In modern Very Large Scale Integrated (VLSI) circuit design flow, the Register-Transfer Level (RTL) stage presents a critical opportunity for timing optimization. Addressing timing violations at this early stage is essential, as modern…

Hardware Architecture · Computer Science 2025-08-20 Wenhao Lv , Yingjie Xia , Xiyuan Chen , Li Kuang

The introduction of diffusion models has brought significant advances to the field of audio-driven talking head generation. However, the extremely slow inference speed severely limits the practical implementation of diffusion-based talking…

Graphics · Computer Science 2025-11-18 Haotian Wang , Yuzhe Weng , Jun Du , Haoran Xu , Xiaoyan Wu , Shan He , Bing Yin , Cong Liu , Jianqing Gao , Qingfeng Liu

The inference of ML models composed of diverse structures, types, and sizes boils down to the execution of different dataflows (i.e. different tiling, ordering, parallelism, and shapes). Using the optimal dataflow for every layer of…

Hardware Architecture · Computer Science 2026-04-07 Jianming Tong , Anirudh Itagi , Prasanth Chatarasi , Tushar Krishna

3D NAND flash memory with advanced multi-level cell techniques provides high storage density, but suffers from significant performance degradation due to a large number of read-retry operations. Although the read-retry mechanism is…

Hardware Architecture · Computer Science 2021-04-21 Jisung Park , Myungsuk Kim , Myoungjun Chun , Lois Orosa , Jihong Kim , Onur Mutlu

3D NAND flash memory with advanced multi-level cell techniques provides high storage density, but suffers from significant performance degradation due to a large number of read-retry operations. Although the read-retry mechanism is…

Hardware Architecture · Computer Science 2021-03-15 Jisung Park , Myungsuk Kim , Myoungjun Chun , Lois Orosa , Jihong Kim , Onur Mutlu

Many hardware structures in today's high-performance out-of-order processors do not scale in an efficient way. To address this, different solutions have been proposed that build execution schedules in an energy-efficient manner. Issue time…

Hardware Architecture · Computer Science 2021-09-08 Andreas Diavastos , Trevor E. Carlson

The demand for efficient large language model (LLM) inference has propelled the development of dedicated accelerators. As accelerators are vulnerable to hardware faults due to aging, variation, etc., existing accelerator designs often…

Hardware Architecture · Computer Science 2025-04-08 Tong Xie , Jiawang Zhao , Zishen Wan , Zuodong Zhang , Yuan Wang , Runsheng Wang , Ru Huang , Meng Li

The rapidly-changing deep learning landscape presents a unique opportunity for building inference accelerators optimized for specific datacenter-scale workloads. We propose Full-stack Accelerator Search Technique (FAST), a hardware…

Machine Learning · Computer Science 2022-02-02 Dan Zhang , Safeen Huda , Ebrahim Songhori , Kartik Prabhu , Quoc Le , Anna Goldie , Azalia Mirhoseini

Autoregressive Language Models instantiate a factorized likelihood over token sequences, yet their strictly sequential decoding process imposes an intrinsic lower bound on inference latency. This bottleneck has emerged as a central obstacle…

Computation and Language · Computer Science 2025-09-30 Maxim Divilkovskiy , Vitaly Malygin , Sergey Zlobin , Stanislav Ilyushin , Sultan Isali , Vasily Kalugin , Nuriza Aitassova , Fei Yi , Weidi Zeng

Fault-tolerant deep learning accelerator is the basis for highly reliable deep learning processing and critical to deploy deep learning in safety-critical applications such as avionics and robotics. Since deep learning is known to be…

Hardware Architecture · Computer Science 2023-12-22 Qing Zhang , Cheng Liu , Bo Liu , Haitong Huang , Ying Wang , Huawei Li , Xiaowei Li

The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage of critical failure modes. This paper introduces…

Artificial Intelligence · Computer Science 2025-12-11 Khurram Khalil , Muhammad Mahad Khaliq , Khaza Anuarul Hoque

With the increasing proportion of renewable energy in the generation side, it becomes more difficult to accurately predict the power generation and adapt to the large deviations between the optimal dispatch scheme and the day-ahead…

Systems and Control · Electrical Eng. & Systems 2023-03-07 Xinyue Wang , Haiwang Zhong , Guanglun Zhang , Guangchun Ruan , Yiliu He , Zekuan Yu

Machine learning model weights and activations are represented in full-precision during training. This leads to performance degradation in runtime when deployed on neural network accelerator (NNA) chips, which leverage highly parallelized…

The Logical Execution Time (LET) programming model has recently received considerable attention, particularly because of its timing and dataflow determinism. In LET, task computation appears always to take the same amount of time (called…

Systems and Control · Electrical Eng. & Systems 2024-03-11 Sen Wang , Dong Li , Ashrarul H. Sifat , Shao-Yu Huang , Xuanliang Deng , Changhee Jung , Ryan Williams , Haibo Zeng

Iterative algorithms are widely used in digital signal processing applications. With the case study of radio astronomy calibration processing, this work contributes towards revealing and exploiting the intrinsic error resilience of…

Signal Processing · Electrical Eng. & Systems 2025-02-21 G. A. Gillani , A. Krapukhin , A. B. J. Kokkeler

Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…

Hardware Architecture · Computer Science 2025-03-10 Qunyou Liu , Marina Zapater , David Atienza

Thermal analysis is crucial in 3D-IC design due to increased power density and complex heat dissipation paths. Although operator learning frameworks such as DeepOHeat have demonstrated promising preliminary results…

Machine Learning · Computer Science 2025-10-13 Xinling Yu , Ziyue Liu , Hai Li , Yixing Li , Xin Ai , Zhiyu Zeng , Ian Young , Zheng Zhang