Related papers: HAT: Hardware-Aware Transformers for Efficient Nat…

200 papers

The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at…

Computation and Language · Computer Science 2021-08-03 Hishan Parry, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint…

Machine Learning · Computer Science 2024-03-06 Yash Akhauri, Mohamed S. Abdelfattah

For deployment, neural architecture search should be hardware-aware, in order to satisfy the device-specific constraints (e.g., memory usage, latency and energy consumption) and enhance the model efficiency. Existing methods on…

Machine Learning · Computer Science 2021-12-03 Hayeon Lee, Sewoong Lee, Song Chong, Sung Ju Hwang

This paper presents a performance study of transformer language models under different hardware configurations and accuracy requirements and derives empirical observations about these resource/accuracy trade-offs. In particular, we study…

Computation and Language · Computer Science 2024-03-08 Souvika Sarkar, Mohammad Fakhruddin Babar, Md Mahadi Hassan, Monowar Hasan, Shubhra Kanti Karmaker Santu

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms…

Computation and Language · Computer Science 2022-10-12 Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

Transformer architectures have achieved state-of-the-art performance across natural language tasks, yet they fundamentally misrepresent the hierarchical nature of human language by processing text as flat token sequences. This results in…

Computation and Language · Computer Science 2025-09-26 Ayan Sar, Sampurna Roy, Kanav Gupta, Anurag Kaushish, Tanupriya Choudhury, Abhijit Kumar

Recent advancements in large language models (LLMs) have catalyzed a substantial surge in demand for LLM services. While traditional cloud-based LLM services satisfy high-accuracy requirements, they fall short in meeting critical demands…

Machine Learning · Computer Science 2025-03-26 Zuan Xie, Yang Xu, Hongli Xu, Yunming Liao, Zhiwei Yao

Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be…

Machine Learning · Computer Science 2023-04-14 Zirui Fu, Aleksandre Avaliani, Marco Donato

Deformable Attention Transformers (DAT) have shown remarkable performance in computer vision tasks by adaptively focusing on informative image regions. However, their data-dependent sampling mechanism introduces irregular memory access…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wendong Mao, Mingfan Zhao, Jianfeng Guan, Qiwei Dong, Zhongfeng Wang

Large Language Models (LLMs) have emerged as a pivotal research area, yet the attention module remains a critical bottleneck in LLM inference, even with techniques like KVCache to mitigate redundant computations. While various top-$k$…

Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a…

Machine Learning · Computer Science 2022-12-01 Jiaqi Gu, Ben Keller, Jean Kossaifi, Anima Anandkumar, Brucek Khailany, David Z. Pan

In the realm of neural architecture design, achieving high performance is largely reliant on the manual expertise of researchers. Despite the emergence of Neural Architecture Search (NAS) as a promising technique for automating this…

Machine Learning · Computer Science 2025-01-07 Yannis Y. He

Human Activity Recognition (HAR) on mobile devices has been demonstrated to be possible using neural models trained on data collected from the device's inertial measurement units. These models have used Convolutional Neural Networks (CNNs),…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Sannara EK, François Portet, Philippe Lalanda

Following their success in natural language processing (NLP), there has been a shift towards transformer models in computer vision. While transformers perform well and offer promising multi-tasking performance, due to their high compute…

Artificial Intelligence · Computer Science 2025-10-02 Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, Barbara De Salvo

The Transformer architecture revolutionized the field of natural language processing (NLP). Transformers-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc. While enormous…

Computation and Language · Computer Science 2021-02-23 Dave Dice, Alex Kogan

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying…

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu, Yu Wang, Ngoc Quach

Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources required, these architectures are…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Lotfi Abdelkrim Mecharbat, Hadjer Benmeziane, Hamza Ouarnoughi, Smail Niar

Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising. However, we find that these networks can only utilize a limited spatial range of input information through…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Xiangyu Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou, Chao Dong

Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. A growing body of work aims to alleviate this by…

Computer Vision and Pattern Recognition · Computer Science 2020-08-11 Stefanos Laskaridis, Stylianos I. Venieris, Hyeji Kim, Nicholas D. Lane