Related papers: HAT: Hardware-Aware Transformers for Efficient Nat…

Dynamic Transformer for Efficient Machine Translation on Embedded Devices

The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at…

Computation and Language · Computer Science 2021-08-03 Hishan Parry, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

On Latency Predictors for Neural Architecture Search

Efficient deployment of neural networks (NN) requires the co-optimization of accuracy and latency. For example, hardware-aware neural architecture search has been used to automatically find NN architectures that satisfy a latency constraint…

Machine Learning · Computer Science 2024-03-06 Yash Akhauri, Mohamed S. Abdelfattah

HELP: Hardware-Adaptive Efficient Latency Prediction for NAS via Meta-Learning

For deployment, neural architecture search should be hardware-aware, in order to satisfy the device-specific constraints (e.g., memory usage, latency and energy consumption) and enhance the model efficiency. Existing methods on…

Machine Learning · Computer Science 2021-12-03 Hayeon Lee, Sewoong Lee, Song Chong, Sung Ju Hwang

Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform?

This paper presents a performance study of transformer language models under different hardware configurations and accuracy requirements and derives empirical observations about these resource/accuracy trade-offs. In particular, we study…

Computation and Language · Computer Science 2024-03-08 Souvika Sarkar, Mohammad Fakhruddin Babar, Md Mahadi Hassan, Monowar Hasan, Shubhra Kanti Karmaker Santu

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents. There are clear benefits to these approaches compared to the original Transformer in terms…

Computation and Language · Computer Science 2022-10-12 Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

Hierarchical Resolution Transformers: A Wavelet-Inspired Architecture for Multi-Scale Language Understanding

Transformer architectures have achieved state-of-the-art performance across natural language tasks, yet they fundamentally misrepresent the hierarchical nature of human language by processing text as flat token sequences. This results in…

Computation and Language · Computer Science 2025-09-26 Ayan Sar, Sampurna Roy, Kanav Gupta, Anurag Kaushish, Tanupriya Choudhury, Abhijit Kumar

A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models

Recent advancements in large language models (LLMs) have catalyzed a substantial surge in demand for LLM services. While traditional cloud-based LLM services satisfy high-accuracy requirements, they fall short in meeting critical demands…

Machine Learning · Computer Science 2025-03-26 Zuan Xie, Yang Xu, Hongli Xu, Yunming Liao, Zhiwei Yao

Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures

Executing machine learning inference tasks on resource-constrained edge devices requires careful hardware-software co-design optimizations. Recent examples have shown how transformer-based deep neural network models such as ALBERT can be…

Machine Learning · Computer Science 2023-04-14 Zirui Fu, Aleksandre Avaliani, Marco Donato

A Memory-Efficient Framework for Deformable Transformer with Neural Architecture Search

Deformable Attention Transformers (DAT) have shown remarkable performance in computer vision tasks by adaptively focusing on informative image regions. However, their data-dependent sampling mechanism introduces irregular memory access…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wendong Mao, Mingfan Zhao, Jianfeng Guan, Qiwei Dong, Zhongfeng Wang

HATA: Trainable and Hardware-Efficient Hash-Aware Top-k Attention for Scalable Large Model Inference

Large Language Models (LLMs) have emerged as a pivotal research area, yet the attention module remains a critical bottleneck in LLM inference, even with techniques like KVCache to mitigate redundant computations. While various top-$k$…

Machine Learning · Computer Science 2025-06-04 Ping Gong, Jiawei Yi, Shengnan Wang, Juncheng Zhang, Zewen Jin, Ouxiang Zhou, Ruibo Liu, Guanbin Xu, Youhui Bai, Bowen Ye, Kun Yuan, Tong Yang, Gong Zhang, Renhai Chen, Feng Wu, Cheng Li

HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression

Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a…

Machine Learning · Computer Science 2022-12-01 Jiaqi Gu, Ben Keller, Jean Kossaifi, Anima Anandkumar, Brucek Khailany, David Z. Pan

TART: Token-based Architecture Transformer for Neural Network Performance Prediction

In the realm of neural architecture design, achieving high performance is largely reliant on the manual expertise of researchers. Despite the emergence of Neural Architecture Search (NAS) as a promising technique for automating this…

Machine Learning · Computer Science 2025-01-07 Yannis Y. He

Transformer-based Models to Deal with Heterogeneous Environments in Human Activity Recognition

Human Activity Recognition (HAR) on mobile devices has been demonstrated to be possible using neural models trained on data collected from the device's inertial measurement units. These models have used Convolutional Neural Networks (CNNs),…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Sannara EK, François Portet, Philippe Lalanda

PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context

Following their success in natural language processing (NLP), there has been a shift towards transformer models in computer vision. While transformers perform well and offer promising multi-tasking performance, due to their high compute…

Artificial Intelligence · Computer Science 2025-10-02 Maximilian Augustin, Syed Shakib Sarwar, Mostafa Elhoushi, Sai Qian Zhang, Yuecheng Li, Barbara De Salvo

Optimizing Inference Performance of Transformers on CPUs

The Transformer architecture revolutionized the field of natural language processing (NLP). Transformer-based models (e.g., BERT) power many important Web services, such as search, translation, question-answering, etc. While enormous…

Computation and Language · Computer Science 2021-02-23 Dave Dice, Alex Kogan

HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models

The increasing size of language models necessitates a thorough analysis across multiple dimensions to assess trade-offs among crucial hardware metrics such as latency, energy consumption, GPU memory usage, and performance. Identifying…

Machine Learning · Computer Science 2024-11-05 Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Aaron Klein, Lennart Purucker, Joerg K. H. Franke, Frank Hutter

Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu, Yu Wang, Ngoc Quach

HyT-NAS: Hybrid Transformers Neural Architecture Search for Edge Devices

Vision Transformers have enabled recent attention-based Deep Learning (DL) architectures to achieve remarkable results in Computer Vision (CV) tasks. However, due to the extensive computational resources required, these architectures are…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Lotfi Abdelkrim Mecharbat, Hadjer Benmeziane, Hamza Ouarnoughi, Smail Niar

HAT: Hybrid Attention Transformer for Image Restoration

Transformer-based methods have shown impressive performance in image restoration tasks, such as image super-resolution and denoising. However, we find that these networks can only utilize a limited spatial range of input information through…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Xiangyu Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou, Chao Dong

HAPI: Hardware-Aware Progressive Inference

Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks. Despite their popularity, CNN inference still comes at a high computational cost. A growing body of work aims to alleviate this by…

Computer Vision and Pattern Recognition · Computer Science 2020-08-11 Stefanos Laskaridis, Stylianos I. Venieris, Hyeji Kim, Nicholas D. Lane