HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

Computation and Language 2024-04-05 v1 Machine Learning Neural and Evolutionary Computing

Abstract

Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to deploy on hardware because of their intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with arbitrary encoder-decoder attention and heterogeneous layers. Then we train a SuperTransformer that covers all candidates in the design space and efficiently produces many SubTransformers with weight sharing. Finally, we perform an evolutionary search with a hardware latency constraint to find a specialized SubTransformer dedicated to running fast on the target hardware. Extensive experiments on four machine translation tasks demonstrate that HAT can discover efficient models for different hardware (CPU, GPU, IoT device). When running the WMT'14 translation task on a Raspberry Pi-4, HAT achieves a 3× speedup and 3.7× smaller size over the baseline Transformer, and a 2.7× speedup and 3.6× smaller size over the Evolved Transformer, with 12,041× less search cost and no performance loss. HAT code is available at https://github.com/mit-han-lab/hardware-aware-transformers.git
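The final step described above, an evolutionary search under a hardware latency constraint, can be illustrated with a minimal sketch. This is not the HAT implementation: the design space, the `latency_ms` model, and the `proxy_score` function are hypothetical stand-ins chosen only to make the example runnable. In the real system, latency would come from the target hardware and quality from evaluating a SubTransformer that inherits SuperTransformer weights.

```python
import itertools
import math
import random

# Hypothetical toy design space (NOT the search space used in the paper).
DESIGN_SPACE = {
    "embed_dim": [512, 640],
    "decoder_layers": [1, 2, 3, 4, 5, 6],
    "ffn_dim": [1024, 2048, 3072],
    "attended_encoder_layers": [1, 2, 3],
}


def latency_ms(cfg):
    """Toy latency model (assumption); stands in for the target hardware."""
    per_layer = (cfg["embed_dim"] + cfg["ffn_dim"]) / 100.0
    return cfg["decoder_layers"] * per_layer + 5.0 * cfg["attended_encoder_layers"]


def proxy_score(cfg):
    """Toy quality proxy (assumption); larger configurations score higher."""
    return math.log(
        cfg["embed_dim"]
        * cfg["ffn_dim"]
        * cfg["decoder_layers"]
        * cfg["attended_encoder_layers"]
    )


def sample(rng):
    """Draw one random configuration from the design space."""
    return {key: rng.choice(choices) for key, choices in DESIGN_SPACE.items()}


def mutate(cfg, rng, prob=0.3):
    """Resample each gene independently with probability `prob`."""
    child = dict(cfg)
    for key, choices in DESIGN_SPACE.items():
        if rng.random() < prob:
            child[key] = rng.choice(choices)
    return child


def crossover(a, b, rng):
    """Pick each gene from one of the two parents at random."""
    return {key: rng.choice([a[key], b[key]]) for key in DESIGN_SPACE}


def evolutionary_search(latency_limit_ms, population=20, parents=5,
                        iterations=10, seed=0):
    """Return the best-scoring configuration that meets the latency limit."""
    rng = random.Random(seed)

    def feasible(cfg):
        return latency_ms(cfg) <= latency_limit_ms

    # Guard against an unsatisfiable constraint (the toy space is small
    # enough to enumerate, which a realistic space would not be).
    keys = list(DESIGN_SPACE)
    all_cfgs = (dict(zip(keys, values))
                for values in itertools.product(*DESIGN_SPACE.values()))
    if not any(feasible(cfg) for cfg in all_cfgs):
        raise ValueError("no configuration satisfies the latency limit")

    # Initial population: random configurations that satisfy the constraint.
    pop = []
    while len(pop) < population:
        cfg = sample(rng)
        if feasible(cfg):
            pop.append(cfg)

    for _ in range(iterations):
        pop.sort(key=proxy_score, reverse=True)
        top = pop[:parents]
        children = []
        while len(children) < population - parents:
            if rng.random() < 0.5:
                child = mutate(rng.choice(top), rng)
            else:
                child = crossover(rng.choice(top), rng.choice(top), rng)
            if feasible(child):  # discard candidates that are too slow
                children.append(child)
        pop = top + children

    return max(pop, key=proxy_score)
```

Calling `evolutionary_search(60.0)` returns a configuration dictionary whose toy latency is at most 60 ms. Because the constraint is enforced when candidates are generated rather than folded into the score, every member of the population is deployable at every iteration.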

Cite

@article{arxiv.2005.14187,
  title  = {HAT: Hardware-Aware Transformers for Efficient Natural Language Processing},
  author = {Hanrui Wang and Zhanghao Wu and Zhijian Liu and Han Cai and Ligeng Zhu and Chuang Gan and Song Han},
  journal= {arXiv preprint arXiv:2005.14187},
  year   = {2020}
}

Comments

Accepted to ACL 2020. 14 pages, 12 figures. Code available at http://github.com/mit-han-lab/hardware-aware-transformers.git
