Computation and Language · Computer Science
HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation
Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang +1
2023-05-12
Computation and Language · Computer Science
Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva
2024-06-21
Computation and Language · Computer Science
Transkimmer: Transformer Learns to Layer-wise Skim
Yue Guan, Zhengyi Li, Jingwen Leng, Zhouhan Lin +1
2022-05-17
Machine Learning · Computer Science
Bag of Tricks for Optimizing Transformer Efficiency
Ye Lin, Yanyang Li, Tong Xiao, Jingbo Zhu
2021-09-10
Computation and Language · Computer Science
What's Hidden in a One-layer Randomly Weighted Transformer?
Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer +1
2021-09-10
Computation and Language · Computer Science
Token Dropping for Efficient BERT Pretraining
Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu +3
2022-03-25
Computation and Language · Computer Science
Transformer Layers as Painters
Qi Sun, Marc Pickett, Aakash Kumar Nain, Llion Jones
2025-02-14
Machine Learning · Computer Science
You Do Not Fully Utilize Transformer's Representation Capacity
Gleb Gerasimov, Yaroslav Aksenov, Nikita Balagansky, Viacheslav Sinii +1
2025-05-29
Machine Learning · Computer Science
SparseSwaps: Tractable LLM Pruning Mask Refinement at Scale
Max Zimmer, Christophe Roux, Moritz Wagner, Deborah Hendrych +1
2026-02-03
Computation and Language · Computer Science
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji +2
2025-03-03
Machine Learning · Computer Science
Brainformers: Trading Simplicity for Efficiency
Yanqi Zhou, Nan Du, Yanping Huang, Daiyi Peng +11
2024-04-26
Computation and Language · Computer Science
Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing
David Peer, Sebastian Stabinger, Stefan Engl, Antonio Rodriguez-Sanchez
2022-03-30
Artificial Intelligence · Computer Science
A transformer architecture alteration to incentivise externalised reasoning
Elizabeth Pavlova, Mariia Koroliuk, Karthik Viswanathan, Cameron Tice +2
2026-03-25
Computation and Language · Computer Science
Learned Token Pruning for Transformers
Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami +3
2022-06-06
Computation and Language · Computer Science
Trainable Transformer in Transformer
Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora
2024-02-09
Computation and Language · Computer Science
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee
2023-01-31
Computation and Language · Computer Science
On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
2022-08-16