Related papers: Learned Token Pruning for Transformers

200 papers

This paper presents a novel differentiable method for unstructured weight pruning of deep neural networks. Our learned-threshold pruning (LTP) method learns per-layer thresholds via gradient descent, unlike conventional methods where they…

Machine Learning · Computer Science · 2021-03-22 · Kambiz Azarian, Yash Bhalgat, Jinwon Lee, Tijmen Blankevoort

Vision transformers have demonstrated remarkable success in a wide range of computer vision tasks over the last years. However, their high computational costs remain a significant barrier to their practical deployment. In particular, the…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-21 · Maxim Bonnaerens, Joni Dambre

Recently, large language models (LLMs) have demonstrated superior performance across various tasks by adhering to scaling laws, which significantly increase model size. However, the huge computation overhead during inference hinders the…

Computation and Language · Computer Science · 2024-12-17 · Zekai Li, Jintu Zheng, Ji Liu, Han Liu, Haowei Zhu, Zeping Li, Fuwei Yang, Haiduo Huang, Jinzhang Peng, Dong Li, Lu Tian, Emad Barsoum

Deploying pre-trained transformer models like BERT on downstream tasks in resource-constrained scenarios is challenging due to their high inference cost, which grows rapidly with input sequence length. In this work, we propose a…

Computation and Language · Computer Science · 2023-06-27 · Junyan Li, Li Lyna Zhang, Jiahang Xu, Yujing Wang, Shaoguang Yan, Yunqing Xia, Yuqing Yang, Ting Cao, Hao Sun, Weiwei Deng, Qi Zhang, Mao Yang

Large Vision-Language Models (LVLMs) have recently demonstrated strong multimodal understanding, yet their fine-grained visual perception is often constrained by low input resolutions. A common remedy is to partition high-resolution images…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-17 · Yuxuan Liang, Xu Li, Xiaolei Chen, Yi Zheng, Haotian Chen, Bin Li, Xiangyang Xue

Despite the recent success of large language models (LLMs), LLMs are particularly challenging in long-sequence inference scenarios due to the quadratic computational complexity of the attention mechanism. Inspired by the interpretability…

Computation and Language · Computer Science · 2025-04-10 · Yao Tao, Yehui Tang, Yun Wang, Mingjian Zhu, Hailin Hu, Yunhe Wang

Large Vision-Language Models (LVLMs) have shown impressive performance across multi-modal tasks by encoding images into thousands of tokens. However, the large number of image tokens results in significant computational overhead, and the…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-24 · Kaiyuan Li, Xiaoyue Chen, Chen Gao, Yong Li, Xinlei Chen

Large Vision Language Models (LVLMs) have achieved significant success across multi-modal tasks. However, the computational cost of processing long visual tokens can be prohibitively expensive on resource-limited devices. Previous methods…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Xubing Ye, Yukang Gan, Yixiao Ge, Xiao-Ping Zhang, Yansong Tang

In modern large language models (LLMs), increasing the context length is crucial for improving comprehension and coherence in long-context, multi-modal, and retrieval-augmented language generation. While many recent transformer models…

Computation and Language · Computer Science · 2025-01-24 · Heejun Lee, Geon Park, Youngwan Lee, Jaduk Suh, Jina Kim, Wonyoung Jeong, Bumsik Kim, Hyemin Lee, Myeongjae Jeon, Sung Ju Hwang

Large-scale transformer models have become the de-facto architectures for various machine learning applications, e.g., CV and NLP. However, those large models also introduce prohibitive training costs. To mitigate this issue, we propose a…

Computation and Language · Computer Science · 2022-11-22 · Zhewei Yao, Xiaoxia Wu, Conglong Li, Connor Holmes, Minjia Zhang, Cheng Li, Yuxiong He

Large Vision Language Models show impressive performance across image and video understanding tasks, yet their computational cost grows rapidly with the number of visual tokens. Existing token pruning methods mitigate this issue through…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-02 · Dong-Jae Lee, Sunghyun Baek, Junmo Kim

Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens…

Computation and Language · Computer Science · 2024-06-03 · Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

As software projects rapidly evolve, software artifacts become more complex and the defects behind them get harder to identify. The emerging Transformer-based approaches, though achieving remarkable performance, struggle with long code sequences due…

Software Engineering · Computer Science · 2024-09-13 · Xueqi Yang, Mariusz Jakubowski, Li Kang, Haojie Yu, Tim Menzies

Large Vision-Language Models (LVLMs) have advanced multimodal learning but face high computational costs due to the large number of visual tokens, motivating token pruning to improve inference efficiency. The key challenge lies in…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · Ao Li, Yuxiang Duan, Jinghui Zhang, Congbo Ma, Yutong Xie, Gustavo Carneiro, Mohammad Yaqub, Hu Wang

Private Transformer inference using cryptographic protocols offers promising solutions for privacy-preserving machine learning; however, it still faces significant runtime overhead (efficiency issues) and challenges in handling long-token…

Machine Learning · Computer Science · 2025-03-07 · Yancheng Zhang, Jiaqi Xue, Mengxin Zheng, Mimi Xie, Mingzhe Zhang, Lei Jiang, Qian Lou

Due to the prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received remarkable attention. Among numerous works that have been proposed in recent years, layer-wise token pruning approaches,…

Computation and Language · Computer Science · 2026-04-17 · Rei Taniguchi, Yuyang Dong, Makoto Onizuka, Chuan Xiao

Vision Transformers (ViTs) have emerged as the backbone of many segmentation models, consistently achieving state-of-the-art (SOTA) performance. However, their success comes at a significant computational cost. Image token pruning is one of…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-02 · Hanning Chen, Yang Ni, Wenjun Huang, Yezi Liu, SungHeon Jeong, Fei Wen, Nathaniel Bastian, Hugo Latapie, Mohsen Imani

We present LightVLA, a simple yet effective differentiable token pruning framework for vision-language-action (VLA) models. While VLA models have shown impressive capability in executing real-world robotic tasks, their deployment on…

Robotics · Computer Science · 2025-09-23 · Titong Jiang, Xuefeng Jiang, Yuan Ma, Xin Wen, Bailin Li, Kun Zhan, Peng Jia, Yahui Liu, Sheng Sun, Xianpeng Lang

Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models. However, current pruning algorithms either only focus on one pruning category, e.g., structured…

Computation and Language · Computer Science · 2022-05-24 · Zhewei Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He

Large language models (LLMs) demonstrate strong performance as text embedding models when finetuned with supervised contrastive training. However, their large size balloons inference time and memory requirements. In this paper, we show that…

Computation and Language · Computer Science · 2024-10-21 · Thennal D K, Tim Fischer, Chris Biemann