Related papers: Vision Transformer Pruning
This is a further development of Vision Transformer Pruning via matrix decomposition. The purpose of the Vision Transformer Pruning is to prune the dimension of the linear projection of the dataset by learning their associated importance…
It has been shown by many researchers that transformers perform as well as convolutional neural networks in many computer vision tasks. Meanwhile, the large computational costs of its attention module hinder further studies and applications…
Vision transformers (ViT) have recently attracted considerable attentions, but the huge computational cost remains an issue for practical deployment. Previous ViT pruning methods tend to prune the model along one dimension solely, which may…
Transformer architecture has gained popularity due to its ability to scale with large dataset. Consequently, there is a need to reduce the model size and latency, especially for on-device deployment. We focus on vision transformer proposed…
The remarkable performance of modern deep neural networks (DNNs) is largely driven by their massive scale, often comprising tens to hundreds of millions-or even billions-of parameters. However, such a scale incurs substantial storage and…
The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a…
Neural networks grow vastly in size to tackle more sophisticated tasks. In many cases, such large networks are not deployable on particular hardware and need to be reduced in size. Pruning techniques help to shrink deep neural networks to…
Convolutional neural networks (CNNs) have demonstrated extraordinarily good performance in many computer vision tasks. The increasing size of CNN models, however, prevents them from being widely deployed to devices with limited…
Recent advances in pruning of neural networks have made it possible to remove a large number of filters or weights without any perceptible drop in accuracy. The number of parameters and that of FLOPs are usually the reported metrics to…
Visual token compression is critical for Large Vision-Language Models (LVLMs) to efficiently process high-resolution inputs. Existing methods that typically adopt fixed compression ratios cannot adapt to scenes of varying complexity, often…
This paper studies the efficiency problem for visual transformers by excavating redundant calculation in given networks. The recent transformer architecture has demonstrated its effectiveness for achieving excellent performance on a series…
After their initial success in natural language processing, transformer architectures have rapidly gained traction in computer vision, providing state-of-the-art results for tasks such as image classification, detection, segmentation, and…
Vision transformer (ViT) has achieved competitive accuracy on a variety of computer vision applications, but its computational cost impedes the deployment on resource-limited mobile devices. We explore the sparsity in ViT and observe that…
Pruning is widely used to reduce the complexity of deep learning models, but its effects on interpretability and representation learning remain poorly understood. This paper investigates how pruning influences vision models across three key…
Vision Transformers (ViTs) have shown impressive performance in computer vision, but their high computational cost, quadratic in the number of tokens, limits their adoption in computation-constrained applications. However, this large number…
The rapid development of large-scale deep learning models questions the affordability of hardware platforms, which necessitates the pruning to reduce their computational and memory footprints. Sparse neural networks as the product, have…
Vision Transformer have set new benchmarks in several tasks, but these models come with the lack of high computational costs which makes them impractical for resource limited hardware. Network pruning reduces the computational complexity by…
Structured pruning greatly eases the deployment of large neural networks in resource-constrained environments. However, current methods either involve strong domain expertise, require extra hyperparameter tuning, or are restricted only to a…
Large vision transformers present impressive scalability, as their performance can be well improved with increased model capacity. Nevertheless, their cumbersome parameters results in exorbitant computational and memory demands. By…
Recently vision transformer models have become prominent models for a range of tasks. These models, however, usually suffer from intensive computational costs and heavy memory requirements, making them impractical for deployment on edge…