Related papers: RepNeXt: A Fast Multi-Scale CNN using Structural R…

200 papers

Recently, lightweight Vision Transformers (ViTs) demonstrate superior performance and lower latency, compared with lightweight Convolutional Neural Networks (CNNs), on resource-constrained mobile devices. Researchers have discovered many…

Computer Vision and Pattern Recognition · Computer Science 2024-03-15 Ao Wang, Hui Chen, Zijia Lin, Jungong Han, Guiguang Ding

The recent amalgamation of transformer and convolutional designs has led to steady improvements in accuracy and efficiency of the models. In this work, we introduce FastViT, a hybrid vision transformer architecture that obtains the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-21 Pavan Kumar Anasosalu Vasu, James Gabriel, Jeff Zhu, Oncel Tuzel, Anurag Ranjan

Vision-transformers (ViTs) and large-scale convolution-neural-networks (CNNs) have reshaped computer vision through pretrained feature representations that enable strong transfer learning for diverse tasks. However, their efficiency as…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Alon Kaya, Igal Bilik, Inna Stainvas

Light-weight convolutional neural networks (CNNs) are the de-facto for mobile vision tasks. Their spatial inductive biases allow them to learn representations with fewer parameters across different vision tasks. However, these networks are…

Computer Vision and Pattern Recognition · Computer Science 2022-03-07 Sachin Mehta, Mohammad Rastegari

Vision transformers (ViTs) have dominated computer vision in recent years. However, ViTs are computationally expensive and not well suited for mobile devices; this led to the prevalence of convolutional neural network (CNN) and ViT-based…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Mustafa Munir, Md Mostafijur Rahman, Radu Marculescu

Deploying vision models across devices with varying resource constraints, or even on a single device where available compute fluctuates due to battery state, thermal throttling, or latency deadlines, typically requires training and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Janek Haberer, Jon Eike Wilhelm, Olaf Landsiedel

The transformer model has gained widespread adoption in computer vision tasks in recent times. However, due to the quadratic time and memory complexity of self-attention, which is proportional to the number of input tokens, most existing…

Computer Vision and Pattern Recognition · Computer Science 2023-11-13 Wei Tan, Yifeng Geng, Xuansong Xie

The emergence of deep learning techniques has advanced the image segmentation task, especially for medical images. Many neural network models have been introduced in the last decade bringing the automated segmentation accuracy close to…

Image and Video Processing · Electrical Eng. & Systems 2025-03-11 Ngoc-Du Tran, Thi-Thao Tran, Quang-Huy Nguyen, Manh-Hung Vu, Van-Truong Pham

Although convolutional neural networks (CNNs) showed remarkable results in many vision tasks, they are still strained by simple yet challenging visual reasoning problems. Inspired by the recent success of the Transformer network in computer…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Nicola Messina, Giuseppe Amato, Fabio Carrara, Claudio Gennaro, Fabrizio Falchi

There are two de facto standard architectures in recent computer vision: Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). Strong inductive biases of convolutions help the model learn sample effectively, but such strong…

Computer Vision and Pattern Recognition · Computer Science 2022-10-05 Yunsung Lee, Gyuseong Lee, Kwangrok Ryoo, Hyojun Go, Jihye Park, Seungryong Kim

We reveal that feedforward network (FFN) layers, rather than attention layers, are the primary contributors to Vision Transformer (ViT) inference latency, with their impact signifying as model size increases. This finding highlights a…

Computer Vision and Pattern Recognition · Computer Science 2025-06-03 Xuwei Xu, Yang Li, Yudong Chen, Jiajun Liu, Sen Wang

In the past decade, convolutional neural networks (CNNs) have shown prominence for semantic segmentation. Although CNN models have very impressive performance, the ability to capture global representation is still insufficient, which…

Computer Vision and Pattern Recognition · Computer Science 2023-02-22 Guoan Xu, Juncheng Li, Guangwei Gao, Huimin Lu, Jian Yang, Dong Yue

With the success of Vision Transformers (ViTs) in computer vision tasks, recent arts try to optimize the performance and complexity of ViTs to enable efficient deployment on mobile devices. Multiple approaches are proposed to accelerate…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren

This study evaluates the trade-offs between convolutional and transformer-based architectures on both medical and general-purpose image classification benchmarks. We use ResNet-18 as our baseline and introduce a fine-tuning strategy applied…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Aidar Amangeldi, Angsar Taigonyrov, Muhammad Huzaifa Jawad, Chinedu Emmanuel Mbonu

Due to the complex attention mechanisms and model design, most existing vision Transformers (ViTs) can not perform as efficiently as convolutional neural networks (CNNs) in realistic industrial deployment scenarios, e.g. TensorRT and…

Computer Vision and Pattern Recognition · Computer Science 2022-08-17 Jiashi Li, Xin Xia, Wei Li, Huixia Li, Xing Wang, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan

The U-shaped architecture has emerged as a crucial paradigm in the design of medical image segmentation networks. However, due to the inherent local limitations of convolution, a fully convolutional segmentation network with U-shaped…

Image and Video Processing · Electrical Eng. & Systems 2023-08-04 Fenghe Tang, Jianrui Ding, Lingtao Wang, Chunping Ning, S. Kevin Zhou

Recent advances on Vision Transformer (ViT) and its improved variants have shown that self-attention-based networks surpass traditional Convolutional Neural Networks (CNNs) in most vision tasks. However, existing ViTs focus on the standard…

Computer Vision and Pattern Recognition · Computer Science 2022-05-24 Xiaofeng Mao, Gege Qi, Yuefeng Chen, Xiaodan Li, Ranjie Duan, Shaokai Ye, Yuan He, Hui Xue

Recent research has successfully adapted vision-based convolutional neural network (CNN) architectures for audio recognition tasks using Mel-Spectrograms. However, these CNNs have high computational costs and memory requirements, limiting…

Sound · Computer Science 2024-04-23 Kin Wai Lau, Yasar Abbas Ur Rehman, Lai-Man Po

Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) are two dominant models for image analysis. While CNNs excel at extracting multi-scale features and ViTs effectively capture global dependencies, both suffer from high…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Shicheng Yin, Kaixuan Yin, Weixing Chen, Enbo Huang, Yang Liu

Due to the advent of modern embedded systems and mobile devices with constrained resources, there is a great demand for incredibly efficient deep neural networks for machine learning purposes. There is also a growing concern of privacy and…

Computer Vision and Pattern Recognition · Computer Science 2021-12-02 Priyank Kalgaonkar, Mohamed El-Sharkawy