Related papers: Container: Context Aggregation Network

MLP-Mixer: An all-MLP Architecture for Vision

Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both…

Computer Vision and Pattern Recognition · Computer Science 2021-06-14 Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

ConTNet: Why not use convolution and transformer at the same time?

Although convolutional networks (ConvNets) have enjoyed great success in computer vision (CV), it suffers from capturing global information crucial to dense prediction tasks such as object detection and segmentation. In this work, we…

Computer Vision and Pattern Recognition · Computer Science 2021-05-12 Haotian Yan, Zhe Li, Weijian Li, Changhu Wang, Ming Wu, Chuang Zhang

Contextual Attention Network: Transformer Meets U-Net

Currently, convolutional neural networks (CNN) (e.g., U-Net) have become the de facto standard and attained immense success in medical image segmentation. However, as a downside, CNN based methods are a double-edged sword as they fail to…

Image and Video Processing · Electrical Eng. & Systems 2022-04-01 Reza Azad, Moein Heidari, Yuli Wu, Dorit Merhof

Vision Conformer: Incorporating Convolutions into Vision Transformer Layers

Transformers are popular neural network models that use layers of self-attention and fully-connected nodes with embedded tokens. Vision Transformers (ViT) adapt transformers for image recognition tasks. In order to do this, the images are…

Computer Vision and Pattern Recognition · Computer Science 2023-04-28 Brian Kenji Iwana, Akihiro Kusuda

Incorporating Convolution Designs into Visual Transformers

Motivated by the success of Transformers in natural language processing (NLP) tasks, there emerge some attempts (e.g., ViT and DeiT) to apply Transformers to the vision domain. However, pure Transformer architectures often require a large…

Computer Vision and Pattern Recognition · Computer Science 2021-04-21 Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, Fengwei Yu, Wei Wu

A Comprehensive Study of Vision Transformers on Dense Prediction Tasks

Convolutional Neural Networks (CNNs), architectures consisting of convolutional layers, have been the standard choice in vision tasks. Recent studies have shown that Vision Transformers (VTs), architectures based on self-attention modules,…

Computer Vision and Pattern Recognition · Computer Science 2022-01-24 Kishaan Jeeveswaran, Senthilkumar Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, Elahe Arani

Attentive Convolution: Equipping CNNs with RNN-style Attention Mechanisms

In NLP, convolutional neural networks (CNNs) have benefited less than recurrent neural networks (RNNs) from attention mechanisms. We hypothesize that this is because the attention in CNNs has been mainly implemented as attentive pooling…

Computation and Language · Computer Science 2018-11-14 Wenpeng Yin, Hinrich Schütze

Fusing Deep Convolutional Networks for Large Scale Visual Concept Classification

Deep learning architectures are showing great promise in various computer vision domains including image classification, object detection, event detection and action recognition. In this study, we investigate various aspects of…

Computer Vision and Pattern Recognition · Computer Science 2016-08-08 Hilal Ergun, Mustafa Sert

ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context

Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Wei Han, Zhengdong Zhang, Yu Zhang, Jiahui Yu, Chung-Cheng Chiu, James Qin, Anmol Gulati, Ruoming Pang, Yonghui Wu

A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships

Transformer-based models have transformed the landscape of natural language processing (NLP) and are increasingly applied to computer vision tasks with remarkable success. These models, renowned for their ability to capture long-range…

Computer Vision and Pattern Recognition · Computer Science 2024-08-28 Gracile Astlin Pereira, Muhammad Hussain

Conformer: Local Features Coupling Global Representations for Visual Recognition

Within Convolutional Neural Network (CNN), the convolution operations are good at extracting local features but experience difficulty to capture global representations. Within visual transformer, the cascaded self-attention modules can…

Computer Vision and Pattern Recognition · Computer Science 2021-05-11 Zhiliang Peng, Wei Huang, Shanzhi Gu, Lingxi Xie, Yaowei Wang, Jianbin Jiao, Qixiang Ye

MAC-ReconNet: A Multiple Acquisition Context based Convolutional Neural Network for MR Image Reconstruction using Dynamic Weight Prediction

Convolutional Neural network-based MR reconstruction methods have shown to provide fast and high quality reconstructions. A primary drawback with a CNN-based model is that it lacks flexibility and can effectively operate only for a specific…

Image and Video Processing · Electrical Eng. & Systems 2022-03-11 Sriprabha Ramanarayanan, Balamurali Murugesan, Keerthi Ram, Mohanasankar Sivaprakasam

MHITNet: a minimize network with a hierarchical context-attentional filter for segmenting medical ct images

In the field of medical CT image processing, convolutional neural networks (CNNs) have been the dominant technique. Encoder-decoder CNNs utilise locality for efficiency, but they cannot simulate distant pixel interactions properly. Recent…

Image and Video Processing · Electrical Eng. & Systems 2022-11-03 Hongyang He, Feng Ziliang, Yuanhang Zheng, Shudong Huang, HaoBing Gao

ConvTransSeg: A Multi-resolution Convolution-Transformer Network for Medical Image Segmentation

Convolutional neural networks (CNNs) achieved the state-of-the-art performance in medical image segmentation due to their ability to extract highly complex feature representations. However, it is argued in recent studies that traditional…

Computer Vision and Pattern Recognition · Computer Science 2025-03-31 Zhendi Gong, Andrew P. French, Guoping Qiu, Xin Chen

CondenseNeXt: An Ultra-Efficient Deep Neural Network for Embedded Systems

Due to the advent of modern embedded systems and mobile devices with constrained resources, there is a great demand for incredibly efficient deep neural networks for machine learning purposes. There is also a growing concern of privacy and…

Computer Vision and Pattern Recognition · Computer Science 2021-12-02 Priyank Kalgaonkar, Mohamed El-Sharkawy

CNN-transformer mixed model for object detection

Object detection, one of the three main tasks of computer vision, has been used in various applications. The main process is to use deep neural networks to extract the features of an image and then use the features to identify the class and…

Computer Vision and Pattern Recognition · Computer Science 2022-12-14 Wenshuo Li

ConViViT -- A Deep Neural Network Combining Convolutions and Factorized Self-Attention for Human Activity Recognition

The Transformer architecture has gained significant popularity in computer vision tasks due to its capacity to generalize and capture long-range dependencies. This characteristic makes it well-suited for generating spatiotemporal tokens…

Computer Vision and Pattern Recognition · Computer Science 2023-10-24 Rachid Reda Dokkar, Faten Chaieb, Hassen Drira, Arezki Aberkane

A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP

Convolutional neural networks (CNN) are the dominant deep neural network (DNN) architecture for computer vision. Recently, Transformer and multi-layer perceptron (MLP)-based models, such as Vision Transformer and MLP-Mixer, started to lead…

Computer Vision and Pattern Recognition · Computer Science 2021-11-29 Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha

Latent Model Ensemble with Auto-localization

Deep Convolutional Neural Networks (CNN) have exhibited superior performance in many visual recognition tasks including image classification, object detection, and scene labeling, due to their large learning capacity and resistance to…

Computer Vision and Pattern Recognition · Computer Science 2016-10-12 Miao Sun, Tony X. Han, Xun Xu, Ming-Chang Liu, Ahmad Khodayari-Rostamabad

Multilevel Context Representation for Improving Object Recognition

In this work, we propose the combined usage of low- and high-level blocks of convolutional neural networks (CNNs) for improving object recognition. While recent research focused on either propagating the context from all layers, e.g.…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Andreas Kölsch, Muhammad Zeshan Afzal, Marcus Liwicki