Understanding Convolution for Semantic Segmentation

Computer Vision and Pattern Recognition 2018-06-04 v3

Abstract

Recent advances in deep learning, especially deep convolutional neural networks (CNNs), have led to significant improvements over previous semantic segmentation systems. Here we show how to improve pixel-wise semantic segmentation by manipulating convolution-related operations that are of both theoretical and practical value. First, we design dense upsampling convolution (DUC) to generate pixel-level predictions, which is able to capture and decode more detailed information that is generally missing in bilinear upsampling. Second, we propose a hybrid dilated convolution (HDC) framework in the encoding phase. This framework 1) effectively enlarges the receptive fields (RF) of the network to aggregate global information, and 2) alleviates what we call the "gridding issue" caused by the standard dilated convolution operation. We evaluate our approaches thoroughly on the Cityscapes dataset and achieve a state-of-the-art result of 80.1% mIOU on the test set at the time of submission. We also achieve state-of-the-art results overall on the KITTI road estimation benchmark and the PASCAL VOC2012 segmentation task. Our source code can be found at https://github.com/TuSimple/TuSimple-DUC.
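
The "gridding issue" mentioned above can be illustrated with a small one-dimensional sketch. It is not taken from the paper's code: the helper names `covered_offsets` and `has_holes`, the kernel size of 3, and the dilation rates `[2, 2, 2]` and `[1, 2, 3]` are all assumptions chosen for illustration. The sketch tracks which input positions can influence a single output position after several dilated convolutions are stacked.

```python
def covered_offsets(dilation_rates, kernel_size=3):
    """Return the set of 1-D input offsets that can reach one output position
    after stacking dilated convolutions with the given rates (illustrative helper)."""
    half = kernel_size // 2
    offsets = {0}
    for rate in dilation_rates:
        # Tap positions of one dilated kernel, relative to its centre.
        taps = [k * rate for k in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def has_holes(offsets):
    """True if some position inside the covered span is never touched."""
    lo, hi = min(offsets), max(offsets)
    return any(p not in offsets for p in range(lo, hi + 1))


# Assumed example rates: a repeated rate versus a mixed schedule.
same_rate = covered_offsets([2, 2, 2])
mixed_rate = covered_offsets([1, 2, 3])

print(sorted(same_rate), has_holes(same_rate))
print(sorted(mixed_rate), has_holes(mixed_rate))
```

With the repeated rate, only even offsets are ever touched, so the receptive field has a checkerboard-like gap pattern. The mixed schedule spans the same range of 13 positions without gaps, which is the kind of effect a hybrid choice of dilation rates aims for.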

Cite

@article{arxiv.1702.08502,
  title  = {Understanding Convolution for Semantic Segmentation},
  author = {Panqu Wang and Pengfei Chen and Ye Yuan and Ding Liu and Zehua Huang and Xiaodi Hou and Garrison Cottrell},
  journal= {arXiv preprint arXiv:1702.08502},
  year   = {2018}
}

Comments

WACV 2018. Updated acknowledgements. Source code: https://github.com/TuSimple/TuSimple-DUC