Semantic Segmentation Overview (Part 2)

Rethinking Atrous Convolution for Semantic Image Segmentation (DeepLabv3)

Proposed Model

DeepLabv3 uses atrous (dilated) convolution for feature extraction, with the atrous modules arranged either in cascade or in parallel. One advantage of atrous convolution is that it preserves the spatial resolution of each feature map, which lets the network extract denser feature responses.
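A minimal 1-D sketch of the idea, in plain Python, may help here. It is an illustration only, not the paper's implementation: the helper names atrous_conv1d and effective_kernel_size are hypothetical, and the sketch assumes stride 1, zero padding, and an odd kernel length. It shows that the output keeps the input length while the kernel covers a wider span as the rate grows.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D atrous (dilated) convolution with zero padding and stride 1.

    Kernel taps are spaced `rate` samples apart; rate=1 is an ordinary
    convolution (cross-correlation form, as in most deep learning libraries).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel length")
    half = (k // 2) * rate  # zero padding per side that keeps the length
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += w * padded[i + j * rate]  # tap j sits j * rate samples along
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Span covered by a kernel once (rate - 1) holes sit between its taps."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = [1, 2, 3, 4, 5, 6, 7, 8]
dense = atrous_conv1d(signal, [1, 1, 1], rate=1)   # each output sums 3 neighbours
atrous = atrous_conv1d(signal, [1, 1, 1], rate=2)  # taps at offsets -2, 0, +2
```

Both outputs have the same length as the input, which is the "keeps the dimension" property: a 3-tap kernel at rate 2 spans 5 samples without any downsampling.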

Note that output_stride is the ratio of input image spatial resolution to final output resolution.
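To make the definition concrete, the sketch below computes the feature map size for a given output_stride and shows one common way to reach a target output_stride: once the accumulated stride hits the target, later strides are removed and the atrous rate is multiplied instead. The function names and the stage stride list are assumptions for illustration (the list only roughly mirrors a ResNet-style backbone), and the size formula assumes "same"-style padding.

```python
def feature_map_size(input_size, output_stride):
    """Spatial size of the final feature map, assuming 'same'-style padding
    so that every stride-2 layer maps n to ceil(n / 2)."""
    return -(-input_size // output_stride)  # integer ceiling division


def plan_strides(stage_strides, output_stride):
    """Return a (stride, rate) pair per stage for a target output_stride.

    Until the accumulated stride reaches the target, stages keep their
    stride. After that, each stage's stride is replaced by 1 and the atrous
    rate for the convolutions that follow is multiplied by the removed stride.
    """
    current_stride = 1
    rate = 1
    plan = []
    for s in stage_strides:
        if current_stride >= output_stride:
            rate *= s               # trade the stride for dilation
            plan.append((1, rate))
        else:
            current_stride *= s     # still allowed to downsample
            plan.append((s, rate))
    return plan


# Hypothetical backbone with five stride-2 stages (total stride 32).
stages = [2, 2, 2, 2, 2]
plan_16 = plan_strides(stages, output_stride=16)
plan_8 = plan_strides(stages, output_stride=8)
size_16 = feature_map_size(513, 16)
size_8 = feature_map_size(513, 8)
```

With output_stride 16 only the last stage is dilated (rate 2); with output_stride 8 the last two stages use rates 2 and 4, and the feature map is roughly twice as large in each dimension.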

Going Deeper with Atrous Convolution

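This is the cascaded version of atrous convolution, built on the ResNet architecture. The motivation is that striding makes it easy to capture long-range information in the deeper blocks. Because consecutive striding also discards detail, the deeper blocks use atrous convolution instead, with rates chosen to match the desired output_stride.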

The authors also introduce a Multi-grid method, inspired by multi-grid methods that employ a hierarchy of grids of different sizes: the convolution layers within a block use different atrous rates.
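The Multi-grid rates can be sketched as a small calculation. This assumes the rule described in the DeepLabv3 paper, where the final rate of each convolution in a block is the block's unit rate multiplied by the corresponding Multi-grid value; the function name multi_grid_rates is hypothetical.

```python
def multi_grid_rates(unit_rate, multi_grid=(1, 2, 4)):
    """Final atrous rate per convolution layer in one block.

    Assumed rule: final rate = block unit rate * Multi-grid value.
    """
    return tuple(unit_rate * g for g in multi_grid)


# A block with unit rate 2 and Multi_Grid = (1, 2, 4)
block_rates = multi_grid_rates(2, (1, 2, 4))
# Multi_Grid = (1, 1, 1) reduces to a single rate for the whole block
uniform_rates = multi_grid_rates(2, (1, 1, 1))
```

Here block_rates is (2, 4, 8), so the three convolutions in the block see progressively larger fields of view, while (1, 1, 1) recovers the plain cascaded setting.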

Atrous Spatial Pyramid Pooling

Atrous Spatial Pyramid Pooling (ASPP) is applied on top of the feature map, using four parallel atrous convolutions with different atrous rates.

Final ASPP: one 1x1 convolution and three 3x3 convolutions with rates (6, 12, 18) when output_stride is 16. The outputs of all branches are concatenated and passed through another 1x1 convolution, followed by the final 1x1 convolution that generates the logits.
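The branch layout can be written down as a configuration sketch. This is bookkeeping only, not a trainable module, and it models just the four convolution branches listed above. Several details are assumptions not stated in these notes: 256 output channels per branch, padding equal to the rate so that every branch keeps the spatial size, and doubled rates when output_stride is 8. All function names are hypothetical.

```python
def aspp_branches(output_stride=16, channels=256):
    """Describe the four parallel ASPP convolution branches.

    Assumptions: `channels` filters per branch, and rates (6, 12, 18)
    at output_stride 16 that are doubled at output_stride 8.
    """
    if output_stride not in (8, 16):
        raise ValueError("this sketch only covers output_stride 8 or 16")
    scale = 16 // output_stride  # 1 at stride 16, 2 at stride 8
    branches = [{"kernel": 1, "rate": 1, "padding": 0, "out_channels": channels}]
    for base_rate in (6, 12, 18):
        r = base_rate * scale
        # padding == rate keeps the spatial size for a 3x3 kernel
        branches.append({"kernel": 3, "rate": r, "padding": r, "out_channels": channels})
    return branches


def conv_output_size(size, kernel, rate, padding, stride=1):
    """Standard output-size formula for a dilated convolution."""
    effective = kernel + (kernel - 1) * (rate - 1)
    return (size + 2 * padding - effective) // stride + 1


def concat_channels(branches):
    """Channel count after concatenating all branch outputs."""
    return sum(b["out_channels"] for b in branches)


branches = aspp_branches(output_stride=16)
rates = [b["rate"] for b in branches]
sizes = [conv_output_size(33, b["kernel"], b["rate"], b["padding"]) for b in branches]
fused_in = concat_channels(branches)  # input channels of the fusing 1x1 convolution
```

Because every branch returns the same spatial size, the outputs can be concatenated along the channel axis; under these assumptions the 1x1 convolution that follows sees 4 x 256 input channels.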