[toc]
- Motivation: combine the strengths of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies
- RON mainly focuses on two fundamental problems:
- (a) multi-scale object localization: reverse connection
- (b) negative sample mining: objectness prior (prior knowledge of whether an object is present)
- Experiments (VGG-16, 384×384 input size):
- VOC 2007: 81.3% mAP
- VOC 2012: 80.7% mAP
- 1.5 GB of GPU memory at test time
- speed: 15 FPS, 3× faster than the Faster R-CNN counterpart
Introduction
- RON: associate the merits of region-based and region-free approaches.
- Contributions:
- RON: a fully convolutional framework for end-to-end object detection
- effective training strategies: negative example mining and data augmentation
- time and resource efficient, supported by extensive experiments on design choices
Related Work
DPM, R-CNN, SPP-Net, Fast R-CNN, R-FCN, YOLO, SSD
Network Architecture
Network Preparation
VGG-16: pre-trained on ImageNet
- convert FC6 (14th layer) and FC7 (15th layer) to convolutional layers
- use 2×2 convolutional kernels with stride 2 to reduce the resolution of FC7 by half
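The halving can be checked with the standard convolution output-size formula. A minimal sketch; the 12×12 input below is a hypothetical size chosen for illustration, not one stated in these notes:

```python
def conv_out_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# Hypothetical 12x12 input map.
print(conv_out_size(12, kernel=2, stride=2))             # 6: resolution halved
print(conv_out_size(12, kernel=3, stride=1, padding=1))  # 12: a 3x3/stride-1/pad-1 conv keeps the size
```

A 2×2 kernel with stride 2 tiles the input without overlap, so each spatial dimension is divided by 2 (rounded down for odd sizes).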
Reverse Connection
- Inspired by: residual connections
- the reverse connection enables features of earlier (former) layers to carry more semantic information
- compared with SSD, RON is more effective at detecting objects of all scales: because the reverse connection is learnable, the semantic information of earlier layers can be significantly enriched
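To make the idea concrete, here is a toy single-channel sketch of one reverse-connection step. It assumes (this is not stated in these notes) that the deeper, half-resolution fusion map is upsampled with a 2×2 stride-2 deconvolution, the current backbone map passes through a 3×3 convolution, and the two are summed element-wise. The kernels are hypothetical fixed values; in the network they would be learned, which is what makes the connection "learnable".

```python
from typing import List

Grid = List[List[float]]

def deconv2x2_stride2(x: Grid, k: Grid) -> Grid:
    """Transposed conv, 2x2 kernel, stride 2: each input pixel is spread over a
    2x2 output block scaled by the kernel, doubling height and width."""
    h, w = len(x), len(x[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += x[i][j] * k[di][dj]
    return out

def conv3x3_same(x: Grid, k: Grid) -> Grid:
    """3x3 cross-correlation (as in CNN libraries), stride 1, zero padding 1."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-1, 2):
                for dj in range(-1, 2):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        acc += x[ii][jj] * k[di + 1][dj + 1]
            out[i][j] = acc
    return out

def reverse_connection(backbone: Grid, deeper: Grid, k_conv: Grid, k_deconv: Grid) -> Grid:
    """Fuse the current backbone map with the 2x-smaller deeper map by element-wise sum."""
    lateral = conv3x3_same(backbone, k_conv)
    top_down = deconv2x2_stride2(deeper, k_deconv)
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, top_down)]

identity3 = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]  # hypothetical kernel: passes input through
ones2 = [[1, 1], [1, 1]]                       # hypothetical kernel: nearest-neighbour upsampling
fused = reverse_connection([[1] * 4 for _ in range(4)], [[1, 2], [3, 4]], identity3, ones2)
print(fused)  # [[2.0, 2.0, 3.0, 3.0], [2.0, 2.0, 3.0, 3.0], [4.0, 4.0, 5.0, 5.0], [4.0, 4.0, 5.0, 5.0]]
```

The deeper map's values (more semantic) are propagated into every cell of the finer map, which is the mechanism by which earlier layers gain semantic information.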
Reference Boxes: generate bounding boxes on feature maps
$S_k = \{(2k - 1) \cdot s_{min},\ 2k \cdot s_{min}\},\quad k \in \{1, 2, 3, 4\}$
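The scale formula can be turned into concrete reference boxes per location. A minimal sketch: the value `s_min = 24` and the five aspect ratios `(1/3, 1/2, 1, 2, 3)` are assumptions for illustration (the notes only say "5 aspect ratios"), and each box keeps area $s^2$ with width/height ratio $a$.

```python
import math
from typing import List, Tuple

# Assumed aspect ratios (w/h); the notes only state that there are 5.
ASPECT_RATIOS = (1 / 3, 1 / 2, 1.0, 2.0, 3.0)

def scales_for_map(k: int, s_min: float) -> Tuple[float, float]:
    """S_k = {(2k - 1) * s_min, 2k * s_min} for the k-th detection map."""
    if k not in (1, 2, 3, 4):
        raise ValueError("k must be in {1, 2, 3, 4}")
    return ((2 * k - 1) * s_min, 2 * k * s_min)

def reference_boxes(k: int, s_min: float, cx: float, cy: float) -> List[Tuple[float, float, float, float]]:
    """(cx, cy, w, h) for every scale/aspect-ratio pair at one location: w*h = s^2, w/h = a."""
    boxes = []
    for s in scales_for_map(k, s_min):
        for a in ASPECT_RATIOS:
            boxes.append((cx, cy, s * math.sqrt(a), s / math.sqrt(a)))
    return boxes

print(scales_for_map(1, 24))                   # (24, 48)
print(scales_for_map(4, 24))                   # (168, 192)
print(len(reference_boxes(2, 24, 0.5, 0.5)))   # 10 boxes per location
```

Each map covers two consecutive multiples of $s_{min}$, so the four maps together span $s_{min}$ to $8 s_{min}$ without overlap.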
Objectness Prior
add a 3×3×2 convolutional layer followed by a Softmax function to indicate the existence of an object in each box.
there are 10 default boxes at each location (2 scales and 5 aspect ratios)
since the objectness prior maps explicitly reflect the existence of an object, they can dramatically reduce the search space
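The 2-channel softmax and the resulting pruning of the search space can be sketched as follows. This assumes the two channels are ordered (background, object); the threshold value 0.3 is hypothetical.

```python
import math
from typing import List, Tuple

def objectness_prob(logits: Tuple[float, float]) -> float:
    """Softmax over the 2 outputs (background, object) of one box; returns P(object)."""
    bg, obj = logits
    m = max(bg, obj)  # subtract the max for numerical stability
    e_bg, e_obj = math.exp(bg - m), math.exp(obj - m)
    return e_obj / (e_bg + e_obj)

def keep_candidates(all_logits: List[Tuple[float, float]], threshold: float) -> List[int]:
    """Indices of boxes whose objectness probability exceeds the threshold."""
    return [i for i, lg in enumerate(all_logits) if objectness_prob(lg) > threshold]

logits = [(2.0, -2.0), (0.0, 0.0), (-1.0, 3.0), (4.0, 0.0)]
print(keep_candidates(logits, threshold=0.3))  # [1, 2]
```

Only the surviving boxes need to be classified, which is how the objectness prior narrows the search.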
Detection and Bounding Box Regression
- an inception module is applied on the feature maps
- object detection (multi-class classification) module
- bounding box regression module
Combining Objectness Prior with Detection
For training:
- first assign a binary (object / non-object) label to each reference box
- if a box covers an object, also assign it a class-specific label; for each ground-truth box:
- (i) match it with the candidate region that has the highest jaccard overlap
- (ii) match candidate regions to any ground truth with jaccard overlap higher than 0.5
- assign negative labels to boxes whose jaccard overlap is lower than 0.3
- for detection, only samples whose objectness scores are higher than a threshold $o_p$ are selected
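The matching rules above can be sketched as a label-assignment function. Assumptions for illustration: boxes are `(x1, y1, x2, y2)`, class ids start at 1, 0 means negative, and -1 marks boxes with overlap between 0.3 and 0.5 that are ignored; the binary objectness label is then simply `label > 0`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def jaccard(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def assign_labels(boxes: Sequence[Box], gts: Sequence[Box], gt_classes: Sequence[int],
                  pos_thr: float = 0.5, neg_thr: float = 0.3) -> List[int]:
    """Per box: class id (>= 1) if positive, 0 if negative, -1 if ignored."""
    if not boxes or not gts:
        return [0] * len(boxes)  # no ground truth: every box is background
    labels = [-1] * len(boxes)
    ious = [[jaccard(b, g) for g in gts] for b in boxes]
    for i, row in enumerate(ious):
        best_g = max(range(len(gts)), key=lambda g: row[g])
        if row[best_g] > pos_thr:       # rule (ii): overlap above 0.5
            labels[i] = gt_classes[best_g]
        elif row[best_g] < neg_thr:     # negatives: overlap below 0.3
            labels[i] = 0
    for g in range(len(gts)):           # rule (i): best-overlapping box for each ground truth
        best_i = max(range(len(boxes)), key=lambda i: ious[i][g])
        labels[best_i] = gt_classes[g]
    return labels

boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (5, 0, 15, 10), (20, 20, 30, 30)]
print(assign_labels(boxes, [(0, 0, 10, 10)], [3]))  # [3, 3, -1, 0]
```

Rule (i) guarantees every ground truth gets at least one positive, even when no box exceeds 0.5 overlap.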
Training and Testing
Loss Function
For each location, the network has three sibling output branches, each with its own loss:
- objectness confidence score
- bounding box regression
- classification
a multi-task loss L is used to jointly train the network end-to-end
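A minimal sketch of such a multi-task loss. Assumptions not given in these notes: softmax cross-entropy for the objectness and classification branches, smooth L1 for box regression, each term averaged over its own samples, and hypothetical weights `alpha` and `beta`, with `1 - alpha - beta` on the classification term.

```python
import math
from typing import List, Sequence, Tuple

def softmax_ce(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample (log-sum-exp computed stably)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1: quadratic for |d| < 1, linear beyond."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def multitask_loss(obj: List[Tuple[Sequence[float], int]],
                   loc: List[Tuple[Sequence[float], Sequence[float]]],
                   cls: List[Tuple[Sequence[float], int]],
                   alpha: float = 1 / 3, beta: float = 1 / 3) -> float:
    """Weighted sum of the three branch losses, each normalized by its sample count."""
    l_obj = sum(softmax_ce(lg, t) for lg, t in obj) / max(len(obj), 1)
    l_loc = sum(smooth_l1(p, t) for p, t in loc) / max(len(loc), 1)
    l_cls = sum(softmax_ce(lg, t) for lg, t in cls) / max(len(cls), 1)
    return alpha * l_obj + beta * l_loc + (1 - alpha - beta) * l_cls

loss = multitask_loss([([0.0, 0.0], 1)], [([0.5], [0.0])], [([0.0, 0.0, 0.0], 2)])
print(round(loss, 4))  # 0.6377, i.e. (log 2 + 0.125 + log 3) / 3
```

Keeping the weights summing to 1 makes it easy to trade off the branches without changing the overall loss magnitude.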
Joint Training and Testing
dynamic training strategy:
- all positive samples are used, while negative samples are randomly selected
- to reduce the number of samples, the ratio of positive to negative samples is kept at 1:3
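The sampling rule can be sketched as below. The label convention is an assumption carried over for illustration: positives have `label > 0`, negatives `0`, and ignored boxes `-1` are never sampled. The function name and `neg_ratio` parameter are hypothetical.

```python
import random
from typing import List, Optional

def sample_minibatch(labels: List[int], neg_ratio: int = 3,
                     rng: Optional[random.Random] = None) -> List[int]:
    """Keep all positives and randomly pick up to neg_ratio negatives per positive."""
    rng = rng or random.Random()
    pos = [i for i, lab in enumerate(labels) if lab > 0]
    neg = [i for i, lab in enumerate(labels) if lab == 0]
    n_neg = min(len(neg), neg_ratio * len(pos))
    return pos + rng.sample(neg, n_neg)

labels = [1, 2] + [0] * 20 + [-1] * 5
picked = sample_minibatch(labels, rng=random.Random(0))
print(len(picked))  # 8: 2 positives + 6 negatives
```

Capping negatives this way keeps the easy background boxes from dominating the gradient.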
Data augmentation
- Using the original/flipped input image;
- Randomly sampling a patch while making sure that at least one object's center lies within it; adding a small scale for training helps overcome the limitation on small objects
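The flip and patch-sampling steps can be sketched as follows. Assumptions: boxes are `(x1, y1, x2, y2)` inside the image, flipping is horizontal, patches keep the image's aspect ratio, and `min_scale` and `max_trials` are hypothetical parameters; if no valid patch is found, the whole image is used.

```python
import random
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def hflip_boxes(boxes: List[Box], img_w: float) -> List[Box]:
    """Mirror boxes horizontally to match a horizontally flipped image."""
    return [(img_w - x2, y1, img_w - x1, y2) for x1, y1, x2, y2 in boxes]

def center_inside(box: Box, patch: Box) -> bool:
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return patch[0] <= cx <= patch[2] and patch[1] <= cy <= patch[3]

def sample_patch(img_w: float, img_h: float, boxes: List[Box], min_scale: float = 0.3,
                 max_trials: int = 50, rng: Optional[random.Random] = None) -> Box:
    """Random patch containing at least one object center; falls back to the full image."""
    rng = rng or random.Random()
    for _ in range(max_trials):
        scale = rng.uniform(min_scale, 1.0)
        pw, ph = scale * img_w, scale * img_h
        x1, y1 = rng.uniform(0, img_w - pw), rng.uniform(0, img_h - ph)
        patch = (x1, y1, x1 + pw, y1 + ph)
        if any(center_inside(b, patch) for b in boxes):
            return patch
    return (0.0, 0.0, float(img_w), float(img_h))

print(hflip_boxes([(10, 20, 30, 40)], 100))  # [(70, 20, 90, 40)]
```

Requiring an object center inside the patch guarantees every crop still contains at least one (possibly truncated) training target.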