RON Paper Notes: Key Points


  1. Motivation: combine the strengths of the region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) methodologies
  2. RON mainly addresses two fundamental problems:
    • (a) multi-scale object localization: handled by reverse connections
    • (b) negative sample mining: handled by the objectness prior
  3. Experiments (VGG-16, 384×384 input):
    • VOC 2007: 81.3% mAP
    • VOC 2012: 80.7% mAP
    • 1.5 GB of GPU memory at test time
    • Speed: 15 FPS, 3× faster than the Faster R-CNN counterpart

Introduction

  1. RON combines the merits of region-based and region-free approaches.
  2. Contributions:
    1. RON, a fully convolutional framework for end-to-end object detection
    2. effective training strategies: negative example mining and data augmentation
    3. time and resource efficiency, supported by extensive experiments on design choices

Related Work

DPM, R-CNN, SPP-Net, Fast R-CNN, R-FCN, YOLO, SSD

Network Architecture

Network Preparation

VGG-16, pre-trained on ImageNet:

  • convert FC6 (14th layer) and FC7 (15th layer) to convolutional layers
  • use 2×2 convolutional kernels with stride 2 to reduce the resolution of FC7 by half
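Why a 2×2, stride-2 kernel halves the map follows from the standard convolution output-size formula. The sketch below evaluates that formula; the feature-map sizes in the loop are hypothetical and are not taken from the paper.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial size of a convolution output (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical feature-map sizes. A 2x2 kernel with stride 2 and no padding
# maps an even-sized input to exactly half its size.
for h in (24, 12, 6):
    out = conv_output_size(h, kernel=2, stride=2)
    print(f"{h}x{h} -> {out}x{out}")
```

For odd sizes the result rounds down (7 → 3), so the halving is exact only for even-sized inputs.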

Reverse Connection

  1. Inspired by residual connections
  2. The reverse connection passes semantic information from deeper layers back to earlier (lower-level) feature maps
  3. Compared with SSD, RON is more effective at detecting objects at all scales: because the reverse connection is learnable, the semantic information of earlier layers can be significantly enriched.
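The data flow of a reverse connection can be sketched on single-channel maps held in plain Python lists. This is a simplified illustration under stated assumptions: the learnable layers of the real block are replaced by identity and a fixed nearest-neighbour 2× upsampling. Only the top-down "upsample the deeper map, then add it element-wise to the current map" pattern is shown. The function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map


def upsample2x(fmap: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling (fixed stand-in for a learned upsampling layer)."""
    out: Grid = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def reverse_connect(current: Grid, deeper: Grid) -> Grid:
    """Fuse a backbone map with the upsampled deeper map by element-wise sum."""
    up = upsample2x(deeper)
    if len(up) != len(current) or len(up[0]) != len(current[0]):
        raise ValueError("deeper map must be half the size of the current map")
    return [[c + u for c, u in zip(crow, urow)] for crow, urow in zip(current, up)]


def build_fusion_maps(backbone: List[Grid]) -> List[Grid]:
    """backbone is ordered shallow -> deep; each level is half the size of the one before.

    The deepest map is used as-is. Every shallower map receives the fused map
    from the level above it, so semantics flow top-down.
    """
    fused = [backbone[-1]]
    for fmap in reversed(backbone[:-1]):
        fused.insert(0, reverse_connect(fmap, fused[0]))
    return fused


maps = build_fusion_maps([[[1.0] * 4 for _ in range(4)], [[1.0, 2.0], [3.0, 4.0]], [[10.0]]])
print(maps[0][0])
```

Because each fused map is built from the one above it, the shallowest map ends up with contributions from every deeper level.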

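Reference Boxes: bounding boxes are generated on each detection feature map, with scales

S_k = {(2k − 1) · s_min, 2k · s_min},  k ∈ {1, 2, 3, 4},

where k indexes the four detection feature maps and s_min is the minimum scale.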
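The scale formula can be evaluated directly. In the sketch below, s_min = 16 pixels is a hypothetical value chosen only for illustration, and `reference_scales` is a hypothetical helper name.

```python
from typing import Dict, Tuple


def reference_scales(s_min: float, num_maps: int = 4) -> Dict[int, Tuple[float, float]]:
    """Return S_k = ((2k - 1) * s_min, 2k * s_min) for k = 1..num_maps."""
    return {k: ((2 * k - 1) * s_min, 2 * k * s_min) for k in range(1, num_maps + 1)}


# s_min = 16 pixels is a hypothetical value, not taken from the paper.
for k, (small, large) in reference_scales(16).items():
    print(f"map {k}: scales {small} and {large}")
```

The eight resulting scales are the consecutive multiples s_min, 2·s_min, …, 8·s_min. Each feature map therefore covers its own contiguous pair, with no gaps or overlaps between maps.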

Objectness Prior

Add a 3×3×2 convolutional layer followed by a softmax function to indicate whether an object exists in each box.
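The softmax step turns the two outputs per box (background, object) into an objectness probability. Below is a minimal sketch. It assumes the per-location channels are laid out as consecutive (background, object) pairs, one pair per reference box. That layout, and both function names, are hypothetical.

```python
import math
from typing import List


def objectness_prob(logit_bg: float, logit_obj: float) -> float:
    """Two-way softmax over (background, object); returns P(object)."""
    m = max(logit_bg, logit_obj)  # subtract the max for numerical stability
    e_bg = math.exp(logit_bg - m)
    e_obj = math.exp(logit_obj - m)
    return e_obj / (e_bg + e_obj)


def objectness_at_location(logits: List[float], boxes_per_location: int = 10) -> List[float]:
    """Split 2 * boxes_per_location channel values into (bg, obj) pairs (assumed layout)."""
    if len(logits) != 2 * boxes_per_location:
        raise ValueError("expected two logits per reference box")
    return [objectness_prob(logits[2 * i], logits[2 * i + 1])
            for i in range(boxes_per_location)]


print(objectness_at_location([0.0, 2.0] * 10)[:3])
```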

There are 10 default boxes at each location (2 scales × 5 aspect ratios).
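The 10 boxes per location come from pairing each of the 2 scales with each of the 5 aspect ratios. The sketch below relies on two assumptions that are not stated in these notes: the aspect-ratio set {1/3, 1/2, 1, 2, 3}, and the SSD-style convention w = s·√a, h = s/√a, which keeps the box area equal to s². The helper name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

ASPECT_RATIOS = (1 / 3, 1 / 2, 1.0, 2.0, 3.0)  # assumed ratio set (width / height)


def boxes_at_location(cx: float, cy: float, scales: Sequence[float],
                      ratios: Sequence[float] = ASPECT_RATIOS) -> List[Box]:
    """All reference boxes centred at (cx, cy): one per (scale, aspect ratio) pair."""
    boxes: List[Box] = []
    for s in scales:
        for a in ratios:
            w = s * math.sqrt(a)  # assumed convention: area stays s * s
            h = s / math.sqrt(a)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


print(len(boxes_at_location(0.0, 0.0, scales=(16, 32))))
```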

Because the objectness prior maps explicitly reflect the existence of an object, they can dramatically reduce the search space.
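The search-space reduction amounts to discarding reference boxes with low objectness before detection. Below is a minimal sketch; the scores and the threshold value are hypothetical.

```python
from typing import List


def filter_by_objectness(scores: List[float], threshold: float) -> List[int]:
    """Indices of reference boxes whose objectness is strictly above the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Hypothetical objectness scores for six reference boxes and a hypothetical threshold.
scores = [0.01, 0.9, 0.02, 0.4, 0.005, 0.03]
kept = filter_by_objectness(scores, threshold=0.03)
print(f"kept {len(kept)} of {len(scores)} boxes: {kept}")
```

Most reference boxes cover background, so in practice only a small fraction survives the filter. That surviving fraction is all the detection branch has to examine.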

Detection and Bounding Box Regression

  • an inception module applied on the feature maps, followed by
  • an object detection (classification) module, and
  • a bounding-box regression module
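These notes do not say how the regression outputs are parameterized. A common choice, assumed here from Faster R-CNN-style detectors, predicts offsets (tx, ty, tw, th) relative to a reference box: center shifts scaled by the box size, and log-space width and height changes. The sketch decodes such offsets. `decode` is a hypothetical name.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]      # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]   # (tx, ty, tw, th), assumed parameterization


def decode(ref: Box, deltas: Deltas) -> Box:
    """Apply predicted offsets to a reference box (Faster R-CNN-style, assumed)."""
    x1, y1, x2, y2 = ref
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2, y1 + h / 2
    tx, ty, tw, th = deltas
    new_cx, new_cy = cx + tx * w, cy + ty * h          # shift the center
    new_w, new_h = w * math.exp(tw), h * math.exp(th)  # rescale in log space
    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)


print(decode((0.0, 0.0, 10.0, 10.0), (0.5, 0.0, 0.0, 0.0)))
```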

Combining Objectness Prior with Detection

For training:

  1. Assign a binary (object / non-object) label to each reference box.
  2. If a box covers an object, assign it a class-specific label. For each ground-truth box:
    • match it with the candidate region that has the highest Jaccard overlap (IoU);
    • match candidate regions to any ground truth with Jaccard overlap higher than 0.5;
    • assign negative labels to boxes with Jaccard overlap lower than 0.3.
  3. For the detection branch, only samples whose objectness scores are higher than the threshold o_p are selected.
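The matching rules in steps 1 and 2 can be sketched as follows. The notes do not say what happens to boxes whose best overlap falls between 0.3 and 0.5. The sketch assumes they are ignored and marks them -1. It also assumes class ids start at 1, with 0 meaning background. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Jaccard overlap (intersection over union) of two boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_labels(boxes: Sequence[Box], gt_boxes: Sequence[Box], gt_classes: Sequence[int],
                  pos_thresh: float = 0.5, neg_thresh: float = 0.3) -> List[int]:
    """Per-box label: class id (> 0) positive, 0 negative, -1 ignored (assumed)."""
    if not boxes:
        return []
    if not gt_boxes:
        return [0] * len(boxes)
    overlaps = [[iou(b, g) for g in gt_boxes] for b in boxes]
    labels = [-1] * len(boxes)
    for i, row in enumerate(overlaps):
        j = max(range(len(gt_boxes)), key=row.__getitem__)  # best ground truth for box i
        if row[j] > pos_thresh:
            labels[i] = gt_classes[j]
        elif row[j] < neg_thresh:
            labels[i] = 0
    # Each ground truth also claims the reference box with the highest overlap.
    for j, cls in enumerate(gt_classes):
        best = max(range(len(boxes)), key=lambda k: overlaps[k][j])
        labels[best] = cls
    return labels


def objectness_labels(labels: Sequence[int]) -> List[int]:
    """Binary objectness targets: 1 object, 0 background, -1 ignored."""
    return [1 if l > 0 else (0 if l == 0 else -1) for l in labels]


refs = [(0, 0, 10, 10), (5, 0, 15, 10), (20, 20, 30, 30), (0, 0, 6, 10)]
print(assign_labels(refs, [(0, 0, 10, 10)], [3]))
```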

Training and Testing

Loss Function

For each location, the network has three sibling output branches:

  1. objectness confidence score
  2. bounding-box regression
  3. classification

A multi-task loss L combines the three branch losses so that the network is trained jointly, end to end.
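The notes do not give the form of L. The sketch below assumes a weighted sum of the three branch losses, with hypothetical weights alpha and beta. It also assumes softmax cross-entropy for the objectness and classification branches and smooth L1 for box regression, the pairing used in Fast R-CNN-style detectors. All function names are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one sample, computed with log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1 summed over the box offsets."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def multitask_loss(l_obj: float, l_loc: float, l_cls: float,
                   alpha: float = 1 / 3, beta: float = 1 / 3) -> float:
    """Weighted sum of the three branch losses (weights are hypothetical)."""
    return alpha * l_obj + beta * l_loc + (1 - alpha - beta) * l_cls


# One hypothetical positive box: objectness logits, class logits, box offsets.
l_obj = cross_entropy([0.2, 1.5], target=1)
l_cls = cross_entropy([0.1, 2.0, -1.0], target=1)
l_loc = smooth_l1([0.1, -0.2, 0.05, 0.3], [0.0, 0.0, 0.0, 0.0])
print(multitask_loss(l_obj, l_loc, l_cls))
```

Tying the third weight to 1 − alpha − beta keeps the total weight at 1. Changing one weight therefore redistributes emphasis between branches rather than rescaling the whole loss.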

Joint Training and Testing

Dynamic training strategy:

  1. keep all positive samples and randomly select negative samples;
  2. limit the number of samples so that the ratio of positive to negative samples is 1:3.
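The sampling rule above can be sketched as: keep every positive index, then draw negatives at random until there are three per positive, or until the negatives run out. `sample_minibatch` is a hypothetical name. The indices stand for reference boxes.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_minibatch(pos_idx: Sequence[int], neg_idx: Sequence[int],
                     neg_per_pos: int = 3,
                     rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Keep every positive; draw at most neg_per_pos random negatives per positive."""
    if rng is None:
        rng = random.Random()
    num_neg = min(len(neg_idx), neg_per_pos * len(pos_idx))
    return list(pos_idx), rng.sample(list(neg_idx), num_neg)


pos, neg = sample_minibatch([4, 17], list(range(100, 200)), rng=random.Random(0))
print(len(pos), len(neg))  # 2 6
```

Capping negatives at 3× the positives keeps the abundant background boxes from dominating the loss.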

Data augmentation

  1. Use the original or the flipped input image;
  2. Randomly sample a patch, making sure that the center of at least one object lies within it; to overcome the limitation on small objects, a small scale is added for training.
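Both augmentation steps can be sketched on box coordinates. Several details are assumptions not stated in the notes: the flip is horizontal; patch side lengths are drawn as a hypothetical fraction (min_scale to 1.0) of the image size; and the sampler falls back to the full image after a hypothetical number of failed trials. All function names are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def flip_box(box: Box, img_w: float) -> Box:
    """Mirror a box for a horizontally flipped image (assumed flip direction)."""
    x1, y1, x2, y2 = box
    return (img_w - x2, y1, img_w - x1, y2)


def contains_center(patch: Box, box: Box) -> bool:
    """True if the center of box lies inside patch."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return patch[0] <= cx <= patch[2] and patch[1] <= cy <= patch[3]


def sample_patch(img_w: float, img_h: float, gt_boxes: Sequence[Box],
                 min_scale: float = 0.3, max_trials: int = 50,
                 rng: Optional[random.Random] = None) -> Box:
    """Random patch containing at least one object center (full image as fallback)."""
    if rng is None:
        rng = random.Random()
    for _ in range(max_trials):
        scale = rng.uniform(min_scale, 1.0)  # hypothetical scale range
        w, h = img_w * scale, img_h * scale
        x1 = rng.uniform(0.0, img_w - w)
        y1 = rng.uniform(0.0, img_h - h)
        patch = (x1, y1, x1 + w, y1 + h)
        if any(contains_center(patch, b) for b in gt_boxes):
            return patch
    return (0.0, 0.0, float(img_w), float(img_h))


print(sample_patch(100.0, 100.0, [(40.0, 40.0, 60.0, 60.0)], rng=random.Random(1)))
```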

Inference