[toc]

Motivation: combine the strengths of region-based (e.g., Faster R-CNN) and region-free (e.g., SSD) detection methods.
RON mainly focuses on two fundamental problems:
- (a) multi-scale object localization, addressed by the reverse connection
- (b) negative sample mining, addressed by the objectness prior (prior knowledge about whether an object is present)
Experiments (VGG-16, 384×384 input size):
- VOC 2007: 81.3% mAP
- VOC 2012: 80.7% mAP
- 1.5G GPU memory at test phase
- speed: 15 FPS, 3× faster than the Faster R-CNN counterpart.

Introduction

RON combines the merits of region-based and region-free approaches.
Contributions:
1. RON: a fully convolutional framework for end-to-end object detection
2. effective training strategies: negative example mining and data augmentation
3. time and resource efficiency, with an extensive set of design choices evaluated

Network Architecture

Network Preparation

Backbone: VGG-16, pre-trained on ImageNet.

Convert FC6 (14th layer) and FC7 (15th layer) to convolutional layers.
Use 2×2 convolutional kernels with stride 2 to reduce the resolution of FC7 by half.
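To see why a 2×2 kernel with stride 2 halves the resolution, the standard convolution output-size formula can be evaluated directly. This is a small sketch; the input size of 24 is a hypothetical value chosen for illustration, not a number from the notes.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical feature-map size before the 2x2 / stride-2 convolution.
in_size = 24
out_size = conv_output_size(in_size, kernel=2, stride=2)
print(in_size, "->", out_size)
```

For any even input size, kernel 2 with stride 2 and no padding gives exactly half the input size.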

Reverse Connection

Inspired by: residual connection
The reverse connection lets earlier (shallower) feature maps carry more semantic information.
RON is more effective than SSD at detecting objects of all scales: because the reverse connection is learnable, the semantic information of earlier layers can be significantly enriched.
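The idea of passing semantic information back to earlier layers can be sketched as follows. This is only an illustration under stated assumptions: it assumes the deeper map is upsampled 2× and merged with the shallower map by element-wise addition, and it replaces the learnable layers with nearest-neighbour upsampling and an identity mapping. The function names `upsample2x` and `reverse_fuse` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list.

    Assumption: stands in for a learnable upsampling layer.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out


def reverse_fuse(shallow, deeper):
    """Element-wise sum of a shallow map and the upsampled deeper map."""
    up = upsample2x(deeper)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("deeper map must have half the resolution of the shallow map")
    return [[s + u for s, u in zip(srow, urow)] for srow, urow in zip(shallow, up)]


shallow = [[1.0] * 4 for _ in range(4)]   # 4x4 early-layer features
deeper = [[10.0, 20.0], [30.0, 40.0]]     # 2x2 later-layer features
fused = reverse_fuse(shallow, deeper)
print(fused[0], fused[3])
```

In a trainable network both branches would pass through learned layers before the sum, which is what makes the connection learnable.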

Reference Boxes: generate bounding boxes on feature maps

$S_k = \{(2k - 1) \cdot s_{\min},\ 2k \cdot s_{\min}\}, \quad k \in \{1, 2, 3, 4\}$
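The scale formula can be evaluated for each of the four levels. A minimal sketch; the value `s_min = 32.0` is a hypothetical choice for illustration only.

```python
def reference_scales(s_min: float, num_levels: int = 4):
    """Return {k: ((2k - 1) * s_min, 2k * s_min)} for k = 1..num_levels."""
    return {k: ((2 * k - 1) * s_min, 2 * k * s_min) for k in range(1, num_levels + 1)}


s_min = 32.0  # hypothetical minimum scale, in pixels
for k, (small, large) in reference_scales(s_min).items():
    print(f"k={k}: scales {small:g} and {large:g}")
```

Each level gets two scales, and consecutive levels do not overlap: level k ends at 2k · s_min and level k + 1 starts at (2k + 1) · s_min.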

Objectness Prior

Add a 3×3×2 convolutional layer followed by a Softmax function to indicate the existence of an object in each box.

There are 10 default boxes at each location (2 scales and 5 aspect ratios).
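The count of 10 boxes per location comes from pairing every scale with every aspect ratio. The sketch below makes that explicit. The concrete aspect ratios, the scales, and the convention that a box of scale s has area s² are assumptions made for illustration; the notes only state the counts.

```python
import math


def boxes_at(cx, cy, scales, aspect_ratios):
    """Return (x1, y1, x2, y2) boxes centred at (cx, cy), one per scale/ratio pair."""
    boxes = []
    for s in scales:
        for ar in aspect_ratios:
            w = s * math.sqrt(ar)  # width / height == ar
            h = s / math.sqrt(ar)  # width * height == s * s
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


ASPECT_RATIOS = (1 / 3, 1 / 2, 1.0, 2.0, 3.0)  # assumed values, 5 ratios
boxes = boxes_at(100.0, 100.0, scales=(32.0, 64.0), aspect_ratios=ASPECT_RATIOS)
print(len(boxes))
```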

Objectness prior maps explicitly reflect the existence of an object, so they can dramatically reduce the search space.
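How a two-channel Softmax output shrinks the search space can be sketched as below. The channel order (background first, object second) and the threshold value are assumptions for illustration, and the helper names are hypothetical.

```python
import math


def objectness_prob(bg_logit: float, obj_logit: float) -> float:
    """Softmax over (background, object) logits; returns P(object)."""
    m = max(bg_logit, obj_logit)  # subtract the max for numerical stability
    e_bg = math.exp(bg_logit - m)
    e_obj = math.exp(obj_logit - m)
    return e_obj / (e_bg + e_obj)


def keep_likely_objects(logit_pairs, threshold):
    """Indices of boxes whose objectness probability exceeds the threshold."""
    return [i for i, (bg, obj) in enumerate(logit_pairs)
            if objectness_prob(bg, obj) > threshold]


logits = [(2.0, -1.0), (0.0, 0.0), (-1.0, 3.0)]  # one (background, object) pair per box
kept = keep_likely_objects(logits, threshold=0.5)  # hypothetical threshold
print(kept)
```

Only the boxes that survive this filter need to be passed on to the more expensive per-class detection step.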

Detection and Bounding Box Regression

inception module on the feature maps
Object detection
bounding box regression modules

Combining Objectness Prior with Detection

For training:

Assign a binary class label to each reference box.
If a box covers an object, also assign a class-specific label. For each ground truth box:
- match it with the candidate region that has the highest Jaccard overlap
- match candidate regions to any ground truth with Jaccard overlap higher than 0.5
- assign negative labels to the boxes with Jaccard overlap lower than 0.3
For detection, only samples whose objectness scores are higher than the threshold o_p are selected.
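The matching rules above can be sketched in a few lines. This covers only the binary (object / not object) labels. Treating boxes whose best overlap falls between 0.3 and 0.5 as ignored is an assumption, since the notes do not say what happens to them; the function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap (IoU) of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_labels(boxes, gt_boxes, pos_thresh=0.5, neg_thresh=0.3):
    """Return one label per box: 1 = positive, 0 = negative, -1 = ignored (assumed)."""
    if not boxes:
        return []
    overlaps = [[jaccard(b, g) for g in gt_boxes] for b in boxes]
    labels = []
    for row in overlaps:
        best = max(row, default=0.0)
        if best > pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)
    # Each ground truth box also claims its best-overlapping candidate.
    for j in range(len(gt_boxes)):
        best_i = max(range(len(boxes)), key=lambda i: overlaps[i][j])
        if overlaps[best_i][j] > 0:
            labels[best_i] = 1
    return labels


gts = [(0, 0, 10, 10), (50, 50, 60, 60)]
candidates = [(0, 0, 10, 10), (0, 0, 10, 4), (20, 20, 30, 30), (52, 52, 62, 62)]
labels = assign_labels(candidates, gts)
print(labels)
```

The last candidate overlaps its ground truth by less than 0.5 but is still positive, because it is the best match for that ground truth box.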

Training and Testing

Loss Function

For each location, our network has three sibling output branches:

objectness confidence score
bounding box regression loss
classification loss
A multi-task loss L is used to jointly train the network end-to-end.
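One common way to combine three branch losses into a single training objective is a weighted sum. The sketch below assumes that form; the notes do not give the exact definition of L, so the weights `alpha` and `beta` and their default values are hypothetical.

```python
def multi_task_loss(obj_loss, loc_loss, cls_loss, alpha=1 / 3, beta=1 / 3):
    """Weighted sum of the three branch losses; the three weights sum to 1."""
    return alpha * obj_loss + beta * loc_loss + (1 - alpha - beta) * cls_loss


# Hypothetical per-branch loss values for one mini-batch.
total = multi_task_loss(obj_loss=0.3, loc_loss=0.6, cls_loss=0.9)
print(round(total, 4))
```

Because all three terms feed one scalar, a single backward pass updates the shared backbone and every branch together.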

Joint Training and Testing

Dynamic training strategy:

All positive samples are used, while negative samples are randomly selected.
To reduce the sample number, the ratio of positive to negative samples is kept at 1:3.
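The 1:3 sampling rule can be sketched as follows: keep every positive sample and draw a random subset of negatives, three per positive. The function name and the integer sample IDs are hypothetical.

```python
import random


def sample_minibatch(positives, negatives, neg_per_pos=3, rng=None):
    """Keep all positives and randomly pick at most neg_per_pos negatives per positive."""
    rng = rng or random.Random()
    negatives = list(negatives)
    num_neg = min(len(negatives), neg_per_pos * len(positives))
    return list(positives), rng.sample(negatives, num_neg)


# Hypothetical sample IDs: 4 positives, 100 negatives.
pos, neg = sample_minibatch(range(4), range(100, 200), rng=random.Random(0))
print(len(pos), len(neg))
```

When fewer than three negatives per positive are available, the sketch simply uses all of them.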

Data augmentation

Use the original or flipped input image.
Randomly sample a patch, making sure that at least one object's center lies within it.
Overcome the limitation on small objects by adding a small scale for training.
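The flipping and patch-sampling steps can be sketched as below. The patch size, the retry limit, and the fallback to the full image when no valid patch is found are assumptions for illustration; all function names are hypothetical.

```python
import random


def flip_boxes(boxes, image_width):
    """Horizontally flip (x1, y1, x2, y2) boxes inside an image of the given width."""
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]


def contains_center(patch, box):
    """True if the centre of box lies inside patch (both are x1, y1, x2, y2)."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return patch[0] <= cx <= patch[2] and patch[1] <= cy <= patch[3]


def sample_patch(image_w, image_h, boxes, patch_w, patch_h, rng, max_tries=50):
    """Sample a patch holding at least one box centre; fall back to the full image."""
    for _ in range(max_tries):
        x1 = rng.uniform(0, image_w - patch_w)
        y1 = rng.uniform(0, image_h - patch_h)
        patch = (x1, y1, x1 + patch_w, y1 + patch_h)
        if any(contains_center(patch, b) for b in boxes):
            return patch
    return (0.0, 0.0, float(image_w), float(image_h))  # assumed fallback


objects = [(100, 100, 200, 200)]
flipped = flip_boxes(objects, image_width=384)
patch = sample_patch(384, 384, objects, patch_w=192, patch_h=192, rng=random.Random(0))
print(flipped, patch)
```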
