
Objects as Points

1. Motivation:

  • Traditional detectors need post-processing (NMS), which is hard to differentiate and prevents end-to-end training.
  • One-stage detectors: slide anchors over the image and classify them directly.
  • Two-stage detectors: recompute image features for each potential box, then classify them.

2. Methods:

2.1 Highlights:

  • CenterNet:
    • represents each object by a single point: the center of its bounding box
    • regresses other object properties (e.g. size) from features at that point
  • Inference:
    • a single network forward pass, with no NMS post-processing

2.2 Implementation

  • Center points of bboxes (see the peak-extraction sketch after this list):

    • generate a heatmap with an FCN (keypoint prediction network)

    • extract local peaks of the keypoint heatmap

      • background: CornerNet etc. use keypoint estimation to detect box corners

  • Size (via regression):

    • image features at each peak predict the object's bbox size

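The paper replaces NMS with a simple "keep only 3×3 local maxima" step on the heatmap. A minimal PyTorch sketch of that decoding step (tensor shapes and the top-k count are illustrative assumptions, not the paper's exact code):

```python
import torch
import torch.nn.functional as F

def extract_peaks(heatmap, k=100):
    """Keep only 3x3 local maxima of the class heatmaps (the NMS
    replacement), then return the top-k scoring peaks.

    heatmap: (B, C, H, W) tensor of per-class center scores in [0, 1].
    """
    # A location is a peak iff it equals the max of its 3x3 neighbourhood.
    hmax = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (hmax == heatmap).float()

    b, c, h, w = peaks.shape
    scores, idx = torch.topk(peaks.view(b, -1), k)   # flatten over C*H*W
    cls = idx // (h * w)                             # class id of each peak
    ys = (idx % (h * w)) // w                        # row on the output map
    xs = (idx % (h * w)) % w                         # column on the output map
    return scores, cls, xs, ys
```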
The network predicts a total of C + 4 outputs at each location (C class heatmaps, 2 channels for the local offset, 2 channels for the object size). All outputs share a common fully-convolutional backbone network. For each modality, the backbone features are passed through a separate 3×3 convolution, ReLU, and another 1×1 convolution.
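A sketch of such a head in PyTorch, assuming 64-channel backbone features and 256 hidden channels (both channel counts are illustrative choices, not taken from the paper):

```python
import torch.nn as nn

def make_head(in_channels, out_channels, hidden=256):
    """One output head: 3x3 conv + ReLU followed by a 1x1 conv."""
    return nn.Sequential(
        nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden, out_channels, kernel_size=1),
    )

# Three separate heads on top of the shared backbone features:
# C class heatmaps + 2 offset channels + 2 size channels = C + 4 outputs.
num_classes, backbone_channels = 80, 64   # illustrative values
heatmap_head = make_head(backbone_channels, num_classes)
offset_head  = make_head(backbone_channels, 2)
size_head    = make_head(backbone_channels, 2)
```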

Output stride R = 4, i.e. the output resolution is 4× smaller than the input (e.g. a 512×512 image yields a 128×128 output map).
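A tiny worked example of what the stride does to a ground-truth center, and of the fractional remainder that the offset head (see $L_{off}$ below) is trained to recover (the pixel coordinates are made up for illustration):

```python
# Map a ground-truth center from input-image pixels to the output map
# (stride R = 4) and compute the local offset left over by rounding.
R = 4
cx, cy = 203.0, 117.0                # center in input-image pixels (made up)
cx_low, cy_low = cx / R, cy / R      # 50.75, 29.25 on the low-res output map
ix, iy = int(cx_low), int(cy_low)    # integer heatmap peak location: (50, 29)
offset = (cx_low - ix, cy_low - iy)  # (0.75, 0.25) -> regression target for L_off
```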

2.3 Loss

A weighted sum of 3 losses:

  • $L_k$: penalty-reduced pixel-wise logistic regression with focal loss
  • $L_{off}$: additionally predicts a local offset
    • to recover the discretization error caused by the output stride
    • each center point has 2 offsets (one for x, one for y)
  • $L_{size}$: regresses the object size for each object
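
The paper combines the three terms into a single training objective; the weights below are the defaults it reports (the focal loss in $L_k$ uses α = 2, β = 4):

$$
L_{det} = L_k + \lambda_{size}\, L_{size} + \lambda_{off}\, L_{off}, \qquad \lambda_{size} = 0.1,\ \lambda_{off} = 1
$$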