Objects as Points
1. Motivation:
- Traditional detectors need post-processing (NMS), which is hard to differentiate and prevents end-to-end training.
- One-stage detectors: slide anchors over the image and classify them directly.
- Two-stage detectors: recompute image features for potential boxes and classify them.
2. Methods:
2.1 Highlight:
- CenterNet:
- represent object by a single point of bbox
- regress object size etc.
- Inference:
- single network forward-pass without NMS
2.2 Implementation
- Center points of bboxes:
  - generate a heatmap with an FCN (keypoint prediction network)
  - extract local peaks in the keypoint heatmap
  - background: CornerNet etc. use keypoint estimation to detect corners
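The peak extraction above is what lets CenterNet skip NMS: a point survives only if it is the maximum of its 3×3 neighborhood on the heatmap (the paper implements this with a 3×3 max pooling). A minimal NumPy sketch, with the function name `local_peaks` being a hypothetical choice:

```python
import numpy as np

def local_peaks(heatmap, k=3):
    """Zero out every pixel that is not the max of its k x k neighborhood.

    Equivalent to comparing the heatmap with a k x k max pooling of itself;
    this replaces NMS in CenterNet-style detectors.
    """
    H, W = heatmap.shape
    pad = k // 2
    # Pad with -inf so border pixels only compete with real neighbors.
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    pooled = np.empty_like(heatmap)
    for i in range(H):
        for j in range(W):
            pooled[i, j] = padded[i:i + k, j:j + k].max()
    return heatmap * (heatmap == pooled)  # non-peaks become 0

# Toy heatmap with a single response at the center.
heat = np.array([
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
])
peaks = local_peaks(heat)  # only the 0.9 at (1, 1) survives
```

In practice one would keep the top-100 peaks across all class channels; the loop here is only for readability, a framework would use a pooling op.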
- Size (via regression):
  - image features at each peak predict the object's bbox size
The network predicts a total of C + 4 outputs at each location (C class heatmaps, 2 parameters for the local offset, 2 for the size). All outputs share a common fully-convolutional backbone network. For each modality, the backbone features are passed through a separate 3×3 convolution, ReLU, and a 1×1 convolution.
output stride R = 4, i.e. the output resolution is 4 times smaller than the input image
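Putting the pieces together, a detection is recovered by mapping a peak location back through the stride: the refined center is (peak + offset) · R, and the box extends half the predicted size in each direction. A small sketch under that parameterization (the helper name `decode_box` is hypothetical):

```python
R = 4  # output stride: heatmap cells are R x R input pixels

def decode_box(px, py, offset, size, stride=R):
    """Map a heatmap peak at low-res cell (px, py), its predicted local
    offset (dx, dy) and size (w, h) to a box in input-image coordinates."""
    dx, dy = offset
    w, h = size
    cx = (px + dx) * stride  # refined center x in input pixels
    cy = (py + dy) * stride  # refined center y in input pixels
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

# Peak at cell (30, 20), sub-cell offset (0.5, 0.25), size 40 x 24 pixels.
box = decode_box(30, 20, offset=(0.5, 0.25), size=(40.0, 24.0))
# box = (102.0, 69.0, 142.0, 93.0)
```

The offset term is exactly what $L_{off}$ below trains: without it, every center would be quantized to a multiple of R.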
2.3 Loss
A weighted sum of 3 losses:
- $L_k$: penalty-reduced pixel-wise logistic regression with focal loss
- $L_{off}$: additionally predict a local offset
  - to recover the discretization error caused by the output stride
  - each center point has 2 offsets (x and y)
- $L_{size}$: regress the object size for each object
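The keypoint loss $L_k$ follows the CornerNet formulation: at ground-truth centers it is a focal term $(1-\hat{Y})^\alpha \log \hat{Y}$, and elsewhere the negative term is down-weighted by $(1-Y)^\beta$, where $Y$ is the Gaussian-splatted target heatmap ("penalty-reduced"). A NumPy sketch with the paper's $\alpha=2$, $\beta=4$ (the function name is a hypothetical choice):

```python
import numpy as np

def center_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss (CornerNet/CenterNet style).

    pred: sigmoid heatmap predictions in (0, 1)
    gt:   Gaussian-splatted target heatmap; exactly 1 at object centers
    """
    pos = gt == 1.0          # ground-truth center pixels
    neg = ~pos               # everything else (possibly near a center)
    pos_loss = ((1 - pred[pos]) ** alpha) * np.log(pred[pos] + eps)
    # (1 - gt)^beta shrinks the penalty for pixels close to a true center.
    neg_loss = ((1 - gt[neg]) ** beta) * (pred[neg] ** alpha) \
        * np.log(1 - pred[neg] + eps)
    n = max(pos.sum(), 1)    # normalize by the number of objects
    return -(pos_loss.sum() + neg_loss.sum()) / n

# One object at (1, 1); a confident prediction vs. a uniform one.
gt = np.zeros((4, 4)); gt[1, 1] = 1.0
good = np.full((4, 4), 0.01); good[1, 1] = 0.99
bad = np.full((4, 4), 0.5)
loss_good = center_focal_loss(good, gt)
loss_bad = center_focal_loss(bad, gt)
```

$L_{off}$ and $L_{size}$ are plain L1 losses evaluated only at ground-truth center locations, so the total objective is $L_k + \lambda_{off} L_{off} + \lambda_{size} L_{size}$ with scalar weights.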