Combination of Multiple Global Descriptors for Image Retrieval

[Paper link] : https://arxiv.org/pdf/1903.10663.pdf

This paper proposes the Combined Global Descriptor (CGD) framework.

[framework figure]

The pooling methods they used

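  • SPoC (sum pooling of convolutions) : activates larger regions on the image representation
  • MAC (maximum activation of convolutions) : activates more focused regions
  • GeM (generalized-mean pooling) : sits between the two, controlled by an exponent p (p = 1 is average pooling, p → ∞ approaches max pooling)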
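
To make the difference between the three poolings concrete, here is a minimal pure-Python sketch. It assumes a feature map stored as a list of channels, each an H x W grid of non-negative activations (for example after ReLU). The function names and the toy feature map are made up for illustration and are not taken from the paper's code. SPoC is written as a spatial average, which is sum pooling up to a constant factor.

```python
def flatten(channel):
    """Turn one H x W channel into a flat list of activations."""
    return [v for row in channel for v in row]

def spoc(channel):
    """SPoC: average of all activations in the channel."""
    values = flatten(channel)
    return sum(values) / len(values)

def mac(channel):
    """MAC: largest activation in the channel."""
    return max(flatten(channel))

def gem(channel, p=3.0):
    """GeM: generalized mean with exponent p (assumes activations >= 0)."""
    values = flatten(channel)
    return (sum(v ** p for v in values) / len(values)) ** (1.0 / p)

def pool(feature_map, fn):
    """Apply one pooling function per channel to get a global descriptor."""
    return [fn(channel) for channel in feature_map]

# Hypothetical 2-channel, 2 x 2 feature map.
feature_map = [
    [[0.0, 1.0], [2.0, 3.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]

spoc_descriptor = pool(feature_map, spoc)
mac_descriptor = pool(feature_map, mac)
gem_descriptor = pool(feature_map, gem)
```

On a channel with one strong peak, MAC keeps only the peak, SPoC spreads weight over every location, and GeM lands in between. On a constant channel all three give the same value.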

The authors argue that different pooling methods capture different information in an image, and they support this claim with experiments.

Framework details

The proposed framework consists of two modules

  1. The main module learns an image representation, which is a combination of multiple global descriptors, trained with a ranking loss.
  2. An auxiliary module fine-tunes the CNN with a classification loss.
  • backbone : they tried several backbone models, such as ResNet50, SENet, and ShuffleNet-v2.
  • ranking loss : they used batch-hard triplet loss, among others.
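
A rough sketch of what the main module does with the descriptors, plus a batch-hard triplet loss on top. This is my reading, not the authors' code: I assume each branch descriptor is l2-normalized, the branches are concatenated, and the result is l2-normalized again. Any learned per-branch projection is left out, and the margin value and all function names are placeholders.

```python
import math

def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]

def combine(descriptors):
    """Concatenate l2-normalized branch descriptors, then normalize again."""
    concatenated = []
    for descriptor in descriptors:
        concatenated.extend(l2_normalize(descriptor))
    return l2_normalize(concatenated)

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def batch_hard_triplet_loss(embeddings, labels, margin=0.1):
    """For each anchor, use its farthest positive and closest negative."""
    losses = []
    for i, anchor in enumerate(embeddings):
        positives = [euclidean(anchor, e)
                     for j, e in enumerate(embeddings)
                     if j != i and labels[j] == labels[i]]
        negatives = [euclidean(anchor, e)
                     for j, e in enumerate(embeddings)
                     if labels[j] != labels[i]]
        if not positives or not negatives:
            continue  # this anchor cannot form a triplet in this batch
        losses.append(max(0.0, margin + max(positives) - min(negatives)))
    return sum(losses) / len(losses) if losses else 0.0

# Hypothetical branch outputs for one image (e.g. a SPoC and a GeM branch).
combined = combine([[3.0, 4.0], [0.0, 2.0]])

# Hypothetical batch of four 2-D embeddings.
batch = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
loss_separated = batch_hard_triplet_loss(batch, [0, 0, 1, 1], margin=0.5)
loss_mixed = batch_hard_triplet_loss(batch, [0, 1, 0, 1], margin=0.5)
```

When same-label points sit close together the loss is zero. When the labels are interleaved, every anchor has a negative closer than its farthest positive and the loss becomes positive.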

Conclusion

Using SPoC and GeM together gives the best performance.

Not understood

In Figure 4 of the paper:

How do they compute pairwise similarity?

[related part in the paper]

A visualization tool proposed in [47] highlights the regions of images that contribute the most to pairwise similarity. We modify this work to be suitable for our framework in order to see how much each region of the image contributes to the similarity for each final embedding of different configurations. Figure 4 shows visualization of topmost (R@1) retrieved image of each configuration on the same query.
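
I have not checked how [47] or the paper actually does this, so the following is only my guess at the general idea. If the descriptor is an average over spatial locations and the similarity is a dot product, the similarity splits exactly into one term per location, and those terms can be drawn as a heat map. The sketch ignores l2 normalization, any FC layer, and the other pooling types, and every name in it is made up.

```python
def average_pool(local_features):
    """Average a list of per-location feature vectors into one descriptor."""
    count = len(local_features)
    dim = len(local_features[0])
    return [sum(f[k] for f in local_features) / count for k in range(dim)]

def dot(a, b):
    """Dot product of two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))

def contribution_map(query_locals, match_locals):
    """Share of the pairwise similarity coming from each query location."""
    match_descriptor = average_pool(match_locals)
    count = len(query_locals)
    return [dot(f, match_descriptor) / count for f in query_locals]

# Hypothetical local features: 2 spatial locations, 2 channels each.
query_locals = [[1.0, 0.0], [0.0, 2.0]]
match_locals = [[1.0, 1.0], [3.0, 1.0]]

contributions = contribution_map(query_locals, match_locals)
similarity = dot(average_pool(query_locals), average_pool(match_locals))
```

The per-location contributions add up to the full similarity, so locations with large values are the regions that "explain" the match under these assumptions.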