Fusion Networks for RGB-D Salient Object Detection

1. Progressively Complementarity-aware Fusion Network for RGB-D Salient Object Detection

Published in 2018 CVPR

paper: Progressively Complementarity-aware Fusion Network for RGB-D Salient Object Detection

  • Main novelty:
    • Design a complementarity-aware fusion (CA-Fuse) module, which introduces cross-modal residual functions and complementarity-aware supervision (side losses); see the sketch after this list.
    • Add level-wise supervision densely from deep to shallow layers.
  • Overall architecture
  • CA-Fuse module
  • Details: Add a large-kernel convolution (Conv6, 13×13) and include five sets of side loss functions (weights all set to 1), plus a loss that encourages an informative combination of all side outputs.
  • Results
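
A minimal PyTorch sketch of the cross-modal residual idea (module and layer names here are mine, not the paper's): each modality's features are refined by a residual computed from the other modality, so the depth path only has to supply what the RGB path misses, and vice versa; side-output heads allow the per-stream supervision mentioned above.

```python
import torch
import torch.nn as nn

class CAFuse(nn.Module):
    """Sketch of a complementarity-aware fusion block with cross-modal
    residual functions and per-stream side-output heads."""

    def __init__(self, channels):
        super().__init__()
        # Cross-modal residual functions: each small conv stack learns a
        # residual from the *other* modality's features.
        self.res_rgb_from_d = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.res_d_from_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        # 1x1 heads producing side-output saliency maps, each supervised
        # by its own side loss.
        self.head_rgb = nn.Conv2d(channels, 1, 1)
        self.head_d = nn.Conv2d(channels, 1, 1)
        self.head_fused = nn.Conv2d(2 * channels, 1, 1)

    def forward(self, f_rgb, f_depth):
        # Residual learning across modalities.
        f_rgb_ref = f_rgb + self.res_rgb_from_d(f_depth)
        f_d_ref = f_depth + self.res_d_from_rgb(f_rgb)
        fused = torch.cat([f_rgb_ref, f_d_ref], dim=1)
        # Three side predictions; during training their losses would
        # simply be summed with weight 1, per the details above.
        return (self.head_fused(fused),
                self.head_rgb(f_rgb_ref),
                self.head_d(f_d_ref))
```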

2. Attention-aware Cross-modal Cross-level Fusion Network for RGB-D Salient Object Detection

Published in 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

paper: Attention-aware Cross-modal Cross-level Fusion Network for RGB-D Salient Object Detection

  • Main novelty: Propose an attention-aware cross-modal cross-level fusion (ACCF) module to fuse RGB and depth features from different levels. The ACCF module is similar in spirit to the squeeze-and-excitation (SE) block; see the sketch after this list.
  • Details: Add a large-kernel convolution (Conv6, 13×13) and include five loss functions (weights all set to 1).
  • Results (worse than those of the CVPR paper above)
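
Since the note likens ACCF to the SE block, here is a minimal SE-style channel-attention fusion in PyTorch; it illustrates the squeeze-and-excitation mechanism applied to concatenated RGB and depth features, not the paper's exact ACCF module (all names are mine):

```python
import torch
import torch.nn as nn

class SEStyleFusion(nn.Module):
    """SE-style channel attention over concatenated RGB and depth features:
    squeeze (global average pool) -> excite (bottleneck MLP + sigmoid) ->
    reweight channels, so informative channels from either modality are
    emphasized before fusion."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        fused_ch = 2 * channels
        self.squeeze = nn.AdaptiveAvgPool2d(1)
        self.excite = nn.Sequential(
            nn.Linear(fused_ch, fused_ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(fused_ch // reduction, fused_ch), nn.Sigmoid())
        self.proj = nn.Conv2d(fused_ch, channels, 1)  # back to one stream

    def forward(self, f_rgb, f_depth):
        x = torch.cat([f_rgb, f_depth], dim=1)       # (B, 2C, H, W)
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c))  # per-channel weights
        x = x * w.view(b, c, 1, 1)                   # reweight channels
        return self.proj(x)                          # fused feature
```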

3. Multi-Modal Fusion Network with Multi-Scale Multi-Path and Cross-Modal Interactions for RGB-D Salient Object Detection

Published in 2019 Pattern Recognition

paper: Multi-Modal Fusion Network with Multi-Scale Multi-Path and Cross-Modal Interactions for RGB-D Salient Object Detection

  • Main novelty:
    • Propose a global understanding branch (based on pooling) and a local capturing branch (realized with dilated convolutions); see the two-branch sketch after this list.
    • Multi-layer fusion by element-wise summation.
  • Architecture
  • Details
    • Train R_SalNet first, initialized from VGG. Then train D_SalNet, initialized from the trained R_SalNet. Finally, fine-tune the whole network on paired inputs; see the staged-training sketch after this list.
    • The FC layer output (a 3136-d vector) is reshaped into a 56 × 56 saliency map (3136 = 56 × 56).
  • Results
    • Achieves better results compared to state-of-the-art methods (though the numbers seem even worse than those of the IROS paper above).
    • The fusion direction is important (depth-to-RGB works best). The authors state that the ‘MP+CI-Bi’ variant introduces too many parameters and that the bi-directional connections may destroy the fragile architecture. However, since the fusion is achieved by element-wise summation, no additional parameters should be introduced, so the first reason seems unconvincing.
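
A minimal PyTorch sketch of the two-branch idea from the novelty list above (names and layer choices are mine): a pooling-based global branch captures image-level context, a dilated-convolution local branch enlarges the receptive field without losing resolution, and the two are merged by element-wise summation, which indeed adds no parameters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalFusion(nn.Module):
    """Global understanding branch (pooling) + local capturing branch
    (dilated convolutions), merged by parameter-free summation."""

    def __init__(self, channels):
        super().__init__()
        # Global branch: pool to 1x1, embed, then broadcast back.
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.ReLU(inplace=True))
        # Local branch: dilated 3x3 convs keep spatial detail while
        # enlarging the receptive field.
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=4, dilation=4))

    def forward(self, x):
        g = self.global_branch(x)                # (B, C, 1, 1)
        g = F.interpolate(g, size=x.shape[-2:])  # broadcast to H x W
        l = self.local_branch(x)
        return g + l                             # parameter-free fusion
```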
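And a self-contained toy sketch of the three-stage training schedule described in the details (the tiny stand-in networks, tensors, and function names are mine; the paper uses VGG-based streams):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_stream():
    # Toy stand-in for a saliency stream (the real ones are VGG-based).
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(16, 1, 1))

def train_step(net, x, y, lr=1e-3):
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    loss = F.binary_cross_entropy_with_logits(net(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

rgb = torch.randn(2, 3, 56, 56)
depth = torch.randn(2, 3, 56, 56)  # depth replicated to 3 channels
gt = torch.rand(2, 1, 56, 56)

# Stage 1: train R_SalNet (in the paper, initialized from VGG).
r_salnet = make_stream()
train_step(r_salnet, rgb, gt)

# Stage 2: D_SalNet starts from the trained R_SalNet weights.
d_salnet = copy.deepcopy(r_salnet)
train_step(d_salnet, depth, gt)

# Stage 3: fine-tune both streams jointly on paired inputs, fusing
# their outputs by element-wise summation (parameter-free).
params = list(r_salnet.parameters()) + list(d_salnet.parameters())
opt = torch.optim.SGD(params, lr=1e-4)
fused = r_salnet(rgb) + d_salnet(depth)
loss = F.binary_cross_entropy_with_logits(fused, gt)
opt.zero_grad()
loss.backward()
opt.step()
```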