[cvpr18] multi

Task

This paper tackles simultaneous depth prediction and scene parsing; more specifically, both tasks are predicted from the same input data.

Motivation

1.Problems of conventional direct multi-task learning:

  • Existing approaches mainly focus on the final prediction stage, relying on cross-modal interactions or other refinement methods.
  • Distinct objectives for different tasks make it hard to optimize the model well for both tasks at once; as a result, multi-task performance on some tasks may even be worse than optimizing each task separately.

2.Benefits of the newly proposed method

This paper proposes to learn intermediate auxiliary tasks as multi-modal inputs.

  • It is well known that multi-modal data improves the performance of deep prediction models.
  • The network could also learn to predict other related information as multi-modal inputs, such as contours and surface normals.
  • How to effectively exploit multi-modal data to benefit the final predictions is crucially important; current deep multi-task learning models assume only single-modal data.

3.Graphical illustration
[Figure: motivation illustration]

Framework

[Figure: framework overview]

Techniques

1.Multi-task learning

In contrast to traditional multi-task learning, this paper proposes a new scheme: it first predicts a set of intermediate auxiliary tasks and then utilizes these multi-task predictions to refine the final predictions. This also helps the model learn better feature representations.
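
A minimal PyTorch sketch of this two-stage idea: a shared backbone first predicts intermediate auxiliary tasks, and those predictions are then fused to refine the final tasks. The layer sizes, choice of auxiliary tasks, and the simple concat-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoStageMultiTaskNet(nn.Module):
    def __init__(self, feat_dim=64, num_classes=21):
        super().__init__()
        # Shared backbone (stand-in for a real encoder such as a ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Intermediate auxiliary heads: depth, segmentation, contour, surface normal.
        self.aux_depth   = nn.Conv2d(feat_dim, 1, 1)
        self.aux_seg     = nn.Conv2d(feat_dim, num_classes, 1)
        self.aux_contour = nn.Conv2d(feat_dim, 1, 1)
        self.aux_normal  = nn.Conv2d(feat_dim, 3, 1)
        # Fuse the multi-modal intermediate predictions (plain concat + conv here).
        fused_in = 1 + num_classes + 1 + 3
        self.fuse = nn.Sequential(
            nn.Conv2d(fused_in, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Final task heads refined by the fused multi-modal information.
        self.final_depth = nn.Conv2d(feat_dim, 1, 1)
        self.final_seg   = nn.Conv2d(feat_dim, num_classes, 1)

    def forward(self, x):
        f = self.backbone(x)
        aux = {
            'depth':   self.aux_depth(f),
            'seg':     self.aux_seg(f),
            'contour': self.aux_contour(f),
            'normal':  self.aux_normal(f),
        }
        fused = self.fuse(torch.cat(list(aux.values()), dim=1))
        final = {'depth': self.final_depth(fused), 'seg': self.final_seg(fused)}
        return aux, final
```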

2.Multi-modal distillation

Three multi-modal distillation modules are proposed to incorporate information from different predictions. In essence, multi-modal distillation is about fusing information from multiple sources and maximally exploiting it to improve the final result.
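
A minimal sketch of one possible distillation design: each source prediction is re-embedded as a feature map and passed through a learned sigmoid gate before being added into the target task's features, so the target can selectively absorb information from the other modalities. The gating scheme and channel sizes are my assumptions for illustration; the paper itself proposes several module variants.

```python
import torch
import torch.nn as nn

class GatedDistillation(nn.Module):
    def __init__(self, src_channels, feat_dim=64):
        super().__init__()
        # One embedding + gate pair per source modality (src_channels is a list of ints).
        self.embed = nn.ModuleList([nn.Conv2d(c, feat_dim, 3, padding=1) for c in src_channels])
        self.gate  = nn.ModuleList([nn.Conv2d(feat_dim, feat_dim, 3, padding=1) for _ in src_channels])

    def forward(self, target_feat, source_preds):
        out = target_feat
        for emb, gate, pred in zip(self.embed, self.gate, source_preds):
            m = emb(pred)                            # re-embed the source prediction map
            out = out + torch.sigmoid(gate(m)) * m   # gated message passed to the target task
        return out
```

Replacing the sigmoid gate with plain summation or concatenation recovers simpler fusion variants, which is one way to compare different distillation designs.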

Others

1.Network Optimization VS Network Training

Usually the topic of ‘network optimization’ comes after the main method, in the last paragraphs of the ‘Method’ section. It typically describes the loss functions used to train the whole model. It seems that if the whole model is feed-forward, you can call it ‘end-to-end network optimization.’

The topic of ‘network training’ usually appears in the first paragraphs of the ‘Experiment’ section, often inside an ‘Implementation details’ sub-section, and covers the learning rate, number of training epochs, initialization, and other parameters involved during training. In contrast, ‘network optimization’ does not discuss these in-network parameters, only the supervision signals. When talking about ‘network training,’ you can say ‘stage-wise’ training when you first train the first few parts of the network and then the remaining ones, so that the model is better initialized and thus better trained.
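
A hedged sketch of the distinction: ‘network optimization’ here is just the supervision, i.e. a weighted sum of losses on the intermediate and final tasks applied to the feed-forward model end to end, while ‘network training’ covers the schedule and hyper-parameters, e.g. stage-wise training. All loss weights, learning rates, and the stage split below are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

def total_loss(aux, final, gt, w_aux=0.5):
    # Supervision signals only (the "network optimization" part):
    # final tasks plus down-weighted intermediate auxiliary tasks.
    l1, ce = nn.L1Loss(), nn.CrossEntropyLoss()
    loss_final = l1(final['depth'], gt['depth']) + ce(final['seg'], gt['seg'])
    loss_aux   = l1(aux['depth'],  gt['depth']) + ce(aux['seg'],  gt['seg'])
    return loss_final + w_aux * loss_aux

def build_stage_optimizers(model, stage1_modules, lr1=1e-2, lr2=1e-3):
    # Stage-wise training (the "network training" part): first optimize only the
    # listed sub-modules (e.g. backbone + auxiliary heads), then fine-tune everything.
    stage1_params = [p for m in stage1_modules for p in m.parameters()]
    opt_stage1 = torch.optim.SGD(stage1_params, lr=lr1, momentum=0.9)
    opt_stage2 = torch.optim.SGD(model.parameters(), lr=lr2, momentum=0.9)
    return opt_stage1, opt_stage2
```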

2.If you introduce a new module, it is important to verify that the performance improvement is not simply caused by the enlarged model capacity.

One way to demonstrate this is to feed the module a different (control) input and compare it against your preferred input, so as to verify that the improvement is not merely due to the extra model capacity.
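
A small sketch of such a control: run the identical downstream module twice, once on the preferred multi-modal input and once on a capacity-matched control input (e.g. a single modality replicated to the same channel count), so that any gain can be attributed to the input rather than to extra parameters. The helper name and the choice of control are hypothetical.

```python
import torch

def capacity_matched_inputs(aux_preds):
    # Preferred input: all auxiliary predictions concatenated along channels.
    multimodal = torch.cat(list(aux_preds.values()), dim=1)
    # Control input: one modality repeated to the same channel count, so the
    # downstream module keeps exactly the same number of parameters.
    c = multimodal.shape[1]
    single = aux_preds['depth']
    control = single.repeat(1, c // single.shape[1] + 1, 1, 1)[:, :c]
    return multimodal, control
```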