why and how to use pseudo

  • Pseudo labels in multi-task learning
  • Pseudo data selection with density and distribution distance

Pseudo labels

Why pseudo data needs to be applied in model training

Pseudo data selection

Because a large amount of noisy data can degrade model performance and even disrupt the training process, pseudo data selection is used to control the ratio of pseudo data in the whole dataset.

Density

  • reason: clusters are easily detected from the local density of data points, and pseudo data belonging to the same category have similar features. A density-based clustering algorithm is therefore applied to measure the complexity of the pseudo data using the data distribution density in each category.
  • implementation detail: measure the purity of each pseudo-labeled sample based on its distribution density in a feature space, then rank the purity values to generate pseudo weights. Pseudo data with higher density is assigned larger weights, while low-density pseudo data is assigned smaller weights.
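
The density-based weighting described above can be sketched roughly as follows. This is a minimal illustration and not necessarily the exact method intended here: the function name pseudo_weights, the cutoff_quantile parameter, the use of Euclidean distance, and the choice of "number of neighbours within a cutoff distance" as the density measure are all assumptions made for the example.

```python
import numpy as np

def pseudo_weights(features, pseudo_labels, cutoff_quantile=0.6):
    """Hypothetical helper: rank-based weights from per-category local density."""
    features = np.asarray(features, dtype=float)
    pseudo_labels = np.asarray(pseudo_labels)
    weights = np.zeros(len(features))

    for label in np.unique(pseudo_labels):
        idx = np.flatnonzero(pseudo_labels == label)
        x = features[idx]
        n = len(idx)
        if n == 1:
            # A single sample cannot be ranked against anything.
            weights[idx] = 1.0
            continue

        # Pairwise Euclidean distances inside this category.
        dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)

        # Cutoff distance: a quantile of the distinct pairwise distances
        # (assumed heuristic; tune for real data).
        cutoff = np.quantile(dist[np.triu_indices(n, k=1)], cutoff_quantile)

        # Local density: number of other samples within the cutoff.
        density = (dist <= cutoff).sum(axis=1) - 1

        # Rank the densities: weight = fraction of samples in the category
        # whose density is not larger. The densest samples get 1.0.
        weights[idx] = (density[:, None] >= density[None, :]).mean(axis=1)

    return weights
```

For example, with four tightly grouped feature vectors and one far-away vector sharing the same pseudo label, the four grouped samples receive the largest weight and the isolated one the smallest. The resulting weights could then scale each pseudo sample's loss term, or be thresholded to drop low-density samples.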

Distribution distance