Adversarial Multi-task Learning for Text Classification

Most existing work on multi-task learning attempts to divide the features of different tasks into private and shared spaces, merely based on whether parameters of some components should be shared.

In this paper, the authors design a generic shared-private learning framework to model text sequences.

They introduce two strategies: adversarial training and orthogonality constraints. The adversarial training is used to ensure that the shared feature space contains only common, task-invariant information, while the orthogonality constraints are used to eliminate redundant features from the private and shared spaces.
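For reference, the orthogonality constraint can be expressed as the squared Frobenius norm of the correlation between the shared and private feature matrices. Below is a minimal PyTorch sketch of that penalty; the function name and tensor shapes are illustrative assumptions, not the authors' code:

```python
import torch

def orthogonality_loss(shared: torch.Tensor, private: torch.Tensor) -> torch.Tensor:
    """Penalty ||S^T P||_F^2 on the overlap between shared and private features.

    shared, private: (batch, hidden) feature matrices produced by the
    shared and task-specific encoders. The loss is zero when the two
    feature spaces are orthogonal.
    """
    correlation = shared.t() @ private    # (hidden, hidden)
    return torch.sum(correlation ** 2)    # squared Frobenius norm
```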

Contributions

  • The proposed model divides the task-specific and shared spaces in a more precise way, rather than roughly sharing parameters.
  • They extend the original binary adversarial training to the multi-class case, which not only enables multiple tasks to be jointly trained, but also allows unlabeled data to be utilized.
  • The shared knowledge learned among multiple tasks can be condensed into an off-the-shelf neural layer, which can be easily transferred to new tasks.

Multi-task Learning for Text Classification

The key factor in multi-task learning is the sharing scheme in the latent feature space. In neural network-based models, the latent features can be regarded as the states of hidden neurons. Specific to text classification, the latent features are the hidden states of an LSTM at the last time step of a sentence.
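As a concrete illustration, the following minimal PyTorch sketch (with illustrative dimensions) reads such sentence-level features off the final hidden state of an LSTM:

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 100, 128                  # illustrative sizes
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

x = torch.randn(32, 20, embed_dim)                # (batch, seq_len, embed_dim)
outputs, (h_n, c_n) = lstm(x)

# Hidden state at the last time step: one feature vector per sentence.
sentence_features = h_n[-1]                       # (batch, hidden_dim)
```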

Fully-shared scheme

In the fully-shared model, a single shared LSTM layer is used to extract features for all tasks. For example, given two tasks m and n, this model takes the view that the features of task m can be totally shared by task n, and vice versa.
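For concreteness, here is a minimal PyTorch sketch of the fully-shared scheme, assuming per-task linear output heads on top of the single shared LSTM (the class and variable names are hypothetical):

```python
import torch.nn as nn

class FullyShared(nn.Module):
    """One LSTM shared by all tasks; only the per-task output heads differ."""

    def __init__(self, embed_dim, hidden_dim, num_classes_per_task):
        super().__init__()
        self.shared_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.heads = nn.ModuleList(
            nn.Linear(hidden_dim, c) for c in num_classes_per_task
        )

    def forward(self, x, task_id):
        _, (h_n, _) = self.shared_lstm(x)   # identical features for every task
        return self.heads[task_id](h_n[-1])
```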

This model ignores the fact that some features are task-dependent.

Shared-private scheme

Each task is assigned its own private LSTM layer, while a single shared LSTM layer is used by all tasks.
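A minimal sketch of the shared-private scheme under the same assumptions as above; here the shared and private features are concatenated before the task-specific classifier, matching the paper's shared-private setup:

```python
import torch
import torch.nn as nn

class SharedPrivate(nn.Module):
    """A private LSTM per task plus one LSTM shared across all tasks."""

    def __init__(self, embed_dim, hidden_dim, num_classes_per_task):
        super().__init__()
        self.shared_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.private_lstms = nn.ModuleList(
            nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            for _ in num_classes_per_task
        )
        self.heads = nn.ModuleList(
            nn.Linear(2 * hidden_dim, c) for c in num_classes_per_task
        )

    def forward(self, x, task_id):
        _, (h_shared, _) = self.shared_lstm(x)
        _, (h_private, _) = self.private_lstms[task_id](x)
        features = torch.cat([h_shared[-1], h_private[-1]], dim=-1)
        return self.heads[task_id](features)
```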

Incorporating Adversarial Training

Although the shared-private model separates the feature space into shared and private spaces, there is no guarantee that sharable features do not end up in the private feature space, or vice versa.

Therefore, a simple principle can be applied to multi-task learning: a good shared feature space should contain more common information and no task-specific information.
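One standard way to realize this principle is to train a multi-class task discriminator on the shared features and reverse its gradient into the shared encoder, so the encoder learns representations from which the task cannot be identified. The sketch below uses a gradient reversal layer in the style of Ganin & Lempitsky's DANN; it illustrates the idea and is not the authors' exact implementation:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class TaskDiscriminator(nn.Module):
    """Multi-class discriminator guessing which task produced the shared features.

    Reversing its gradient pushes the shared encoder toward features from
    which the task label cannot be recovered, i.e. task-invariant ones.
    """

    def __init__(self, hidden_dim, num_tasks):
        super().__init__()
        self.classifier = nn.Linear(hidden_dim, num_tasks)

    def forward(self, shared_features, lam=1.0):
        reversed_features = GradReverse.apply(shared_features, lam)
        return self.classifier(reversed_features)

# Hypothetical usage: add the discriminator's cross-entropy over task ids
# to the total loss; the reversed gradient makes the shared encoder
# maximize the discriminator's confusion.
# adv_loss = nn.functional.cross_entropy(disc(shared_features), task_ids)
```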

Adversarial Network

The goal is to learn a generative distribution $p_G(x)$ that matches the real data distribution $p_{data}(x)$.
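Concretely, this is the minimax game of Goodfellow et al. (2014), in which a generator $G$ and a discriminator $D$ are trained against each other:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log (1 - D(G(z)))]$$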