note – deep contextualized word representations

Peters et al. - 2018 - Deep contextualized word representations

Introduction

They introduce a new type of deep contextualised word representation that models complex characteristics of word use (e.g., syntax and semantics), as well as how these uses vary across linguistic contexts (i.e., polysemy). Their word vectors are learned functions of the internal states of a deep bidirectional language model (biLM).

Bidirectional language models

Given a sequence of $N$ tokens $(t_1, t_2, \ldots, t_N)$, a forward language model computes the probability of the sequence by modelling the probability of token $t_k$ given its history $(t_1, \ldots, t_{k-1})$:

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_1, t_2, \ldots, t_{k-1})$$

A backward LM is similar to a forward LM, except it runs over the sequence in reverse, predicting the previous token given the future context:

$$p(t_1, t_2, \ldots, t_N) = \prod_{k=1}^{N} p(t_k \mid t_{k+1}, t_{k+2}, \ldots, t_N)$$
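
As a small sanity check on the two decompositions (all numbers invented), the following sketch shows that for a length-2 sequence the forward factorisation $p(t_1)\,p(t_2 \mid t_1)$ and the backward factorisation $p(t_2)\,p(t_1 \mid t_2)$ both recover the same joint probability:

```python
import math

# Toy joint distribution over length-2 sequences (probabilities made up).
joint = {("a", "a"): 0.1, ("a", "b"): 0.3,
         ("b", "a"): 0.4, ("b", "b"): 0.2}

def p1(t1):                      # marginal p(t_1)
    return sum(p for (a, _), p in joint.items() if a == t1)

def p2_given_1(t2, t1):          # forward conditional p(t_2 | t_1)
    return joint[(t1, t2)] / p1(t1)

def p2(t2):                      # marginal p(t_2)
    return sum(p for (_, b), p in joint.items() if b == t2)

def p1_given_2(t1, t2):          # backward conditional p(t_1 | t_2)
    return joint[(t1, t2)] / p2(t2)

seq = ("a", "b")
forward = p1(seq[0]) * p2_given_1(seq[1], seq[0])    # p(t1) p(t2|t1)
backward = p2(seq[1]) * p1_given_2(seq[0], seq[1])   # p(t2) p(t1|t2)
assert math.isclose(forward, backward)
assert math.isclose(forward, joint[seq])
```

Both directions are valid factorisations of the same joint distribution; in practice, of course, two separately parameterised LMs trained on data need not agree numerically.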

This formulation jointly maximises the log likelihood of the forward and backward directions:

$$\sum_{k=1}^{N} \Big( \log p(t_k \mid t_1, \ldots, t_{k-1}; \Theta_x, \overrightarrow{\Theta}_{LSTM}, \Theta_s) + \log p(t_k \mid t_{k+1}, \ldots, t_N; \Theta_x, \overleftarrow{\Theta}_{LSTM}, \Theta_s) \Big)$$

The parameters for both the token representation ($\Theta_x$) and the softmax layer ($\Theta_s$) are shared between the forward and backward directions, while separate parameters are maintained for the LSTMs in each direction ($\overrightarrow{\Theta}_{LSTM}$ and $\overleftarrow{\Theta}_{LSTM}$).
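
A minimal PyTorch sketch of this parameter sharing (names and sizes illustrative, not the paper's implementation; ELMo actually uses a character-CNN token representation and multi-layer LSTMs with residual connections): the embedding and output projection play the roles of the shared $\Theta_x$ and $\Theta_s$, while the two LSTMs hold the direction-specific parameters.

```python
import torch
import torch.nn as nn

class BiLM(nn.Module):
    """Sketch of a biLM: shared embedding/softmax, separate direction LSTMs."""

    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)        # Theta_x (shared)
        self.fwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # forward params
        self.bwd_lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)  # backward params
        self.softmax = nn.Linear(hidden_dim, vocab_size)      # Theta_s (shared)

    def forward(self, tokens):
        # tokens: (batch, N) integer ids
        x = self.embed(tokens)
        # Forward direction: state at position k has seen t_1..t_k,
        # so it is used to predict t_{k+1}.
        h_fwd, _ = self.fwd_lstm(x)
        logits_fwd = self.softmax(h_fwd[:, :-1])              # predicts t_2..t_N
        # Backward direction: run over the reversed sequence, then flip back,
        # so the state at position k has seen t_k..t_N and predicts t_{k-1}.
        h_bwd, _ = self.bwd_lstm(torch.flip(x, dims=[1]))
        h_bwd = torch.flip(h_bwd, dims=[1])
        logits_bwd = self.softmax(h_bwd[:, 1:])               # predicts t_1..t_{N-1}
        return logits_fwd, logits_bwd

def joint_nll(model, tokens):
    """Negative joint log likelihood: forward plus backward cross-entropy."""
    logits_fwd, logits_bwd = model(tokens)
    loss = nn.CrossEntropyLoss(reduction="sum")
    fwd = loss(logits_fwd.reshape(-1, logits_fwd.size(-1)), tokens[:, 1:].reshape(-1))
    bwd = loss(logits_bwd.reshape(-1, logits_bwd.size(-1)), tokens[:, :-1].reshape(-1))
    return fwd + bwd
```

Minimising `joint_nll` corresponds to maximising the joint log likelihood above; because `embed` and `softmax` appear in both terms, their gradients receive signal from both directions, while each LSTM is updated only by its own direction's loss.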