
Attention Definition
RNN Encoder-Decoder
input: $x_1, x_2, \ldots, x_{T_x}$
output: $y_1, y_2, \ldots, y_{T_y}$
hidden: $h_t = f(x_t, h_{t-1})$, giving encoder states $h_1, h_2, \ldots, h_{T_x}$
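Unrolling the recurrence above makes the encoder concrete. A minimal NumPy sketch with a plain tanh cell; the weight names `W_xh`, `W_hh`, `b_h` and all shapes are illustrative assumptions, not from any particular implementation.

```python
import numpy as np

# Sketch of the encoder recurrence h_t = f(x_t, h_{t-1})
# using a plain tanh RNN cell; weight names are assumed.
def rnn_encode(X, W_xh, W_hh, b_h):
    """X: (T_x, d_in) input vectors; returns all hidden states (T_x, d_h)."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

H = rnn_encode(np.ones((3, 2)), np.ones((2, 4)) * 0.1,
               np.zeros((4, 4)), np.zeros(4))
print(H.shape)  # (3, 4)
```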
Bahdanau et al. (ICLR 2015) [1]

encoder: bidirectional GRU; the forward and backward states are concatenated into annotations $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$
decoder: each step conditions on a context vector produced by an alignment model
align: $e_{ij} = a(s_{i-1}, h_j)$, $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}$, context $c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$
output: $s_i = f(s_{i-1}, y_{i-1}, c_i)$, $p(y_i \mid y_1, \ldots, y_{i-1}, x) = g(y_{i-1}, s_i, c_i)$
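One decoder step of the additive alignment model from [1], which parameterizes $a(s_{i-1}, h_j) = v_a^\top \tanh(W_a s_{i-1} + U_a h_j)$, can be sketched in NumPy as below; the parameter names `W_a`, `U_a`, `v_a` and all dimensions are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """s_prev: (d_s,) previous decoder state; H: (T_x, d_h) encoder states."""
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j)  -- alignment scores
    scores = np.tanh(s_prev @ W_a + H @ U_a) @ v_a   # (T_x,)
    alpha = softmax(scores)                          # attention weights
    c = alpha @ H                                    # context vector (d_h,)
    return c, alpha

# toy usage with assumed dimensions
rng = np.random.default_rng(0)
T_x, d_h, d_s, d_a = 5, 4, 3, 6
H = rng.normal(size=(T_x, d_h))
s_prev = rng.normal(size=d_s)
W_a = rng.normal(size=(d_s, d_a))
U_a = rng.normal(size=(d_h, d_a))
v_a = rng.normal(size=d_a)
c, alpha = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha.sum())  # weights sum to 1
```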
Luong et al. (EMNLP 2015) [2]

align: $\mathrm{score}(h_t, \bar{h}_s)$ is one of: dot $h_t^\top \bar{h}_s$; general $h_t^\top W_a \bar{h}_s$; concat $v_a^\top \tanh(W_a [h_t; \bar{h}_s])$
output: $\tilde{h}_t = \tanh(W_c [c_t; h_t])$, $p(y_t \mid y_{<t}, x) = \mathrm{softmax}(W_s \tilde{h}_t)$
global: attends over all source hidden states at every decoder step

local: attends over a window $[p_t - D, p_t + D]$ around a predicted aligned position $p_t$
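The three score functions are easy to compare side by side. In this sketch the weight shapes differ by mode ($W_a$ is $(d, d)$ for general and $(2d, d_a)$ for concat); all names and shapes are illustrative assumptions.

```python
import numpy as np

# Luong score functions for one decoder state h_t against all
# encoder states H_bar (T, d); returns a (T,) score vector.
def luong_scores(h_t, H_bar, W_a=None, v_a=None, mode="dot"):
    if mode == "dot":        # h_t^T h_bar_s
        return H_bar @ h_t
    if mode == "general":    # h_t^T W_a h_bar_s
        return H_bar @ (W_a @ h_t)
    if mode == "concat":     # v_a^T tanh(W_a [h_t; h_bar_s])
        concat = np.concatenate(
            [np.tile(h_t, (H_bar.shape[0], 1)), H_bar], axis=1)
        return np.tanh(concat @ W_a) @ v_a
    raise ValueError(mode)

rng = np.random.default_rng(0)
T, d = 4, 3
h_t, H_bar = rng.normal(size=d), rng.normal(size=(T, d))
print(luong_scores(h_t, H_bar, mode="dot"))
print(luong_scores(h_t, H_bar, W_a=rng.normal(size=(d, d)), mode="general"))
print(luong_scores(h_t, H_bar, W_a=rng.normal(size=(2 * d, 5)),
                   v_a=rng.normal(size=5), mode="concat"))
```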

CNN-RNN: attention also composes with convolutional encoders, where the decoder attends over convolutional states instead of RNN states (cf. ConvS2S [7])

Caption/Text Generation (Show, Attend and Tell [4])
soft attention: the context is the expectation $\hat{z} = \sum_i \alpha_i a_i$ over annotation vectors; differentiable, trained end-to-end by backprop
hard attention: a single annotation location is sampled from $\alpha$, so the attention weight is effectively one-hot; trained with a REINFORCE-style gradient estimator
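The soft/hard contrast fits in a few lines; a minimal sketch, assuming `alpha` is already a valid attention distribution over the annotation vectors.

```python
import numpy as np

# Soft vs. hard attention over annotation vectors a_i, as in [4].
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 8))            # 6 annotation vectors, dim 8
alpha = np.exp(rng.normal(size=6))
alpha /= alpha.sum()                   # attention distribution

z_soft = alpha @ A                     # expectation: differentiable
idx = rng.choice(len(alpha), p=alpha)  # sample one location
z_hard = A[idx]                        # one-hot weighting: needs REINFORCE
```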
Transformer (Vaswani et al., 2017) [3]

Scaled Dot-Product Attention: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V$
Multi-Head Attention: $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O$, where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$
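A compact NumPy sketch of both formulas for self-attention over one sequence. Using a single projection per role and reshaping into $h$ heads is an equivalent simplification of the per-head $W_i^Q, W_i^K, W_i^V$ above; all names and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V, batched over heads
    d_k = Q.shape[-1]
    return softmax(Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)) @ V

def multi_head(X, W_q, W_k, W_v, W_o, h):
    T, d_model = X.shape
    d_k = d_model // h
    def split(M):  # (T, d_model) -> (h, T, d_k)
        return M.reshape(T, h, d_k).transpose(1, 0, 2)
    heads = attention(split(X @ W_q), split(X @ W_k), split(X @ W_v))
    # merge heads back and apply the output projection W^O
    return heads.transpose(1, 0, 2).reshape(T, d_model) @ W_o

rng = np.random.default_rng(0)
T, d_model, h = 5, 8, 2
X = rng.normal(size=(T, d_model))
Ws = [rng.normal(size=(d_model, d_model)) for _ in range(4)]
print(multi_head(X, *Ws, h=h).shape)  # (5, 8)
```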

LayerNorm [8]: $\mathrm{LN}(x) = \gamma \odot \frac{x - \mu}{\sigma} + \beta$ over the features of each position; each sub-layer is wrapped as $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$
Feed Forward: position-wise $\mathrm{FFN}(x) = \max(0, x W_1 + b_1) W_2 + b_2$
Positional Encoding: $PE_{(pos, 2i)} = \sin\!\big(pos / 10000^{2i/d_{\mathrm{model}}}\big)$, $PE_{(pos, 2i+1)} = \cos\!\big(pos / 10000^{2i/d_{\mathrm{model}}}\big)$; inputs are typically segmented into subword units via BPE (Sennrich et al., 2016) [6]
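The sinusoidal encoding is a direct transcription of the formulas above; this sketch assumes an even $d_{\mathrm{model}}$.

```python
import numpy as np

# Sinusoidal positional encoding, vectorized over positions and dims.
def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]              # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]          # even dimensions
    angles = pos / np.power(10000.0, i / d_model)  # (max_len, d_model/2)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(positional_encoding(4, 6).round(3))
```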
Universal Transformer [5]

recurrent network: a single self-attention + transition block is applied recurrently over depth with shared weights; the number of steps can be made input-dependent with Adaptive Computation Time (ACT)
output: after the final refinement step, the states are decoded as in the standard Transformer
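The depth recurrence reduces to a small skeleton; here `block` is only a stand-in placeholder for the real self-attention + transition sub-layers (and the per-step position/timestep signal and ACT halting from [5] are omitted), so everything below is illustrative.

```python
import numpy as np

# Universal Transformer depth recurrence: one shared block, applied T times.
def universal_transformer(X, block, T=4):
    H = X
    for _ in range(T):   # same block, shared weights, every step
        H = block(H)
    return H

toy_block = lambda H: np.tanh(H)  # placeholder for attention + transition
print(universal_transformer(np.ones((3, 4)), toy_block, T=2).shape)  # (3, 4)
```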
Reference
[1] Neural Machine Translation by Jointly Learning to Align and Translate
[2] Effective Approaches to Attention-based Neural Machine Translation
[3] Attention Is All You Need
[4] Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
[5] Universal Transformers
[6] Neural Machine Translation of Rare Words with Subword Units
[7] Convolutional Sequence to Sequence Learning
[8] Layer Normalization