Attention Mechanism

Attention Definition

RNN Encoder-Decoder

input: $x_1, x_2, \dots, x_{T_x}$
output: $y_1, y_2, \dots, y_{T_y}$
encoder hidden states: $h_t = f(x_t, h_{t-1})$, giving $h_1, h_2, \dots, h_{T_x}$
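As a concrete reference point, the encoder recurrence $h_t = f(x_t, h_{t-1})$ can be sketched with a vanilla RNN cell in NumPy. The weight names (`W_xh`, `W_hh`, `b_h`) are illustrative, not taken from any of the cited papers.

```python
import numpy as np

def rnn_encoder(x_seq, W_xh, W_hh, b_h):
    """Run a vanilla RNN over the input sequence and return all hidden states.

    Implements h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), with h_0 = 0."""
    hidden = np.zeros(W_hh.shape[0])
    states = []
    for x_t in x_seq:
        hidden = np.tanh(W_xh @ x_t + W_hh @ hidden + b_h)
        states.append(hidden)
    return np.stack(states)  # shape: (T_x, hidden_dim)
```

A GRU or LSTM cell replaces the `tanh` update in practice; the interface (sequence in, one state per position out) is the same.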

Bahdanau et al. (ICLR, 2015) [1]


encoder: bidirectional GRU; the forward and backward hidden states are concatenated into annotations $h_j = [\overrightarrow{h}_j; \overleftarrow{h}_j]$

decoder: alignment model

align: scores $e_{ij} = a(s_{i-1}, h_j)$, weights $\alpha_{ij} = \exp(e_{ij}) / \sum_{k=1}^{T_x} \exp(e_{ik})$, context $c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$

output: $s_i = f(s_{i-1}, y_{i-1}, c_i)$, and $y_i$ is predicted from $g(y_{i-1}, s_i, c_i)$
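The alignment model above can be sketched as additive (MLP) attention in NumPy. `W_a`, `U_a`, and `v_a` are illustrative names for the learned alignment parameters; this is a sketch of the scoring and weighting step, not the full decoder.

```python
import numpy as np

def additive_attention(s_prev, enc_states, W_a, U_a, v_a):
    """Bahdanau-style additive attention.

    Scores: e_j = v_a^T tanh(W_a s_prev + U_a h_j); weights via softmax;
    returns the context vector c = sum_j alpha_j h_j and the weights."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j)
                       for h_j in enc_states])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over source positions
    context = weights @ enc_states    # convex combination of encoder states
    return context, weights
```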

Luong et al. (EMNLP, 2015) [2]


align: $\mathrm{score}(s_i, h_j)$ computed as dot ($s_i^\top h_j$), general ($s_i^\top W_a h_j$), or concat

output: attentional state $\tilde{h}_i = \tanh(W_c [c_i; s_i])$, then $y_i$ from $\mathrm{softmax}(W_s \tilde{h}_i)$

global: attend over all source positions

local: attend over a window centered on a predicted aligned position $p_i$
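A minimal sketch of Luong's global attention with the `dot` and `general` scoring functions. `W_a` follows the paper's notation; the function itself is illustrative.

```python
import numpy as np

def luong_global_attention(s_t, enc_states, mode="dot", W_a=None):
    """Global attention: score every encoder state h_j against decoder state s_t.

    "dot":     score = s_t . h_j        (states must share dimensionality)
    "general": score = s_t^T W_a h_j    (learned bilinear form)"""
    if mode == "dot":
        scores = enc_states @ s_t
    elif mode == "general":
        scores = (enc_states @ W_a.T) @ s_t  # h_j^T W_a^T s_t == s_t^T W_a h_j
    else:
        raise ValueError(mode)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()          # softmax over source positions
    context = weights @ enc_states
    return context, weights
```

Local attention would additionally mask or reweight `weights` to a window around a predicted position $p_i$.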

CNN-RNN [4]

Text generation: a CNN encodes the image into a grid of feature vectors, and an RNN decoder attends over them while generating the caption

soft attention: differentiable weighted average over all feature locations, trained end-to-end with backpropagation

hard attention
one-hot attention weights: a single location is sampled per step, so training uses a variational lower bound with REINFORCE-style gradients
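The contrast can be sketched directly: soft attention returns the expected feature under the softmax distribution, while hard attention samples one location, which is equivalent to applying a one-hot weight vector. Names are illustrative.

```python
import numpy as np

def soft_attend(logits, features):
    """Soft attention: differentiable expected feature under softmax weights."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ features               # weighted average of all locations

def hard_attend(logits, features, rng):
    """Hard attention: sample ONE location; equivalent to a one-hot weight vector."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    idx = rng.choice(len(w), p=w)     # stochastic, not differentiable
    return features[idx]
```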

Transformer (Vaswani et al., 2017) [3]


Scaled Dot-Product Attention: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^\top / \sqrt{d_k}\right) V$
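The formula translates almost line-for-line into NumPy; the $\sqrt{d_k}$ scaling keeps the dot products from pushing the softmax into saturated regions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (T_q, T_k) similarity matrix
    return softmax(scores) @ V        # each output row: convex combo of V rows
```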

Multi-Head Attention: $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$, $\mathrm{MultiHead}(Q,K,V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^O$
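A compact self-attention sketch of the multi-head computation: project the input once, split the projections into heads, attend per head, concatenate, and apply the output projection. Using one combined projection matrix per role (rather than separate $W_i$ per head) is a common equivalent formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    """Multi-head self-attention over X of shape (T, d_model).

    All weight matrices are (d_model, d_model); head dim = d_model // n_heads."""
    T, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    heads = []
    for i in range(n_heads):
        sl = slice(i * d_head, (i + 1) * d_head)       # this head's channels
        scores = Q[:, sl] @ K[:, sl].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, sl])
    return np.concatenate(heads, axis=-1) @ W_o        # concat + output proj
```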

LayerNorm [8]: normalize each position's features to zero mean and unit variance, then apply a learned scale and shift; used around each sublayer as $\mathrm{LayerNorm}(x + \mathrm{Sublayer}(x))$
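Layer normalization in a few lines; unlike batch norm, the statistics are computed per example over the feature dimension, so it needs no batch statistics at inference time.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Normalize over the last (feature) axis, then apply learned scale/shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta
```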

Feed Forward: position-wise two-layer network, $\mathrm{FFN}(x) = \max(0,\, xW_1 + b_1)\,W_2 + b_2$
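The position-wise FFN is just two affine maps with a ReLU in between, applied identically at every sequence position:

```python
import numpy as np

def position_wise_ffn(x, W1, b1, W2, b2):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, applied per position (row of x)."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2
```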

Positional Encoding: sinusoidal, $PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_{\mathrm{model}}})$ and $PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_{\mathrm{model}}})$

Subword input units: byte-pair encoding (Sennrich et al., 2016) [6]
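The sinusoidal encoding fills even feature dimensions with sines and odd ones with cosines at geometrically spaced frequencies (sketch assumes an even `d_model`):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings, shape (max_len, d_model); d_model even."""
    pos = np.arange(max_len)[:, None]             # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                  # even dims
    pe[:, 1::2] = np.cos(angles)                  # odd dims
    return pe
```

The encoding is added to the token embeddings, since self-attention alone is order-invariant.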

Universal Transformer [5]


recurrence in depth: the same self-attention + transition block is applied repeatedly across steps, with timestep and position embeddings added at each step

output: the number of refinement steps can be fixed, or learned per position with Adaptive Computation Time (ACT) halting
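The depth recurrence can be sketched as applying one shared block $T$ times, in contrast to a standard Transformer's stack of distinct layers. This sketch omits residual connections, layer norm, projections, and timestep embeddings for brevity; `W_t` is an illustrative transition weight.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def shared_block(H, W_t):
    """One shared step: unprojected self-attention, then a tanh transition."""
    d = H.shape[-1]
    A = softmax(H @ H.T / np.sqrt(d)) @ H
    return np.tanh(A @ W_t)

def universal_transformer(H, W_t, n_steps):
    """Apply the SAME block n_steps times (recurrence in depth)."""
    for _ in range(n_steps):
        H = shared_block(H, W_t)
    return H
```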

Reference

[1] Bahdanau et al., "Neural Machine Translation by Jointly Learning to Align and Translate", ICLR 2015.
[2] Luong et al., "Effective Approaches to Attention-based Neural Machine Translation", EMNLP 2015.
[3] Vaswani et al., "Attention Is All You Need", NeurIPS 2017.
[4] Xu et al., "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", ICML 2015.
[5] Dehghani et al., "Universal Transformers", ICLR 2019.
[6] Sennrich et al., "Neural Machine Translation of Rare Words with Subword Units", ACL 2016.
[7] Gehring et al., "Convolutional Sequence to Sequence Learning", ICML 2017.
[8] Ba et al., "Layer Normalization", arXiv 2016.