show_attend_tell

Paper: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

The model generates a caption $y$, where each $\mathbf{y}_i$ is a $K$-dimensional vector ($K$ is the vocabulary size) and $C$ is the length of the caption:

$y = \left\{ \mathbf{y}_1, \ldots, \mathbf{y}_C \right\}, \quad \mathbf{y}_i \in \mathbb{R}^K$

Each $\mathbf{a}_i$ is a $D$-dimensional image feature vector:

$a = \left\{ \mathbf{a}_1, \ldots, \mathbf{a}_L \right\}, \quad \mathbf{a}_i \in \mathbb{R}^D$

$\mathbf{i}_t, \mathbf{f}_t, \mathbf{c}_t, \mathbf{o}_t, \mathbf{h}_t$ are the input gate, forget gate, memory cell, output gate, and hidden state of the LSTM, respectively.

$\hat{\mathbf{z}} \in \mathbb{R}^D$ is the context vector; $\mathbf{E} \in \mathbb{R}^{m \times K}$ is the embedding matrix.
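As a small illustration of the embedding matrix: if each $\mathbf{y}_i$ is assumed to be a one-hot vector (the notes above only state that it is $K$-dimensional), then the product $\mathbf{E}\mathbf{y}_{t-1}$ simply selects one column of $\mathbf{E}$. The sketch below uses plain Python lists and made-up toy values; the helper names `one_hot` and `matvec` are hypothetical and not taken from the paper.

```python
def one_hot(index, K):
    """Return a K-dimensional one-hot vector with a 1.0 at position `index`."""
    return [1.0 if k == index else 0.0 for k in range(K)]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


# Toy embedding matrix E with m = 2 rows and K = 3 columns (values are made up).
E = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]

y_prev = one_hot(1, K=3)          # assumed one-hot encoding of the previous word
embedded = matvec(E, y_prev)      # E y_{t-1}, an m-dimensional vector
column = [row[1] for row in E]    # column 1 of E, read off directly
```

Under the one-hot assumption, `embedded` and `column` hold the same values, which is why the embedding is usually implemented as a table lookup rather than a matrix product.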
$\left( \begin{array}{c} \mathbf{i}_t \\ \mathbf{f}_t \\ \mathbf{o}_t \\ \mathbf{g}_t \end{array} \right) = \left( \begin{array}{c} \sigma \\ \sigma \\ \sigma \\ \tanh \end{array} \right) T_{D+m+n,\,n} \left( \begin{array}{c} \mathbf{E}\mathbf{y}_{t-1} \\ \mathbf{h}_{t-1} \\ \hat{\mathbf{z}}_t \end{array} \right)$

$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \mathbf{g}_t$

$\mathbf{h}_t = \mathbf{o}_t \odot \tanh \left( \mathbf{c}_t \right)$

$e_{ti} = f_{\mathrm{att}} \left( \mathbf{a}_i, \mathbf{h}_{t-1} \right)$

$\alpha_{ti} = \frac{\exp \left( e_{ti} \right)}{\sum_{k=1}^{L} \exp \left( e_{tk} \right)}$

$\hat{\mathbf{z}}_t = \phi \left( \left\{ \mathbf{a}_i \right\}, \left\{ \alpha_i \right\} \right)$

$\mathbf{c}_0 = f_{\mathrm{init},c} \left( \frac{1}{L} \sum_{i}^{L} \mathbf{a}_i \right)$

$\mathbf{h}_0 = f_{\mathrm{init},h} \left( \frac{1}{L} \sum_{i}^{L} \mathbf{a}_i \right)$

$p \left( \mathbf{y}_t \mid \mathbf{a}, \mathbf{y}_1^{t-1} \right) \propto \exp \left( \mathbf{L}_o \left( \mathbf{E}\mathbf{y}_{t-1} + \mathbf{L}_h \mathbf{h}_t + \mathbf{L}_z \hat{\mathbf{z}}_t \right) \right)$

$\mathbb{E}_{p \left( s_t \mid a \right)} \left[ \hat{\mathbf{z}}_t \right] = \sum_{i=1}^{L} \alpha_{t,i} \mathbf{a}_i$

$\begin{aligned} \mathrm{NWGM} \left[ p \left( y_t = k \mid \mathbf{a} \right) \right] & = \frac{\prod_i \exp \left( n_{t,k,i} \right)^{p \left( s_{t,i} = 1 \mid a \right)}}{\sum_j \prod_i \exp \left( n_{t,j,i} \right)^{p \left( s_{t,i} = 1 \mid a \right)}} \\ & = \frac{\exp \left( \mathbb{E}_{p \left( s_t \mid a \right)} \left[ n_{t,k} \right] \right)}{\sum_j \exp \left( \mathbb{E}_{p \left( s_t \mid a \right)} \left[ n_{t,j} \right] \right)} \end{aligned}$

$L_d = - \log \left( P \left( \mathbf{y} \mid \mathbf{x} \right) \right) + \lambda \sum_{i}^{L} \left( 1 - \sum_{t}^{C} \alpha_{ti} \right)^2$
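The attention formulas above (scores $e_{ti}$, weights $\alpha_{ti}$, the expected context vector $\sum_i \alpha_{t,i} \mathbf{a}_i$, and the penalty term in $L_d$) can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the notes do not specify $f_{\mathrm{att}}$, so a dot product stands in for it here (which requires $D$ to equal the hidden size), and the function names `soft_attention` and `doubly_stochastic_penalty` as well as all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """alpha_i = exp(e_i) / sum_k exp(e_k), shifted by the max for stability."""
    m = max(scores)
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]


def dot(u, v):
    """Stand-in for f_att (an assumption, not the scoring function of the paper)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def soft_attention(a, h_prev, f_att=dot):
    """Return (alpha_t, z_hat_t) for feature vectors `a` and hidden state `h_prev`."""
    e_t = [f_att(a_i, h_prev) for a_i in a]           # e_ti = f_att(a_i, h_{t-1})
    alpha_t = softmax(e_t)                            # alpha_ti
    D = len(a[0])
    # Expected context vector: z_hat_t = sum_i alpha_ti * a_i
    z_hat = [sum(alpha_t[i] * a[i][d] for i in range(len(a))) for d in range(D)]
    return alpha_t, z_hat


def doubly_stochastic_penalty(alphas, lam=1.0):
    """lambda * sum_i (1 - sum_t alpha_ti)^2, where alphas[t][i] = alpha_ti."""
    L = len(alphas[0])
    return lam * sum(
        (1.0 - sum(alpha_t[i] for alpha_t in alphas)) ** 2 for i in range(L)
    )


# Toy example: L = 3 feature vectors with D = 2 (values are made up).
a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
h_prev = [0.5, -0.5]

alpha_t, z_hat = soft_attention(a, h_prev)
penalty = doubly_stochastic_penalty([alpha_t], lam=1.0)
```

With these toy inputs the scores are 0.5, -0.5 and 0.0, so the first location receives the largest weight and the second the smallest. The penalty shrinks as the attention summed over time steps approaches 1 at every location.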
