FAST AND ACCURATE READING COMPREHENSION BY COMBINING SELF-ATTENTION AND CONVOLUTION

ICLR 2018
CMU, Google Brain

Task Description

Task type: reading comprehension
Datasets: SQuAD, TriviaQA

Contributions

  • No RNNs (hence fast)
  • Uses multi-head attention from the Transformer
  • CNN combined with self-attention
  • Data augmentation by backtranslation

Model Architecture

[Figure: model architecture]
Much of the architecture resembles the BiDAF model.
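
As a concrete illustration, here is a minimal PyTorch sketch of one encoder block as described in the paper: a stack of depthwise separable convolutions followed by multi-head self-attention and a feed-forward layer, each sub-layer wrapped in layer normalization and a residual connection. The hyperparameters (d_model=128, kernel size 7, 8 heads) follow the paper; the class names, the omission of positional encodings, and other details are my own simplifications, not the authors' code.

```python
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, d_model: int, kernel_size: int = 7):
        super().__init__()
        # Depthwise conv: one filter per channel; pointwise conv mixes channels.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.pointwise = nn.Conv1d(d_model, d_model, kernel_size=1)

    def forward(self, x):               # x: (batch, seq_len, d_model)
        y = x.transpose(1, 2)           # Conv1d expects (batch, channels, seq_len)
        y = self.pointwise(self.depthwise(y))
        return y.transpose(1, 2)


class EncoderBlock(nn.Module):
    def __init__(self, d_model: int = 128, num_convs: int = 4,
                 kernel_size: int = 7, num_heads: int = 8):
        super().__init__()
        self.conv_norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(num_convs)])
        self.convs = nn.ModuleList([DepthwiseSeparableConv(d_model, kernel_size)
                                    for _ in range(num_convs)])
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.ffn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU(),
                                 nn.Linear(d_model, d_model))

    def forward(self, x):
        # Convolution sub-layers with residual connections (layernorm -> conv -> +x).
        # Positional encodings are omitted here for brevity.
        for norm, conv in zip(self.conv_norms, self.convs):
            x = x + torch.relu(conv(norm(x)))
        # Self-attention sub-layer.
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        # Position-wise feed-forward sub-layer.
        return x + self.ffn(self.ffn_norm(x))


# Example: encode a batch of 2 sequences of 400 token embeddings.
block = EncoderBlock()
out = block(torch.randn(2, 400, 128))
print(out.shape)  # torch.Size([2, 400, 128])
```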

Model Performance

[Figure: performance results]

Tricks

Data augmentation
[Figure: data augmentation]

Excerpts from the Paper

When paraphrasing, we keep the question q unchanged (to avoid accidentally changing its meaning) and generate new triples of (d′, q, a′) such that the new document d′ has the new answer a′ in it. The procedure happens in two steps: (i) document paraphrasing – paraphrase d into d′ and (ii) answer extraction – extract a′ from d′ that closely matches a.
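
A rough sketch of this two-step augmentation, assuming a `translate(text, src, tgt)` function backed by some NMT system (a hypothetical placeholder here; the paper uses English-French translation models). Step (i) paraphrases the document by backtranslation; step (ii) re-extracts the answer from the paraphrased document by fuzzy string matching; the matching threshold and window logic are my own simplifications.

```python
import difflib


def translate(text: str, src: str, tgt: str) -> str:
    # Placeholder for a real machine-translation call (assumption,
    # not the authors' implementation).
    raise NotImplementedError


def backtranslate_example(document: str, question: str, answer: str, pivot: str = "fr"):
    # (i) Document paraphrasing: English -> pivot language -> English.
    paraphrased = translate(translate(document, "en", pivot), pivot, "en")

    # (ii) Answer extraction: find the window in the paraphrased document
    # that best matches the original answer string.
    tokens = paraphrased.split()
    ans_len = max(len(answer.split()), 1)
    best_span, best_score = None, 0.0
    for i in range(len(tokens) - ans_len + 1):
        candidate = " ".join(tokens[i:i + ans_len])
        score = difflib.SequenceMatcher(None, candidate.lower(), answer.lower()).ratio()
        if score > best_score:
            best_span, best_score = candidate, score

    # Keep the question unchanged; drop examples with no close answer match.
    if best_span is None or best_score < 0.5:
        return None
    return paraphrased, question, best_span
```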

Compared to SQuAD, TriviaQA is more challenging in that: 1) its examples have much longer context (2895 tokens per context on average) and may contain several paragraphs, 2) it is much noisier than SQuAD due to the lack of human labeling, 3) it is possible that the context is not related to the answer at all, as it is crawled by key words.

Due to the multi-paragraph nature of the context, researchers also find that simple hierarchical or multi-step reading tricks are helpful, such as first predicting which paragraph to read and then applying models like BiDAF to pinpoint the answer within that paragraph.
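
A minimal sketch of that hierarchical trick: rank paragraphs by TF-IDF similarity to the question, then run a span-prediction model (e.g., BiDAF or QANet) only on the top-ranked paragraph. The `span_model` callable is hypothetical; the retrieval step uses scikit-learn's TfidfVectorizer as one possible choice, not necessarily what the cited work used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def answer_from_long_context(paragraphs, question, span_model):
    # Step 1: select the paragraph most similar to the question.
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(paragraphs + [question])
    question_vec = matrix[len(paragraphs)]
    sims = cosine_similarity(question_vec, matrix[:len(paragraphs)]).ravel()
    best_paragraph = paragraphs[sims.argmax()]

    # Step 2: pinpoint the answer within that paragraph.
    return span_model(best_paragraph, question)
```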

Recently, attempts have been made to replace recurrent networks with fully convolutional or fully attentional architectures (Kim, 2014; Gehring et al., 2017; Vaswani et al., 2017b; Shen et al., 2017a). Those models have been shown to be not only faster than RNN architectures, but also effective in other tasks, such as text classification, machine translation, and sentiment analysis.

References

  • Simple and effective multi-paragraph reading comprehension
  • Teaching machines to read and comprehend
  • Wikireading: A novel large-scale language understanding task over wikipedia
  • The goldilocks principle: Reading children’s books with explicit memory representation
  • Learning recurrent span representations for extractive question answering
  • Structural embedding of syntactic trees for machine comprehension
  • MEMEN: multi-layer embedding with memory networks for machine comprehension