FAST AND ACCURATE READING COMPREHENSION BY COMBINING SELF-ATTENTION AND CONVOLUTION
ICLR 2018
CMU, Google Brain
Task description
[Task type]: Reading comprehension
[Datasets]: SQuAD, TriviaQA
Contributions
- No RNNs (which is why it is fast)
- Incorporates the multi-head attention from the Transformer
- CNN combined with self-attention
- data augmentation by backtranslation
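
To make the "CNN combined with self-attention" idea concrete, here is a minimal pure-Python sketch of one encoder step: a convolution over neighbouring positions captures local context, and self-attention then mixes information across all positions. This is a simplification under stated assumptions, not the paper's implementation: it uses one kernel shared across channels, single-head attention with identity projections, and no layer normalisation, feed-forward layer, or positional encoding. The function names are made up for illustration.

```python
import math

def conv1d_same(x, kernel):
    """1-D convolution over positions with zero padding (output length == input length).

    x: list of n positions, each a list of d floats.
    kernel: list of k floats (k odd), shared by all d channels for brevity.
    """
    half = len(kernel) // 2
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        row = [0.0] * d
        for j, w in enumerate(kernel):
            p = i + j - half
            if 0 <= p < n:  # positions outside the sequence count as zeros
                for c in range(d):
                    row[c] += w * x[p][c]
        out.append(row)
    return out

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention (queries = keys = values = x)."""
    d = len(x[0])
    out = []
    for q in x:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x])
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out

def add(x, y):
    """Element-wise sum, used for the residual connections."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(x, y)]

def encoder_block(x, kernel):
    h = add(x, conv1d_same(x, kernel))  # local context + residual
    return add(h, self_attention(h))    # global context + residual
```

Both sub-layers process all positions independently of any previous time step, so unlike an RNN there is no sequential dependency along the sequence.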
Model architecture
In many respects it resembles the BiDAF model.
Model performance
Tricks
Data augmentation
Paper excerpts
When paraphrasing, we keep the question q unchanged (to avoid accidentally changing its meaning) and generate new triples of (d′, q, a′) such that the new document d′ has the new answer a′ in it. The procedure happens in two steps: (i) document paraphrasing: paraphrase d into d′, and (ii) answer extraction: extract a′ from d′ that closely matches a.
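
The answer-extraction step can be illustrated with a small sketch. The version below is a simplified stand-in, not the paper's exact procedure: it assumes whitespace tokenisation, scores every candidate span of the paraphrased document against the original answer with a Dice overlap of character bigrams, and returns the best-scoring span. The names `bigram_score`, `extract_answer`, and `max_extra_tokens` are hypothetical.

```python
def char_bigrams(text):
    """Set of lower-cased character 2-grams of a string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bigram_score(a, b):
    """Dice overlap between the character-bigram sets of a and b, in [0, 1]."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def extract_answer(paraphrased_doc, answer, max_extra_tokens=2):
    """Return (span, score) for the span of the paraphrase closest to `answer`.

    Returns (None, 0.0) when no span shares any character bigram with it.
    """
    tokens = paraphrased_doc.split()
    max_len = len(answer.split()) + max_extra_tokens
    best_span, best_score = None, 0.0
    for start in range(len(tokens)):
        for length in range(1, max_len + 1):
            if start + length > len(tokens):
                break
            candidate = " ".join(tokens[start:start + length])
            score = bigram_score(candidate, answer)
            if score > best_score:
                best_span, best_score = candidate, score
    return best_span, best_score

# The paraphrase spells the name slightly differently from the original answer.
doc = "The tower was finished in Paris during 1889 by Gustave Eiffel"
print(extract_answer(doc, "Gustav Eiffel"))
```

Fuzzy matching is needed because the paraphrased document d′ may not contain the original answer string verbatim, so an exact string search for a would often fail.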
Compared to SQuAD, TriviaQA is more challenging in that: 1) its examples have much longer context (2895 tokens per context on average) and may contain several paragraphs, 2) it is much noisier than SQuAD due to the lack of human labeling, 3) it is possible that the context is not related to the answer at all, as it is crawled by keywords.
Due to the multi-paragraph nature of the context, researchers also find that simple hierarchical or multi-step reading tricks, such as first predicting which paragraph to read and then applying models like BiDAF to pinpoint the answer within that paragraph, can improve performance on TriviaQA.
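
A two-step reading pipeline of this kind might look like the sketch below. It is only an illustration under assumptions: paragraphs are ranked by an IDF-weighted word overlap with the question (a stand-in for whatever paragraph selector is actually used), and `span_model` is a hypothetical callable, for example a wrapper around a BiDAF-style reader, that returns an `(answer, confidence)` pair for one paragraph.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lower-cased whitespace tokens with surrounding punctuation stripped."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def rank_paragraphs(question, paragraphs):
    """Paragraph indices, best first, by IDF-weighted word overlap with the question."""
    docs = [set(tokenize(p)) for p in paragraphs]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency of each word
    q = set(tokenize(question))
    scores = [sum(math.log(1 + n / df[w]) for w in q & d) for d in docs]
    return sorted(range(n), key=lambda i: -scores[i])

def two_step_read(question, paragraphs, span_model, top_k=1):
    """Step 1: pick the top_k paragraphs. Step 2: run the span model on each of them."""
    chosen = rank_paragraphs(question, paragraphs)[:top_k]
    candidates = [span_model(question, paragraphs[i]) for i in chosen]
    return max(candidates, key=lambda c: c[1])  # highest-confidence answer
```

Restricting the expensive span model to a few selected paragraphs keeps its input short even when the full context runs to thousands of tokens.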
Recently, attempts have been made to replace the recurrent networks by full convolution or full attention architectures (Kim, 2014; Gehring et al., 2017; Vaswani et al., 2017b; Shen et al., 2017a). Those models have been shown to be not only faster than the RNN architectures, but also effective in other tasks, such as text classification, machine translation or sentiment analysis.
References
- Simple and effective multi-paragraph reading comprehension
- Teaching machines to read and comprehend
- Wikireading: A novel large-scale language understanding task over wikipedia
- The goldilocks principle: Reading children’s books with explicit memory representation
- Learning recurrent span representations for extractive question answering
- Structural embedding of syntactic trees for machine comprehension
- MEMEN: multi-layer embedding with memory networks for machine comprehension