
Attention Is All You Need


The attention mechanism takes the whole sentence as input and extracts the useful information from it.

Each output is computed against the entire sentence: the output value is a weighted sum of the word vectors of the input sentence.

 

“This is what attention does, it extracts information from the whole sequence, a weighted sum of all the past encoder states”

https://towardsdatascience.com/attention-is-all-you-need-discovering-the-transformer-paper-73e5ff5e0634
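A minimal NumPy sketch of this idea, assuming simple dot-product scoring between a decoder state and the encoder states; the function names and toy dimensions are illustrative and not from the original post.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(decoder_state, encoder_states):
    """Toy dot-product attention over all encoder states.

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one vector per source-sentence position
    Returns the context vector: a weighted sum of all encoder states.
    """
    scores = encoder_states @ decoder_state   # (T,) similarity to each position
    weights = softmax(scores)                 # (T,) normalized attention weights
    context = weights @ encoder_states        # (d,) weighted sum of encoder states
    return context, weights

# Hypothetical toy sizes: a 5-word source sentence with 8-dimensional vectors.
enc = np.random.randn(5, 8)
dec = np.random.randn(8)
ctx, w = attend(dec, enc)
print(w.sum())  # ~1.0: the weights form a distribution over the whole sentence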

 

https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/

Self-attention:

Self-attention is a sequence-to-sequence operation: a sequence of vectors goes in, and a sequence of vectors comes out. Let's call the input vectors x1, x2, …, xt and the corresponding output vectors y1, y2, …, yt. The vectors all have dimension k. To produce output vector yi, the self-attention operation simply takes a weighted average over all the input vectors; the weights are derived from a function over xi and xj, and the simplest option for that function is the dot product.
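A minimal sketch of this simplest form, assuming raw dot products followed by a row-wise softmax; the array shapes are hypothetical and only meant to show that a sequence of t vectors goes in and t vectors come out.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def basic_self_attention(X):
    """Simplest self-attention: weights come from raw dot products.

    X: (t, k) input sequence of t vectors of dimension k.
    Returns Y: (t, k), where each y_i is a weighted average of all x_j.
    """
    W = softmax(X @ X.T)   # (t, t): w_ij = softmax_j(x_i . x_j)
    Y = W @ X              # (t, k): y_i = sum_j w_ij * x_j
    return Y

X = np.random.randn(4, 3)   # hypothetical: 4 vectors of dimension 3
Y = basic_self_attention(X)
print(Y.shape)  # (4, 3): a sequence of vectors in, a sequence of vectors out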

 

Q, K, V:

Every input vector is used in three different ways in the self-attention mechanism. As the Query, it is compared to every other vector to establish the weights for its own output yi. As the Key, it is compared to every other vector to establish the weights for the j-th output yj. As the Value, it is used in the weighted sum that computes each output vector once the weights have been established.
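A sketch of these three roles, assuming separate query/key/value projection matrices (learned in a real model, random here) and the scaled dot-product scoring used in the Transformer paper; the sizes are illustrative only.

import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def qkv_self_attention(X, Wq, Wk, Wv):
    """Self-attention with separate query/key/value projections.

    X:          (t, k) input vectors
    Wq, Wk, Wv: (k, k) projection matrices (learned in a real model)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # each (t, k)
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (t, t) scaled dot products
    weights = softmax(scores)                 # row i: how much output i attends to each j
    return weights @ V                        # (t, k) weighted sum of the values

k = 8
X = np.random.randn(5, k)
Wq, Wk, Wv = (np.random.randn(k, k) for _ in range(3))
print(qkv_self_attention(X, Wq, Wk, Wv).shape)  # (5, 8)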

 


Original post: https://www.cnblogs.com/simpleminds/p/15149390.html
