
Language Models

Date: 2021-03-09 09:24:07

Definition
ELMo[2]
Language Model Pretraining
References

Definition

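\(p(w_1,...,w_n)=\prod_{i=1}^{n}p(w_i|w_1,...,w_{i-1})\), where \(p(w_i|w_1,...,w_{i-1})\) is usually a (recurrent) neural network
Before 2018, language models were used for text generation, e.g. in machine translation and speech recognition; since 2018, they are pretrained on large amounts of data and then fine-tuned on any specific task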
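The chain-rule factorization in the definition above can be illustrated with a minimal sketch. It assumes a hypothetical toy bigram table in place of the neural network, so each conditional only looks at the previous word rather than the full history; the table values and the `<s>` start symbol are made up for illustration.

```python
import math

# Hypothetical toy conditionals p(w_i | w_{i-1}); "<s>" marks the sentence start.
# A bigram model truncates the full history w_1..w_{i-1} to the previous word only.
BIGRAM = {
    ("<s>", "the"): 0.5,
    ("the", "cat"): 0.2,
    ("cat", "sleeps"): 0.1,
}

def sequence_log_prob(words):
    """Sum log p(w_i | w_{i-1}) over the sequence (chain rule in log space)."""
    log_p = 0.0
    prev = "<s>"
    for w in words:
        p = BIGRAM.get((prev, w), 0.0)
        if p == 0.0:
            # Unseen transition: the toy model assigns zero probability.
            return float("-inf")
        log_p += math.log(p)
        prev = w
    return log_p

log_p = sequence_log_prob(["the", "cat", "sleeps"])
prob = math.exp(log_p)  # product of the three conditionals
```

Summing logs instead of multiplying probabilities avoids underflow on long sequences, which is why language-model losses are usually written as a sum of per-token log-probabilities.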

ELMo[2]

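Train two language models, one left-to-right and one right-to-left; extract contextualized vectors from the networks
Contextual word embeddings: \(f(w_k|w_1,...,w_n)\in \mathbb{R}^N\)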
- \(f(\text{play}|\text{Elmo and Cookie Monster play a game.})\ne f(\text{play}|\text{The Broadway play premiered yesterday.})\)
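The idea that the same word receives different vectors in different sentences can be shown with a toy sketch. This is not ELMo itself: the hash-based word vectors and the running-mean "states" are hypothetical stand-ins for learned embeddings and recurrent language-model states, chosen only so that the example runs without any trained model.

```python
import hashlib

DIM = 4  # hypothetical per-direction size; the concatenated vector has 2 * DIM entries

def static_vec(word):
    """Deterministic, context-free toy vector for a word, derived from a hash."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:DIM]]

def running_mean(vectors):
    """State at position k is the mean of vectors[0..k]; stands in for a recurrent LM state."""
    states, total = [], [0.0] * DIM
    for k, v in enumerate(vectors, start=1):
        total = [t + x for t, x in zip(total, v)]
        states.append([t / k for t in total])
    return states

def contextual_embeddings(words):
    """Concatenate a left-to-right state and a right-to-left state for each position."""
    vecs = [static_vec(w) for w in words]
    fwd = running_mean(vecs)              # sees w_1..w_k
    bwd = running_mean(vecs[::-1])[::-1]  # sees w_k..w_n
    return [f + b for f, b in zip(fwd, bwd)]

s1 = "Elmo and Cookie Monster play a game .".split()
s2 = "The Broadway play premiered yesterday .".split()
e1 = contextual_embeddings(s1)[s1.index("play")]
e2 = contextual_embeddings(s2)[s2.index("play")]
```

Here `static_vec("play")` is identical in both sentences, while `e1` and `e2` differ because each one mixes in the surrounding words from both directions.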

Language Model Pretraining

GPT[3]

Train a transformer language model (pretraining a common architecture)
Trained on longer text, uses self-attention, and scales well (e.g. GPT-2 and GPT-3); not bidirectional
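"Not bidirectional" means that each position may attend only to itself and to earlier positions. The following is a minimal single-head sketch of that causal restriction. As a simplifying assumption it uses the input vectors directly as queries, keys, and values, with no learned projections, so it illustrates the masking pattern rather than a full transformer layer.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x):
    """Scaled dot-product attention where position i attends only to positions j <= i.

    Queries, keys and values are all taken to be x (no learned projections).
    Returns the output vectors and the full attention-weight matrix.
    """
    n, d = len(x), len(x[0])
    out, weights = [], []
    for i, q in enumerate(x):
        # Scores are computed only for the visible prefix 0..i.
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d) for j in range(i + 1)]
        row = softmax(scores) + [0.0] * (n - i - 1)  # future positions get zero weight
        weights.append(row)
        out.append([sum(row[j] * x[j][k] for j in range(n)) for k in range(d)])
    return out, weights

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = causal_self_attention(x)
```

The weight matrix is lower triangular: the first position can only copy itself, and no row places weight on a later token.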

BERT[4]

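Train a masked language model that jointly models left and right context
Introduces a new task: predicting missing/masked words
Bidirectional reasoning is important for many tasks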
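The masked-word prediction task can be sketched as a data-preparation step. The version below is simplified: the 15% masking rate follows the BERT paper but is not stated in these notes, and the full recipe, which sometimes keeps the selected token or swaps in a random one, is omitted. The function name `mask_tokens` and its parameters are hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, labels).

    labels[i] holds the original token at masked positions and None elsewhere,
    so a training loss would be computed only on the masked slots.
    """
    rng = random.Random(seed)
    # Mask at least one token, but never more than the sequence contains.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, labels

tokens = "the broadway play premiered yesterday to a full house".split()
masked, labels = mask_tokens(tokens)
```

Because the model sees the unmasked tokens on both sides of each `[MASK]`, it can use left and right context jointly when predicting the hidden word, unlike a left-to-right language model.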

RoBERTa[5]

Scales up the data rather than the model size, increases the batch size, and simplifies the loss

References

[1] AAAI 2021 Tutorial: Recent Advances in LM Pretraining
[2] 2018 | Deep contextualized word representations | Matthew E. Peters et al.
[3] 2018 | Improving Language Understanding by Generative Pre-Training | Alec Radford et al.
[4] 2018 | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin et al.
[5] 2019 | RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu et al.

Original post: https://www.cnblogs.com/yao1996/p/14503002.html
