MLP-Mixer: An all-MLP Architecture for Vision

Posted: 2021-06-29 22:50:07

Overview
Main content
Code

Tolstikhin I., Houlsby N., Kolesnikov A., Beyer L., Zhai X., Unterthiner T., Yung J., Steiner A., Keysers D., Uszkoreit J., Lucic M., Dosovitskiy A. MLP-Mixer: An all-MLP Architecture for Vision. In Advances in Neural Information Processing Systems (NeurIPS), 2021.

Overview

First CNNs, then Transformers, and now plain fully connected layers are apparently enough. What a mess.

Main content

[Figure: MLP-Mixer architecture]

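As the figure above shows:

Input: as in ViT, the image is first split into patches, and a fully connected layer maps each patch to its corresponding embedding:

\[X \in \mathbb{R}^{B \times T \times D}, \]
where \(B\) is the batch size, \(T\) is the number of patches, and \(D\) is the channel dimension shown in the figure.
\(X\) is then passed through \(N\) Mixer Layers, followed by global average pooling to obtain the features and a fully connected layer to obtain the logits.
Output: the predicted class.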
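The pipeline above can be sketched in NumPy as follows. This is only a minimal illustration, not the authors' implementation: the names `patchify`, `forward`, `W_embed`, and `W_head` are hypothetical, biases are omitted, and the Mixer Layers are passed in as arbitrary functions that keep the shape \(B \times T \times D\) (an identity function is used here as a stand-in).

```python
import numpy as np

def patchify(images, patch):
    """Split (B, H, W, C) images into (B, T, patch*patch*C) flattened patches."""
    B, H, W, C = images.shape
    assert H % patch == 0 and W % patch == 0, "image size must be divisible by patch size"
    h, w = H // patch, W // patch
    x = images.reshape(B, h, patch, w, patch, C)
    x = x.transpose(0, 1, 3, 2, 4, 5)            # (B, h, w, patch, patch, C)
    return x.reshape(B, h * w, patch * patch * C)

def forward(images, patch, W_embed, mixer_layers, W_head):
    x = patchify(images, patch) @ W_embed         # per-patch embedding: (B, T, D)
    for layer in mixer_layers:                    # N Mixer Layers, each (B, T, D) -> (B, T, D)
        x = layer(x)
    feats = x.mean(axis=1)                        # global average pooling over patches: (B, D)
    return feats @ W_head                         # logits: (B, num_classes)

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
images = rng.normal(size=(2, 32, 32, 3))
patch, D, num_classes = 8, 64, 10
W_embed = rng.normal(size=(patch * patch * 3, D)) * 0.02
W_head = rng.normal(size=(D, num_classes)) * 0.02

identity_layer = lambda x: x                      # stand-in for a real Mixer Layer
logits = forward(images, patch, W_embed, [identity_layer], W_head)
print(logits.shape)
```

With 32x32 images and 8x8 patches there are \(T = 16\) patches, each flattened to 192 values before the embedding; the predicted class would be `logits.argmax(axis=-1)`.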

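The Mixer Layer proceeds as follows (consider a single sample of the batch, so \(X \in \mathbb{R}^{T \times D}\)). In the paper, mixing across patches comes first and mixing across channels second; \(\sigma\) is an element-wise nonlinearity (GELU in the paper), and LayerNorm normalizes over the channel dimension in both steps.

Token mixing: each channel is processed independently across patches. Let \(x_{*,j} \in \mathbb{R}^T\) be the \(j\)-th channel (a column of \(X\), written as a row vector):

\[u_{*,j} = x_{*,j} + \sigma(\mathrm{LayerNorm}(X)_{*,j} W_1) W_2, \quad j = 1, \ldots, D. \]
This gives \(U \in \mathbb{R}^{T \times D}\). Channel mixing: each patch is processed independently. Let \(u_{i,*} \in \mathbb{R}^D\) be the \(i\)-th patch (a row of \(U\)):

\[y_{i,*} = u_{i,*} + \sigma(\mathrm{LayerNorm}(U)_{i,*} W_3) W_4, \quad i = 1, \ldots, T. \]
Here \(W_1, W_2\) are shared across all channels and \(W_3, W_4\) are shared across all patches. Stacking the rows \(y_{i,*}\) gives the output \(Y \in \mathbb{R}^{T \times D}\).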
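A single Mixer Layer can be sketched in NumPy as below, following the order used in the paper (mixing across patches first, then across channels). This is a minimal sketch under stated assumptions, not the official code: the function names are hypothetical, biases and the learnable LayerNorm scale and shift are omitted, and GELU uses its tanh approximation.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize over the last (channel) axis; learnable scale/shift omitted."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mixer_layer(x, W1, W2, W3, W4):
    """x: (B, T, D); W1: (T, H_tok); W2: (H_tok, T); W3: (D, H_ch); W4: (H_ch, D)."""
    # Token mixing: transpose so the MLP acts along the patch axis T,
    # with the same W1, W2 applied to every channel.
    y = layer_norm(x).transpose(0, 2, 1)              # (B, D, T)
    u = x + (gelu(y @ W1) @ W2).transpose(0, 2, 1)    # back to (B, T, D), plus residual
    # Channel mixing: the MLP acts along the channel axis D,
    # with the same W3, W4 applied to every patch.
    return u + gelu(layer_norm(u) @ W3) @ W4

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
B, T, D, H_tok, H_ch = 2, 16, 64, 32, 128
x = rng.normal(size=(B, T, D))
W1 = rng.normal(size=(T, H_tok)) * 0.02
W2 = rng.normal(size=(H_tok, T)) * 0.02
W3 = rng.normal(size=(D, H_ch)) * 0.02
W4 = rng.normal(size=(H_ch, D)) * 0.02

out = mixer_layer(x, W1, W2, W3, W4)
print(out.shape)
```

Because both sub-blocks are residual and shape-preserving, layers of this form can be stacked \(N\) times without any reshaping in between.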

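In short, MLP-Mixer separates the channel-wise (per-patch) operations from the spatial-wise (cross-patch) operations, and this alone is enough to achieve good results.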
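The separation can be checked numerically on a toy example: perturbing a single patch changes only that patch's output under channel mixing, whereas under token mixing the change spreads to other patches. The sketch below is illustrative only; `channel_mix` and `token_mix` are hypothetical helper names, and LayerNorm and biases are left out to keep the two operations easy to compare.

```python
import numpy as np

def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def channel_mix(x, W3, W4):
    # Acts on the channel axis: each patch (row) is transformed on its own.
    return x + gelu(x @ W3) @ W4

def token_mix(x, W1, W2):
    # Acts on the patch axis: each channel (column) is transformed on its own.
    return x + (gelu(x.T @ W1) @ W2).T

rng = np.random.default_rng(0)
T, D, H = 6, 4, 8
x = rng.normal(size=(T, D))
W1, W2 = rng.normal(size=(T, H)), rng.normal(size=(H, T))
W3, W4 = rng.normal(size=(D, H)), rng.normal(size=(H, D))

x_pert = x.copy()
x_pert[0] += 1.0                                   # change only patch 0

# Rows (patches) whose output differs after the perturbation.
diff_c = np.abs(channel_mix(x_pert, W3, W4) - channel_mix(x, W3, W4)).sum(axis=1)
channel_rows_changed = np.flatnonzero(diff_c > 1e-12)

diff_t = np.abs(token_mix(x_pert, W1, W2) - token_mix(x, W1, W2)).sum(axis=1)
token_rows_changed = np.flatnonzero(diff_t > 1e-12)

print("channel mixing, patches affected:", channel_rows_changed)
print("token mixing, patches affected:", token_rows_changed)
```

Channel mixing never moves information between patches, so all spatial interaction in the model has to come from the token-mixing step.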

Code

Official code

Original post: https://www.cnblogs.com/MTandHJ/p/14951323.html