Machine Translation - Date Translation

Posted: 2019-05-31 10:15:04

Neural Machine Translation

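You will build a Neural Machine Translation (NMT) model that translates human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25"). You will do this using an attention model, one of the most sophisticated sequence-to-sequence models.

Let's load all the packages you will need for this assignment.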

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np

from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt
%matplotlib inline

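1 - Translating human-readable dates into machine-readable dates

The model you will build here could be used to translate from one language to another, such as from English to Hindi. However, language translation requires massive datasets and usually takes days of training on GPUs. To give you a place to experiment with these models without a massive dataset, we will instead use a simpler "date translation" task.

The network takes as input a date written in any of a variety of formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987") and translates it into a standardized, machine-readable date (e.g. "1958-08-29", "1968-03-30", "1987-06-24"). We will have the network learn to output dates in the machine-readable format YYYY-MM-DD.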
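To make the task concrete, the short sketch below renders one date in several human-readable styles next to its YYYY-MM-DD target. It uses only Python's standard datetime module; the three format strings are illustrative assumptions, not the formats defined in nmt_utils.py.

```python
from datetime import date

# One calendar date rendered in a few human-readable styles.
# The format strings are illustrative; nmt_utils.py defines its own set.
d = date(1958, 8, 29)
human_variants = [
    d.strftime("%d %B %Y").lower(),   # day, full month name, year
    d.strftime("%m/%d/%Y"),           # numeric month/day/year
    d.strftime("%d %b %Y").upper(),   # day, abbreviated month, year
]
machine = d.isoformat()               # always YYYY-MM-DD

for h in human_variants:
    print(repr(h), "->", machine)
```

Every variant maps to the same 10-character target, which is why the output length can be fixed in advance.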

Take a look at nmt_utils.py to see all the formatting. Count and figure out how the formats work; you will need this knowledge later.

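1.1 - Dataset

We will train the model on a dataset of 10,000 human-readable dates and their equivalent, standardized, machine-readable dates. Let's run the following cell to load the dataset and print some examples.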

m = 10000
dataset, human_vocab, machine_vocab, inv_machine_vocab = load_dataset(m)

100%|██████████████████████████████████████████| 10000/10000 [00:00<00:00, 30957.97it/s]

dataset[:10]

[('15 october 1989', '1989-10-15'),
 ('sunday july 15 1984', '1984-07-15'),
 ('wednesday april 5 1978', '1978-04-05'),
 ('3/26/16', '2016-03-26'),
 ('sunday april 5 1992', '1992-04-05'),
 ('sunday october 16 2005', '2005-10-16'),
 ('11 dec 1992', '1992-12-11'),
 ('21 03 75', '1975-03-21'),
 ('tuesday july 23 2013', '2013-07-23'),
 ('sunday november 3 1996', '1996-11-03')]

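You've loaded:

dataset: a list of tuples of (human-readable date, machine-readable date)
human_vocab: a Python dictionary mapping every character used in the human-readable dates to an integer-valued index
machine_vocab: a Python dictionary mapping every character used in the machine-readable dates to an integer-valued index. These indices are not necessarily consistent with those of human_vocab.
inv_machine_vocab: the inverse dictionary of machine_vocab, mapping indices back to characters.

Let's preprocess the data and map the raw text data into index values. We will also use Tx = 30 (which we assume is the maximum length of a human-readable date; if we get a longer input, we truncate it) and Ty = 10 (since "YYYY-MM-DD" is 10 characters long).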

Tx = 30
Ty = 10
X, Y, Xoh, Yoh = preprocess_data(dataset, human_vocab, machine_vocab, Tx, Ty)

print("X.shape:", X.shape)
print("Y.shape:", Y.shape)
print("Xoh.shape:", Xoh.shape)
print("Yoh.shape:", Yoh.shape)

X.shape: (10000, 30)
Y.shape: (10000, 10)
Xoh.shape: (10000, 30, 37)
Yoh.shape: (10000, 10, 11)

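You now have:

X: a processed version of the human-readable dates in the training set, where each character is replaced by the index it maps to in human_vocab. Each date is further padded to $T_x$ (30) values with a special padding character (<pad>). X.shape = (m, Tx)
Y: a processed version of the machine-readable dates in the training set, where each character is replaced by the index it maps to in machine_vocab. You should have Y.shape = (m, Ty).
Xoh: the one-hot version of X. Each index is converted into a vector of length len(human_vocab) with a 1 at the position that human_vocab assigns to the character and 0 elsewhere. Xoh.shape = (m, Tx, len(human_vocab)): m examples, Tx characters, and one one-hot vector of length len(human_vocab) per character.
Yoh: the one-hot version of Y. Yoh.shape = (m, Ty, len(machine_vocab)). Here, len(machine_vocab) = 11, since there are 11 characters ('-' as well as 0-9).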
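The preprocessing described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in nmt_utils.py: the helpers build_vocab, encode and one_hot are hypothetical, the toy vocabulary is made up, and lowercasing the input is an assumption.

```python
import numpy as np

def build_vocab(chars):
    """Map each distinct character to an index, then add <unk> and <pad>."""
    vocab = {ch: i for i, ch in enumerate(sorted(set(chars)))}
    vocab["<unk>"] = len(vocab)
    vocab["<pad>"] = len(vocab)
    return vocab

def encode(text, length, vocab):
    """Lowercase, truncate to `length`, map to indices, right-pad with <pad>."""
    text = text.lower()[:length]
    ids = [vocab.get(ch, vocab["<unk>"]) for ch in text]
    ids += [vocab["<pad>"]] * (length - len(ids))
    return np.array(ids)

def one_hot(ids, num_classes):
    """Turn an index array of shape (T,) into a one-hot array (T, num_classes)."""
    return np.eye(num_classes)[ids]

# Toy vocabulary (hypothetical): digits, space, slash and lowercase letters.
toy_vocab = build_vocab("0123456789 /abcdefghijklmnopqrstuvwxyz")
x = encode("3 May 1979", 30, toy_vocab)
x_oh = one_hot(x, len(toy_vocab))
print(x.shape, x_oh.shape)
```

Stacking m such rows gives arrays with the same layout as X and Xoh: one index per character, and one one-hot row per index.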

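Let's also look at some preprocessed training examples. Feel free to play with index in the cell below to navigate the dataset and see how source/target dates are preprocessed.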

index = 0
print("Source date:", dataset[index][0])
print("Target date:", dataset[index][1])
print()
print("Source after preprocessing (indices):", X[index])
print("Target after preprocessing (indices):", Y[index])
print()
print("Source after preprocessing (one-hot):", Xoh[index])
print("Target after preprocessing (one-hot):", Yoh[index])

Source date: 15 october 1989
Target date: 1989-10-15

Source after preprocessing (indices): [ 4  8  0 26 15 30 26 14 17 28  0  4 12 11 12 36 36 36 36 36 36 36 36 36
 36 36 36 36 36 36]
Target after preprocessing (indices): [ 2 10  9 10  0  2  1  0  2  6]

Source after preprocessing (one-hot): [[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 0. 0. 1.]]
Target after preprocessing (one-hot): [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]

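2 - Neural machine translation with attention

If you had to translate a paragraph of a book from French to English, you would not read the whole paragraph, then close the book and translate. Even during the translation process, you would read and re-read, focusing on the parts of the French paragraph that correspond to the parts of the English you are writing down.

The attention mechanism tells a Neural Machine Translation model where it should pay attention at each step.

2.1 - Attention mechanism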

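In this part, you will implement the attention mechanism presented in the lecture videos. Here is a figure to remind you how the model works. The diagram on the left shows the attention model. The diagram on the right shows what one "attention" step does to calculate the attention variables $\alpha^{\langle t, t' \rangle}$, which are used to compute the context variable $context^{\langle t \rangle}$ for each timestep in the output ($t=1, \ldots, T_y$).

Note: $t'$ indexes the timesteps of the bidirectional LSTM and $t$ indexes the timesteps of the LSTM above it. For example, $\alpha^{\langle 1, 2 \rangle}$ is the weight between the first timestep of the upper LSTM and the second timestep of the bidirectional LSTM below.
For each timestep of the upper LSTM, for example the first, the attention weights sum to 1:
\[\sum_{t'} \alpha^{\langle 1, t' \rangle} = 1\]
The attention weight $\alpha^{\langle t, t' \rangle}$ expresses how much attention timestep $t$ of the upper LSTM pays to $a^{\langle t' \rangle}$ of the lower bidirectional LSTM, that is, how much attention should be paid to the $t'$-th input word when generating the $t$-th output word.

The context at the first step is the sum of the bidirectional LSTM activations $a^{\langle t' \rangle}$, each multiplied by its attention weight. Every output step therefore takes all of these states into account, but with different weights.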
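As a small numeric illustration of the two statements above (weights that sum to 1, and a context formed as a weighted sum), here is a NumPy sketch. The sizes and the random values are assumptions chosen only for the example; in the real model the scores come from a small neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, n_a = 5, 3                       # toy sizes, chosen only for illustration

a = rng.normal(size=(Tx, 2 * n_a))   # Bi-LSTM activations a<t'>, one row per input step
energies = rng.normal(size=Tx)       # unnormalised scores e<1,t'> for output step t = 1

# Softmax over the input axis turns the scores into attention weights.
alphas = np.exp(energies - energies.max())
alphas /= alphas.sum()

# Context for step 1: weighted sum of the activations.
context = alphas @ a

print(alphas.sum(), context.shape)
```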

**Figure 1**: Neural machine translation with attention

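Here are some properties of the model:

There are two separate LSTMs in this model (see the diagram on the left). Because the one at the bottom of the picture is a bidirectional LSTM and comes before the attention mechanism, we will call it the pre-attention Bi-LSTM. The LSTM at the top of the diagram comes after the attention mechanism, so we will call it the post-attention LSTM. The pre-attention Bi-LSTM goes through $T_x$ time steps; the post-attention LSTM goes through $T_y$ time steps.
The post-attention LSTM passes $s^{\langle t \rangle}, c^{\langle t \rangle}$ from one time step to the next. In the lecture videos, we used only a basic RNN for the post-attention sequence model, so the only state was the output activation $s^{\langle t\rangle}$. Since we are using an LSTM here, it has both the output activation $s^{\langle t\rangle}$ and the hidden cell state $c^{\langle t\rangle}$. However, unlike previous text generation examples (such as Dinosaurus in week 1), in this model the post-attention LSTM at time $t$ does not take the previously generated $y^{\langle t-1 \rangle}$ as input; besides the context, it takes only the previous states $s^{\langle t-1\rangle}$ and $c^{\langle t-1\rangle}$. We designed the model this way because (unlike language generation, where adjacent characters are highly correlated) there is no strong dependency between the previous character and the next character in a YYYY-MM-DD date.
We use $a^{\langle t \rangle} = [\overrightarrow{a}^{\langle t \rangle}; \overleftarrow{a}^{\langle t \rangle}]$ to represent the concatenation of the forward and backward activations of the pre-attention Bi-LSTM.
The diagram on the right uses a RepeatVector node to copy the value of $s^{\langle t-1 \rangle}$ $T_x$ times, then `Concatenation` to concatenate $s^{\langle t-1 \rangle}$ and $a^{\langle t' \rangle}$ to compute $e^{\langle t, t'\rangle}$, which is then passed through a softmax to compute $\alpha^{\langle t, t' \rangle}$. We will explain how to use RepeatVector and Concatenation in Keras below.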

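Let's implement this model. You will start by implementing two functions: one_step_attention() and model().

1) one_step_attention(): At step $t$, given all the hidden states of the Bi-LSTM ($[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$) and the previous hidden state of the second LSTM ($s^{<t-1>}$), one_step_attention() will compute the attention weights ($[\alpha^{<t,1>},\alpha^{<t,2>}, ..., \alpha^{<t,T_x>}]$) and output the context vector (see Figure 1 (right) for details):
\[context^{<t>} = \sum_{t' = 1}^{T_x} \alpha^{<t,t'>}a^{<t'>}\tag{1}\]

Note that we denote the context here as $context^{\langle t \rangle}$. In the lecture videos, the context was denoted $c^{\langle t \rangle}$, but here we call it $context^{\langle t \rangle}$ to avoid confusion with the internal memory cell of the post-attention LSTM.

2) model(): Implements the entire model. It first runs the input through a Bi-LSTM to get back $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$. Then, it calls one_step_attention() $T_y$ times (in a for loop). At each iteration of this loop, it gives the computed context vector $context^{\langle t \rangle}$ to the second LSTM, and runs the output of the LSTM through a dense layer with softmax activation to generate a prediction $\hat{y}^{<t>}$.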

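Exercise: Implement one_step_attention(). The function model() will call one_step_attention() $T_y$ times using a for loop, and it is important that all $T_y$ copies have the same weights. That is, the weights should not be re-initialized every time; all $T_y$ steps should share the same weights. Here is how you can implement layers with shareable weights in Keras:
1. Define the layer objects (as global variables, for example).
2. Call these objects when propagating the input.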
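The two steps above can be illustrated without Keras. In the sketch below, TinyDense is a hypothetical stand-in for a layer object (it is not a Keras class): its weights are created once in the constructor, so calling the same object repeatedly reuses them, while constructing a new object inside the loop does not.

```python
import numpy as np

class TinyDense:
    """Hypothetical stand-in for a layer: weights are created once, in __init__."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        # Every call reuses the same W and b.
        return np.tanh(x @ self.W + self.b)

shared = TinyDense(4, 2)                       # step 1: define the layer object once
x = np.ones((1, 4))
outs_shared = [shared(x) for _ in range(3)]    # step 2: call it at every step

# Anti-pattern: building a new layer inside the loop gives fresh weights each time.
outs_fresh = [TinyDense(4, 2, seed=t)(x) for t in range(3)]
```

Keras layers behave analogously: the parameters belong to the layer object, so reusing the object is what shares them.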

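We have defined the layers you need as global variables. Please run the following cell to create them. Please check the Keras documentation to make sure you understand what these layers are: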
RepeatVector(), Concatenate(), Dense(), Activation(), Dot().

# Define shared layers as global variables
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation = "tanh")
densor2 = Dense(1, activation = "relu")
activator = Activation(softmax, name='attention_weights') # We are using a custom softmax(axis = 1) loaded in this notebook
dotor = Dot(axes = 1)

Now you can use these layers to implement one_step_attention(). In order to propagate a Keras tensor object X through one of these layers, use layer(X) (or layer([X,Y]) if it requires multiple inputs), e.g. densor2(X) will propagate X through the Dense(1) layer defined above.

# GRADED FUNCTION: one_step_attention

def one_step_attention(a, s_prev):
    """
    The computation follows the right-hand diagram of Figure 1.
    Performs one step of attention: Outputs a context vector computed as a dot product of the attention weights
    "alphas" and the hidden states "a" of the Bi-LSTM.
    
    Arguments:
    a -- hidden state output of the Bi-LSTM, numpy-array of shape (m, Tx, 2*n_a)
    s_prev -- previous hidden state of the (post-attention) LSTM, numpy-array of shape (m, n_s)
    
    Returns:
    context -- context vector, input of the next (post-attention) LSTM cell
    """
    
    ### START CODE HERE ###
    # Use repeator to repeat s_prev to be of shape (m, Tx, n_s) so that you can concatenate it with all hidden states "a" (≈ 1 line)
    s_prev = repeator(s_prev)
    # Use concatenator to concatenate a and s_prev on the last axis (≈ 1 line)
    concat = concatenator([a, s_prev])  # pairs every a[t'] with the repeated s_prev
    # Use densor1 to propagate concat through a small fully-connected neural network to compute the "intermediate energies" variable e. (≈1 lines)
    e = densor1(concat)  # first fully connected layer
    # Use densor2 to propagate e through a small fully-connected neural network to compute the "energies" variable energies. (≈1 lines)
    energies = densor2(e)  # second fully connected layer
    # Use "activator" on "energies" to compute the attention weights "alphas" (≈ 1 line)
    alphas = activator(energies)  # softmax over the Tx axis
    # Use dotor together with "alphas" and "a" to compute the context vector to be given to the next (post-attention) LSTM-cell (≈ 1 line)
    context = dotor([alphas, a])
    ### END CODE HERE ###
    
    return context
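To follow the tensor shapes through one attention step, here is a NumPy mirror of the function above. It is a sketch under assumptions: the weights W1, b1, W2, b2 are random placeholders rather than trained Keras parameters, and one_step_attention_np is a hypothetical name used only here.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def one_step_attention_np(a, s_prev, W1, b1, W2, b2):
    """NumPy mirror of the Keras function above, with explicit weights."""
    Tx = a.shape[1]
    s_rep = np.repeat(s_prev[:, None, :], Tx, axis=1)      # (m, Tx, n_s)
    concat = np.concatenate([a, s_rep], axis=-1)           # (m, Tx, 2*n_a + n_s)
    e = np.tanh(concat @ W1 + b1)                          # (m, Tx, 10)
    energies = np.maximum(e @ W2 + b2, 0.0)                # (m, Tx, 1), ReLU
    alphas = softmax(energies, axis=1)                     # normalise over Tx
    context = np.einsum("mti,mtj->mij", alphas, a)         # (m, 1, 2*n_a)
    return context, alphas

rng = np.random.default_rng(0)
m, Tx, n_a, n_s = 2, 30, 32, 64
a = rng.normal(size=(m, Tx, 2 * n_a))
s_prev = rng.normal(size=(m, n_s))
W1, b1 = rng.normal(size=(2 * n_a + n_s, 10)) * 0.1, np.zeros(10)
W2, b2 = rng.normal(size=(10, 1)) * 0.1, np.zeros(1)

context, alphas = one_step_attention_np(a, s_prev, W1, b1, W2, b2)
print(context.shape, alphas.shape)
```

The softmax is taken over axis 1 (the Tx axis), so the weights for each example sum to 1 across input positions.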

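You will be able to check the expected output of one_step_attention() after you have coded the model() function.

Exercise: Implement model() as explained in Figure 2 and the text above. Again, we have defined global layers that will share weights to be used in model().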

n_a = 32
n_s = 64
post_activation_LSTM_cell = LSTM(n_s, return_state = True)
output_layer = Dense(len(machine_vocab), activation=softmax)

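Now you can use these layers $T_y$ times in a for loop to generate the outputs, without re-initializing their parameters. You will have to carry out the following steps:

1. Propagate the input into a Bidirectional LSTM.
2. Iterate for $t = 0, \dots, T_y-1$:
    1. Call one_step_attention() on $[a^{<1>},a^{<2>}, ..., a^{<T_x>}]$ and $s^{<t-1>}$ to get the context vector $context^{<t>}$.
    2. Give $context^{<t>}$ to the post-attention LSTM cell. Remember to pass in the previous hidden state $s^{\langle t-1\rangle}$ and cell state $c^{\langle t-1\rangle}$ of this LSTM using initial_state = [previous hidden state, previous cell state]. Get back the new hidden state $s^{<t>}$ and the new cell state $c^{<t>}$.
    3. Apply the softmax layer to $s^{<t>}$ to get the output.
    4. Save the output by appending it to the list of outputs.
3. Create your Keras model instance. It should have three inputs ("inputs", $s^{<0>}$ and $c^{<0>}$) and output the list of "outputs".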

# GRADED FUNCTION: model

def model(Tx, Ty, n_a, n_s, human_vocab_size, machine_vocab_size):
    """
    Arguments:
    Tx -- length of the input sequence
    Ty -- length of the output sequence
    n_a -- hidden state size of the Bi-LSTM
    n_s -- hidden state size of the post-attention LSTM
    human_vocab_size -- size of the python dictionary "human_vocab"
    machine_vocab_size -- size of the python dictionary "machine_vocab"

    Returns:
    model -- Keras model instance
    """
    
    # Define the inputs of your model with a shape (Tx, human_vocab_size)
    # Define s0 and c0, initial hidden state for the decoder LSTM of shape (n_s,)
    X = Input(shape=(Tx, human_vocab_size))
    s0 = Input(shape=(n_s,), name='s0')
    c0 = Input(shape=(n_s,), name='c0')
    s = s0
    c = c0
    
    # Initialize empty list of outputs
    outputs = []
    
    ### START CODE HERE ###
    
    # Step 1: Define your pre-attention Bi-LSTM. Remember to use return_sequences=True. (≈ 1 line)
    a = Bidirectional(LSTM(n_a, return_sequences=True), name='bidirectional_1')(X)
    
    # Step 2: Iterate for Ty steps
    for t in range(Ty):
    
        # Step 2.A: Perform one step of the attention mechanism to get back the context vector at step t (≈ 1 line)
        context = one_step_attention(a, s)
        
        # Step 2.B: Apply the post-attention LSTM cell to the "context" vector.
        # Don't forget to pass: initial_state = [hidden state, cell state] (≈ 1 line)
        s, _, c = post_activation_LSTM_cell(context, initial_state=[s, c])
        
        # Step 2.C: Apply Dense layer to the hidden state output of the post-attention LSTM (≈ 1 line)
        out = output_layer(s)
        
        # Step 2.D: Append "out" to the "outputs" list (≈ 1 line)
        outputs.append(out)
    
    # Step 3: Create model instance taking three inputs and returning the list of outputs. (≈ 1 line)
    model = Model(inputs=[X, s0, c0], outputs=outputs)
    
    ### END CODE HERE ###
    
    return model

Run the following cell to create your model.

model = model(Tx, Ty, n_a, n_s, len(human_vocab), len(machine_vocab))

Let's get a summary of the model to check if it matches the expected output.

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 30, 37)       0                                            
__________________________________________________________________________________________________
s0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 30, 64)       17920       input_1[0][0]                    
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector)  (None, 30, 64)       0           s0[0][0]                         
                                                                 lstm_2[0][0]                     
                                                                 lstm_2[1][0]                     
                                                                 lstm_2[2][0]                     
                                                                 lstm_2[3][0]                     
                                                                 lstm_2[4][0]                     
                                                                 lstm_2[5][0]                     
                                                                 lstm_2[6][0]                     
                                                                 lstm_2[7][0]                     
                                                                 lstm_2[8][0]                     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 30, 128)      0           bidirectional_1[0][0]            
                                                                 repeat_vector_1[0][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[1][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[2][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[3][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[4][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[5][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[6][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[7][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[8][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[9][0]            
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 30, 10)       1290        concatenate_1[0][0]              
                                                                 concatenate_1[1][0]              
                                                                 concatenate_1[2][0]              
                                                                 concatenate_1[3][0]              
                                                                 concatenate_1[4][0]              
                                                                 concatenate_1[5][0]              
                                                                 concatenate_1[6][0]              
                                                                 concatenate_1[7][0]              
                                                                 concatenate_1[8][0]              
                                                                 concatenate_1[9][0]              
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 30, 1)        11          dense_1[0][0]                    
                                                                 dense_1[1][0]                    
                                                                 dense_1[2][0]                    
                                                                 dense_1[3][0]                    
                                                                 dense_1[4][0]                    
                                                                 dense_1[5][0]                    
                                                                 dense_1[6][0]                    
                                                                 dense_1[7][0]                    
                                                                 dense_1[8][0]                    
                                                                 dense_1[9][0]                    
__________________________________________________________________________________________________
attention_weights (Activation)  (None, 30, 1)        0           dense_2[0][0]                    
                                                                 dense_2[1][0]                    
                                                                 dense_2[2][0]                    
                                                                 dense_2[3][0]                    
                                                                 dense_2[4][0]                    
                                                                 dense_2[5][0]                    
                                                                 dense_2[6][0]                    
                                                                 dense_2[7][0]                    
                                                                 dense_2[8][0]                    
                                                                 dense_2[9][0]                    
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 1, 64)        0           attention_weights[0][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[1][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[2][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[3][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[4][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[5][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[6][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[7][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[8][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[9][0]          
                                                                 bidirectional_1[0][0]            
__________________________________________________________________________________________________
c0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 64), (None,  33024       dot_1[0][0]                      
                                                                 s0[0][0]                         
                                                                 c0[0][0]                         
                                                                 dot_1[1][0]                      
                                                                 lstm_2[0][0]                     
                                                                 lstm_2[0][2]                     
                                                                 dot_1[2][0]                      
                                                                 lstm_2[1][0]                     
                                                                 lstm_2[1][2]                     
                                                                 dot_1[3][0]                      
                                                                 lstm_2[2][0]                     
                                                                 lstm_2[2][2]                     
                                                                 dot_1[4][0]                      
                                                                 lstm_2[3][0]                     
                                                                 lstm_2[3][2]                     
                                                                 dot_1[5][0]                      
                                                                 lstm_2[4][0]                     
                                                                 lstm_2[4][2]                     
                                                                 dot_1[6][0]                      
                                                                 lstm_2[5][0]                     
                                                                 lstm_2[5][2]                     
                                                                 dot_1[7][0]                      
                                                                 lstm_2[6][0]                     
                                                                 lstm_2[6][2]                     
                                                                 dot_1[8][0]                      
                                                                 lstm_2[7][0]                     
                                                                 lstm_2[7][2]                     
                                                                 dot_1[9][0]                      
                                                                 lstm_2[8][0]                     
                                                                 lstm_2[8][2]                     
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 11)           715         lstm_2[0][0]                     
                                                                 lstm_2[1][0]                     
                                                                 lstm_2[2][0]                     
                                                                 lstm_2[3][0]                     
                                                                 lstm_2[4][0]                     
                                                                 lstm_2[5][0]                     
                                                                 lstm_2[6][0]                     
                                                                 lstm_2[7][0]                     
                                                                 lstm_2[8][0]                     
                                                                 lstm_2[9][0]                     
==================================================================================================
Total params: 52,960
Trainable params: 52,960
Non-trainable params: 0
__________________________________________________________________________________________________
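The parameter counts in this summary can be reproduced by hand. The sketch below uses the standard LSTM and Dense parameter formulas together with the sizes used in this notebook (n_a = 32, n_s = 64, 37 input characters, 11 output characters); the helper names lstm_params and dense_params are hypothetical. Each shared layer is counted once, however many times it is called in the loop.

```python
def lstm_params(input_dim, units):
    # Four gates, each with an input kernel, a recurrent kernel and a bias.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per unit.
    return input_dim * units + units

n_a, n_s = 32, 64
human_vocab_size, machine_vocab_size = 37, 11

counts = {
    "bidirectional_1": 2 * lstm_params(human_vocab_size, n_a),  # forward + backward
    "dense_1": dense_params(2 * n_a + n_s, 10),
    "dense_2": dense_params(10, 1),
    "lstm_2": lstm_params(2 * n_a, n_s),
    "dense_4": dense_params(n_s, machine_vocab_size),
}
total = sum(counts.values())
print(counts, total)
```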

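Expected Output:

Here is the summary you should see. Note that the shapes in this table imply n_a = 64 and n_s = 128, whereas the model above was built with n_a = 32 and n_s = 64 and therefore reports 52,960 parameters.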

Total params:	185,484
Trainable params:	185,484
Non-trainable params:	0
bidirectional_1's output shape	(None, 30, 128)
repeat_vector_1's output shape	(None, 30, 128)
concatenate_1's output shape	(None, 30, 256)
attention_weights's output shape	(None, 30, 1)
dot_1's output shape	(None, 1, 128)
dense_2's output shape	(None, 11)

As usual, after creating your model in Keras, you need to compile it and define the loss, optimizer and metrics you want to use. Compile your model using categorical_crossentropy loss, an Adam optimizer (learning rate = 0.005, $\beta_1 = 0.9$, $\beta_2 = 0.999$, decay = 0.01) and ['accuracy'] metrics.

### START CODE HERE ### (≈2 lines)
opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
### END CODE HERE ###
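To get a feel for what the decay argument does, the sketch below assumes the time-based rule used by the Keras 2.x optimizers, lr_t = lr / (1 + decay * iterations), and plugs in the values stated in the text (learning rate 0.005, decay 0.01) together with 100 updates per epoch (10000 examples, batch size 100). The function effective_lr is a hypothetical helper, not a Keras API.

```python
def effective_lr(base_lr, decay, iteration):
    """Time-based decay: lr_t = base_lr / (1 + decay * iteration)."""
    return base_lr / (1.0 + decay * iteration)

base_lr, decay = 0.005, 0.01
steps_per_epoch = 10000 // 100       # m = 10000 examples, batch_size = 100

for it in (0, steps_per_epoch, 10 * steps_per_epoch):
    print(it, effective_lr(base_lr, decay, it))
```

Under this assumption the learning rate has already halved by the end of the first epoch.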

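The last step is to define all your inputs and outputs to fit the model:

You already have X of shape $(m = 10000, T_x = 30)$ containing the training examples; its one-hot version Xoh is what gets fed to the model.
You need to create s0 and c0 to initialize your post_activation_LSTM_cell with zeros.
Given the model() you coded, "outputs" needs to be a list of $T_y = 10$ elements, each of shape (m, len(machine_vocab)) = (m, 11), so that outputs[j][i] is the one-hot true label of the $j^{th}$ character of the $i^{th}$ training example (X[i]).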

s0 = np.zeros((m, n_s))
c0 = np.zeros((m, n_s))
outputs = list(Yoh.swapaxes(0,1))
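The effect of list(Yoh.swapaxes(0,1)) is easiest to see on a small array. The sketch below builds a random stand-in for Yoh with only 4 examples (an assumption for illustration) and checks the resulting layout.

```python
import numpy as np

m, Ty, vocab = 4, 10, 11                  # small m, for illustration only
Y = np.random.default_rng(0).integers(0, vocab, size=(m, Ty))
Yoh = np.eye(vocab)[Y]                    # (m, Ty, vocab)

outputs = list(Yoh.swapaxes(0, 1))        # Ty arrays, each of shape (m, vocab)
print(len(outputs), outputs[0].shape)
```

Swapping the first two axes puts the output position first, so the list has one target array per output head of the model.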

Let's now fit the model and run it for one epoch.

model.fit([Xoh, s0, c0], outputs, epochs=1, batch_size=100)

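While training, you can see the loss as well as the accuracy at each of the 10 positions of the output. The table below gives an example of what the accuracies could be if the batch had 2 examples:
[Image: table of example per-position accuracies]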

Thus, dense_2_acc_8: 0.89 means that you are predicting the 7th character of the output correctly 89% of the time in the current batch of data.

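We have run this model for longer and saved the weights. Run the next cell to load our weights. (By training a model for several minutes, you should be able to obtain a model of similar accuracy, but loading our model will save you time.)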

model.load_weights('models/model.h5')

You can now see the results on new examples.

EXAMPLES = ['3 May 1979', '5 April 09', '21th of August 2016', 'Tue 10 Jul 2007', 'Saturday May 9 2018', 'March 3 2001', 'March 3rd 2001', '1 March 2001']
for example in EXAMPLES:
    
    source = string_to_int(example, Tx, human_vocab)
    source = np.array(list(map(lambda x: to_categorical(x, num_classes=len(human_vocab)), source))).swapaxes(0,1)
    prediction = model.predict([[source.T], s0, c0]) 
    # prediction = model.predict([source, s0, c0])  # original call; the input dimensions did not match
    prediction = np.argmax(prediction, axis = -1)
    output = [inv_machine_vocab[int(i)] for i in prediction]
    
    print("source:", example)
    print("output:", ''.join(output))

source: 3 May 1979
output: 1979-05-03
source: 5 April 09
output: 2009-05-05
source: 21th of August 2016
output: 2016-08-21
source: Tue 10 Jul 2007
output: 2007-07-10
source: Saturday May 9 2018
output: 2018-05-09
source: March 3 2001
output: 2001-03-03
source: March 3rd 2001
output: 2001-03-03
source: 1 March 2001
output: 2001-03-01
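The decoding step in the loop above (argmax at each output position, then a lookup in the inverse vocabulary) can be tried in isolation. The sketch below is an illustration under assumptions: it rebuilds a target-side vocabulary from '-' and the ten digits in sorted order, and feeds in fake one-hot "predictions" instead of real model output.

```python
import numpy as np

# Toy reconstruction of the target vocabulary: '-' plus the ten digits, sorted.
machine_chars = sorted("-0123456789")
machine_vocab = {ch: i for i, ch in enumerate(machine_chars)}
inv_machine_vocab = {i: ch for ch, i in machine_vocab.items()}

def decode(prediction):
    """prediction: list of Ty arrays of shape (1, vocab) holding softmax outputs."""
    ids = np.argmax(np.array(prediction), axis=-1)   # (Ty, 1)
    return "".join(inv_machine_vocab[int(i[0])] for i in ids)

# Fake "perfect" predictions built from a known target, for illustration only.
target = "1979-05-03"
fake_prediction = [np.eye(len(machine_vocab))[[machine_vocab[ch]]] for ch in target]
print(decode(fake_prediction))
```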

You can also change these examples to test with your own. The next part will give you a better sense of what the attention mechanism is doing, i.e. what part of the input the network is paying attention to when generating a particular output character.

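3 - Visualizing Attention (Optional / Ungraded)

Since the problem has a fixed output length of 10, it is also possible to carry out this task using 10 different softmax units to generate the 10 characters of the output. But one advantage of the attention model is that each part of the output (say the month) knows it needs to depend only on a small part of the input (the characters in the input giving the month). We can visualize what part of the output is looking at what part of the input.

Consider the task of translating "Saturday 9 May 2018" to "2018-05-09". If we visualize the computed attention weights $\alpha^{\langle t, t' \rangle}$, we get this: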

Figure 8: Full Attention Map

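Notice how the output ignores the "Saturday" portion of the input. None of the output timesteps pay much attention to that portion of the input. We also see that 9 has been translated as 09 and May has been correctly translated into 05, with the output paying attention to the parts of the input it needs to make the translation. The year mostly requires it to pay attention to the input's "18" in order to generate "2018".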

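3.1 - Getting the activations from the network

Let's now visualize the attention values in your network. We will propagate an example through the network, then visualize the values of $\alpha^{\langle t, t' \rangle}$.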
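Once the attention weights of every output step are available, arranging them for visualization is a small reshaping exercise. The sketch below assumes each step yields an array of shape (1, Tx, 1), matching the attention_weights layer, and uses random stand-in values rather than activations from the trained model.

```python
import numpy as np

Tx, Ty = 30, 10
text = "saturday 9 may 2018"

# Stand-in for the per-step attention weights: Ty arrays of shape (1, Tx, 1).
rng = np.random.default_rng(0)
raw = [rng.random((1, Tx, 1)) for _ in range(Ty)]
alphas = [r / r.sum(axis=1, keepdims=True) for r in raw]

# Stack into a (Ty, Tx) map: row t holds the weights used for output character t.
attention_map = np.concatenate(alphas, axis=0)[:, :, 0]

# Keep only the columns covering real (non-padding) input characters.
visible = attention_map[:, :len(text)]
most_attended = [text[j] for j in visible.argmax(axis=1)]
print(attention_map.shape, most_attended)
```

A heat map of attention_map, with input characters on one axis and output characters on the other, is the kind of picture shown in the figure above.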

To figure out where the attention values are located, let's start by printing a summary of the model.

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 30, 37)       0                                            
__________________________________________________________________________________________________
s0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 30, 64)       17920       input_1[0][0]                    
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector)  (None, 30, 64)       0           s0[0][0]                         
                                                                 lstm_2[0][0]                     
                                                                 lstm_2[1][0]                     
                                                                 lstm_2[2][0]                     
                                                                 lstm_2[3][0]                     
                                                                 lstm_2[4][0]                     
                                                                 lstm_2[5][0]                     
                                                                 lstm_2[6][0]                     
                                                                 lstm_2[7][0]                     
                                                                 lstm_2[8][0]                     
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 30, 128)      0           bidirectional_1[0][0]            
                                                                 repeat_vector_1[0][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[1][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[2][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[3][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[4][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[5][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[6][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[7][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[8][0]            
                                                                 bidirectional_1[0][0]            
                                                                 repeat_vector_1[9][0]            
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 30, 10)       1290        concatenate_1[0][0]              
                                                                 concatenate_1[1][0]              
                                                                 concatenate_1[2][0]              
                                                                 concatenate_1[3][0]              
                                                                 concatenate_1[4][0]              
                                                                 concatenate_1[5][0]              
                                                                 concatenate_1[6][0]              
                                                                 concatenate_1[7][0]              
                                                                 concatenate_1[8][0]              
                                                                 concatenate_1[9][0]              
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 30, 1)        11          dense_1[0][0]                    
                                                                 dense_1[1][0]                    
                                                                 dense_1[2][0]                    
                                                                 dense_1[3][0]                    
                                                                 dense_1[4][0]                    
                                                                 dense_1[5][0]                    
                                                                 dense_1[6][0]                    
                                                                 dense_1[7][0]                    
                                                                 dense_1[8][0]                    
                                                                 dense_1[9][0]                    
__________________________________________________________________________________________________
attention_weights (Activation)  (None, 30, 1)        0           dense_2[0][0]                    
                                                                 dense_2[1][0]                    
                                                                 dense_2[2][0]                    
                                                                 dense_2[3][0]                    
                                                                 dense_2[4][0]                    
                                                                 dense_2[5][0]                    
                                                                 dense_2[6][0]                    
                                                                 dense_2[7][0]                    
                                                                 dense_2[8][0]                    
                                                                 dense_2[9][0]                    
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 1, 64)        0           attention_weights[0][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[1][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[2][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[3][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[4][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[5][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[6][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[7][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[8][0]          
                                                                 bidirectional_1[0][0]            
                                                                 attention_weights[9][0]          
                                                                 bidirectional_1[0][0]            
__________________________________________________________________________________________________
c0 (InputLayer)                 (None, 64)           0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   [(None, 64), (None,  33024       dot_1[0][0]                      
                                                                 s0[0][0]                         
                                                                 c0[0][0]                         
                                                                 dot_1[1][0]                      
                                                                 lstm_2[0][0]                     
                                                                 lstm_2[0][2]                     
                                                                 dot_1[2][0]                      
                                                                 lstm_2[1][0]                     
                                                                 lstm_2[1][2]                     
                                                                 dot_1[3][0]                      
                                                                 lstm_2[2][0]                     
                                                                 lstm_2[2][2]                     
                                                                 dot_1[4][0]                      
                                                                 lstm_2[3][0]                     
                                                                 lstm_2[3][2]                     
                                                                 dot_1[5][0]                      
                                                                 lstm_2[4][0]                     
                                                                 lstm_2[4][2]                     
                                                                 dot_1[6][0]                      
                                                                 lstm_2[5][0]                     
                                                                 lstm_2[5][2]                     
                                                                 dot_1[7][0]                      
                                                                 lstm_2[6][0]                     
                                                                 lstm_2[6][2]                     
                                                                 dot_1[8][0]                      
                                                                 lstm_2[7][0]                     
                                                                 lstm_2[7][2]                     
                                                                 dot_1[9][0]                      
                                                                 lstm_2[8][0]                     
                                                                 lstm_2[8][2]                     
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 11)           715         lstm_2[0][0]                     
                                                                 lstm_2[1][0]                     
                                                                 lstm_2[2][0]                     
                                                                 lstm_2[3][0]                     
                                                                 lstm_2[4][0]                     
                                                                 lstm_2[5][0]                     
                                                                 lstm_2[6][0]                     
                                                                 lstm_2[7][0]                     
                                                                 lstm_2[8][0]                     
                                                                 lstm_2[9][0]                     
==================================================================================================
Total params: 52,960
Trainable params: 52,960
Non-trainable params: 0
__________________________________________________________________________________________________
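The parameter counts in the summary can be checked by hand. The sketch below is only a sanity check: the sizes n_a = 32 and n_s = 64 are inferred from the printed output shapes (they are assumptions, not values stated in the summary), and the helper names lstm_params and dense_params are made up for illustration.

```python
def lstm_params(input_dim, units):
    """One LSTM layer: 4 gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One Dense layer: weight matrix plus bias vector."""
    return input_dim * units + units


# Sizes inferred from the output shapes in the summary (assumptions).
vocab_size = 37   # last axis of input_1
n_a = 32          # units per direction; bidirectional_1 outputs 2 * n_a = 64
n_s = 64          # state size of the post-attention LSTM (shape of s0 and c0)
out_vocab = 11    # output size of dense_4

counts = {
    "bidirectional_1": 2 * lstm_params(vocab_size, n_a),
    "dense_1": dense_params(2 * n_a + n_s, 10),
    "dense_2": dense_params(10, 1),
    "lstm_2": lstm_params(2 * n_a, n_s),
    "dense_4": dense_params(n_s, out_vocab),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:16s}{n:>8,}")
print(f"{'total':16s}{total:>8,}")
```

Under these assumptions the per-layer values and the total agree with the Param # column and the "Total params" line printed by model.summary().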

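Look through the output of model.summary() above. You can see that the layer named attention_weights outputs the alphas, of shape (m, 30, 1), before dot_1 computes the context vector for every time step $t = 0, \ldots, T_y-1$. Let's get the activations from this layer.

The function plot_attention_map() pulls the attention values out of the model and plots them.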
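To make the shapes concrete, here is a minimal NumPy sketch of what happens between the attention_weights layer and the Dot layer that follows it, for a single output time step. It uses random stand-in arrays rather than the trained model, and it assumes the softmax is taken over the input time axis (axis 1); the function name softmax_over_time is hypothetical.

```python
import numpy as np


def softmax_over_time(energies):
    """Softmax along axis 1, the input time axis (assumed behaviour of attention_weights)."""
    shifted = energies - energies.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
m, Tx, features = 2, 30, 64
a = rng.normal(size=(m, Tx, features))   # stand-in for the bidirectional_1 output
energies = rng.normal(size=(m, Tx, 1))   # stand-in for the dense_2 output

alphas = softmax_over_time(energies)     # shape (m, 30, 1)

# Weighted sum of the encoder states over the Tx axis: shape (m, 1, 64).
context = np.einsum("mtk,mtf->mkf", alphas, a)

print(alphas.shape, context.shape)
print(alphas.sum(axis=1).ravel())        # each example's weights sum to 1
```

The point of the sketch is that the alphas form one weight per input position, and the context vector is just the weighted average of the encoder states under those weights.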

attention_map = plot_attention_map(model, human_vocab, inv_machine_vocab, "Tuesday 09 Oct 1993", num = 7, n_s = 64)

<Figure size 432x288 with 0 Axes>

[Figure: attention map for the input "Tuesday 09 Oct 1993"]

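On the generated plot you can observe the attention weights for each character of the predicted output. Examine the plot and check that the places where the network is paying attention make sense to you.

In the date translation application, you will observe that most of the time attention helps predict the year, and does not have much impact on predicting the day or month.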
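One way to read such a map without a plot is to list, for each output character, the input character that receives the largest weight. The sketch below assumes the attention values are available as an array of shape (output length, input length); the helper strongest_inputs and the toy weights are made up for illustration and are not taken from the trained model.

```python
import numpy as np


def strongest_inputs(attention, source, predicted):
    """For each output character, return the input character with the largest weight.

    attention: array of shape (len(predicted), len(source)); row t holds the
    weights over the input positions for output step t.
    """
    attention = np.asarray(attention)
    best = attention.argmax(axis=1)
    return [(out_ch, source[i], float(attention[t, i]))
            for t, (out_ch, i) in enumerate(zip(predicted, best))]


# Toy, hand-written weights; every row sums to 1.
source = "9 may"
predicted = "05-09"
toy = np.full((len(predicted), len(source)), 0.05)
toy[0, 2] = toy[1, 2] = 0.8   # month digits attend to "m"
toy[3, 0] = toy[4, 0] = 0.8   # day digits attend to "9"
toy[2, 1] = 0.8               # "-" attends to the space

for out_ch, in_ch, w in strongest_inputs(toy, source, predicted):
    print(f"{out_ch!r} <- {in_ch!r} ({w:.2f})")
```

With a real attention map in place of the toy array, the same listing gives a quick textual check of whether each output character attends to a sensible part of the input.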

Congratulations!

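You have completed this assignment.

Here is what you should remember:

Machine translation models can be used to map one sequence to another. They are useful not only for translating human languages (such as French -> English) but also for tasks like date format translation.
An attention mechanism lets the network focus on the most relevant parts of the input when producing a specific part of the output.
A network using an attention mechanism can translate from inputs of length $T_x$ to outputs of length $T_y$, where $T_x$ and $T_y$ can be different.
You can visualize the attention weights $\alpha^{\langle t,t' \rangle}$ to see what the network is paying attention to while generating each output.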

Original: https://www.cnblogs.com/Moonshade/p/10953450.html

Total params:	185,484
Trainable params:	185,484
Non-trainable params:	0
bidirectional_1's output shape	(None, 30, 128)
repeat_vector_1's output shape	(None, 30, 128)
concatenate_1's output shape	(None, 30, 256)
attention_weights's output shape	(None, 30, 1)
dot_1's output shape	(None, 1, 128)
dense_2's output shape	(None, 11)