I didn't know where to start with your multiple questions, so I decided to do so with a single statement:
Your code definitely should not look like that and is nowhere near current Tensorflow best practices
Sorry, debugging it step by step would be a waste of everyone's time and would not benefit either of us.
Now, moving to your third point:
- Is there anything else in my code below that I can optimize further?
Yes, you could use tensorflow2.0 functionality, and it seems like you are running away from it (the tf.function decorator is actually of no use here, leave it be for the time being).
Following the new guidelines would also alleviate the problem from your fifth point, namely:
- I also need help in writing this code in a more generalized way
as the structure below is designed specifically for that. After a short introduction, I will try to walk you through those concepts in a few steps:
1. Divide your program into logical parts
Tensorflow did a lot of harm when it comes to code readability; everything in tf1.x was usually crunched in one place: globals, followed by function definitions, followed by more globals, or maybe data loading; all in all, a mess. It's not really the developers' fault, as the system's design encouraged those practices.
Now, in tf2.0, programmers are encouraged to divide their work similarly to the structure one can see in pytorch, chainer and other more user-friendly frameworks.
1.1 Data loading
You were on a good path with Tensorflow Datasets, yet you turned away from it for no apparent reason.
Here is your code, with commentary:
# You already have tf.data.Dataset objects after load
(x_train, y_train), (x_test, y_test) = tfds.load(
    'mnist', split=['train', 'test'], batch_size=-1, as_supervised=True
)
# But you are reshaping them in a strange manner...
x_train = tf.reshape(x_train, shape=(x_train.shape[0], 784))
x_test = tf.reshape(x_test, shape=(x_test.shape[0], 784))
# And building from slices...
ds_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
# Unreadable rescaling (there are built-ins for that)
You could easily generalize this idea for any dataset; place it in a separate module, say datasets.py:
import tensorflow as tf
import tensorflow_datasets as tfds


class ImageDatasetCreator:
    @classmethod
    # More portable and readable than dividing by 255
    def _convert_image_dtype(cls, dataset):
        return dataset.map(
            lambda image, label: (
                tf.image.convert_image_dtype(image, tf.float32),
                label,
            )
        )

    def __init__(self, name: str, batch: int, cache: bool = True, split=None):
        # Load dataset; every dataset has a default train/test split
        dataset = tfds.load(name, as_supervised=True, split=split)
        # Convert to float range
        try:
            self.train = ImageDatasetCreator._convert_image_dtype(dataset["train"])
            self.test = ImageDatasetCreator._convert_image_dtype(dataset["test"])
        except KeyError as exception:
            raise ValueError(
                f"Dataset {name} does not have train and test, write your own custom dataset handler."
            ) from exception

        if cache:
            self.train = self.train.cache()  # speed things up considerably
            self.test = self.test.cache()

        self.batch: int = batch

    def get_train(self):
        # shuffle requires an explicit buffer size
        return self.train.shuffle(buffer_size=10_000).batch(self.batch).repeat()

    def get_test(self):
        return self.test.batch(self.batch).repeat()
Now you can load mnist using a simple command:
from datasets import ImageDatasetCreator

if __name__ == "__main__":
    dataloader = ImageDatasetCreator("mnist", batch=64, cache=True)
    train, test = dataloader.get_train(), dataloader.get_test()
From now on, you can load any dataset using a name other than mnist.
Please, stop making everything deep-learning related into one-off scripts; you are a programmer as well.
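If the rescaling done by tf.image.convert_image_dtype is unclear, the idea can be sketched without TensorFlow at all. A minimal NumPy illustration of the same behavior (my own sketch, not TensorFlow's implementation): integer images are divided by the maximum value representable in their dtype, landing in [0, 1].

```python
import numpy as np

def convert_image_dtype(image: np.ndarray) -> np.ndarray:
    """Mimic tf.image.convert_image_dtype for uint8 -> float32:
    scale by the maximum representable value of the input dtype."""
    max_value = np.iinfo(image.dtype).max  # 255 for uint8
    return image.astype(np.float32) / max_value

image = np.array([[0, 128, 255]], dtype=np.uint8)
print(convert_image_dtype(image))  # values now lie in [0.0, 1.0]
```

This is why it reads better than a bare `/ 255` sprinkled through the pipeline: the scale factor follows the dtype instead of being a magic number.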
1.2 Model creation
Depending on the model's complexity, there are two advised approaches:
- Use tensorflow.keras.models.Sequential. This way was shown by @Stewart_R, no need to reiterate his points; it is used for the simplest models (you should use this one with your feedforward network).
- Inherit tensorflow.keras.Model and write a custom model. This one should be used when you have some kind of logic inside your module or it's more complicated (things like ResNets, multipath networks, etc.). Altogether it's more readable and customizable.
Your Model class tried to resemble something like that, but it went south again; backprop definitely is not part of the model itself, and neither are loss or accuracy. Separate them into another module or function, definitely not a member!
That said, let's code the network using the second approach (you should place this code in model.py for brevity). Before that, I will code the YourDense feedforward layer from scratch by inheriting from tf.keras.layers.Layer (this one might go into the layers.py module):
import tensorflow as tf


class YourDense(tf.keras.layers.Layer):
    def __init__(self, units):
        # It's Python 3, you don't have to specify super parents explicitly
        super().__init__()
        self.units = units

    # Use build to create variables, as shape can be inferred from previous layers
    # If you were to create layers in __init__, one would have to provide input_shape
    # (same as it occurs in PyTorch for example)
    def build(self, input_shape):
        # You could use different initializers here as well
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        # You could define bias in __init__ as well as it's not input dependent
        self.bias = self.add_weight(shape=(self.units,), initializer="random_normal")
        # Oh, trainable=True is default

    def call(self, inputs):
        # Use overloaded operators instead of tf.add, better readability
        return tf.matmul(inputs, self.kernel) + self.bias
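What YourDense.call computes is just an affine map. A dependency-free NumPy sketch of the same arithmetic (the shapes — batch of 2, 3 input features, 4 units — are arbitrary, chosen only for illustration):

```python
import numpy as np

# Equivalent of YourDense.call: inputs @ kernel + bias.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(2, 3)).astype(np.float32)   # (batch, features)
kernel = rng.normal(size=(3, 4)).astype(np.float32)   # (input_shape[-1], units)
bias = np.zeros(4, dtype=np.float32)                  # (units,)

outputs = inputs @ kernel + bias
print(outputs.shape)  # (2, 4)
```

This also shows why build can defer the kernel's creation: its first dimension is simply the last dimension of whatever input arrives.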
Regarding your question:
- How to add a Dropout and Batch Normalization layer in this custom implementation?
I suppose you would create a custom implementation of those layers.
class CustomDropout(tf.keras.layers.Layer):
    def __init__(self, rate, **kwargs):
        super().__init__(**kwargs)
        self.rate = rate

    def call(self, inputs, training=None):
        if training:
            # You could simply create binary mask and multiply here
            return tf.nn.dropout(inputs, rate=self.rate)
        # You would need to multiply by dropout rate if you were to do that
        return inputs
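The binary-mask idea mentioned in the comment can be written out in a few lines. Below is my own NumPy sketch of inverted dropout (the scheme tf.nn.dropout uses): zero a fraction `rate` of elements during training and scale the survivors by 1 / (1 - rate), so inference is a plain identity.

```python
import numpy as np

def dropout(inputs: np.ndarray, rate: float, training: bool, seed: int = 0) -> np.ndarray:
    """Inverted dropout: mask out ~rate of the elements and rescale the rest
    during training; pass inputs through unchanged at inference time."""
    if not training:
        return inputs  # identity, just like CustomDropout above
    rng = np.random.default_rng(seed)
    mask = rng.random(inputs.shape) >= rate  # binary keep-mask
    return inputs * mask / (1.0 - rate)      # survivors scaled up

x = np.ones((2, 4))
print(dropout(x, rate=0.5, training=False))  # unchanged
```

The upscaling during training keeps the expected activation magnitude constant, which is why nothing needs to be rescaled at inference.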
This layer was taken from here and modified to better fit showcasing purposes.
Now you can finally create your model (simple double feedforward):
import tensorflow as tf

from layers import YourDense


class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Use Sequential here for readability
        self.network = tf.keras.Sequential(
            [YourDense(100), tf.keras.layers.ReLU(), YourDense(10)]
        )

    def call(self, inputs):
        # You can use non-parametric layers inside call as well
        flattened = tf.keras.layers.Flatten()(inputs)
        return self.network(flattened)
Ofc, you should use built-ins as much as possible in general implementations.
This structure is pretty extensible, so generalization to convolutional nets, resnets, senets, whatever should be done via this module. You can read more about it here.
I think it fulfills your fifth point:
- I also need help in writing this code in a more generalized way
One last thing: you may have to use model.build(shape) in order to build your model's graph.
model.build((None, 28, 28, 1))
This would be for MNIST's 28x28x1 input shape, where None stands for the batch dimension.
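To see why that shape is enough, note what Flatten does to it; a small NumPy check (the batch size 32 below is arbitrary, chosen only for illustration):

```python
import numpy as np

# model.build((None, 28, 28, 1)) suffices because Flatten collapses everything
# but the batch axis: the first YourDense then sees 28 * 28 * 1 = 784 features
# and builds a (784, 100) kernel in its build() method.
batch = np.zeros((32, 28, 28, 1), dtype=np.float32)
flattened = batch.reshape(batch.shape[0], -1)
print(flattened.shape)  # (32, 784)
```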
1.3 Training
Once again, training could be done in two separate ways:
- standard Keras model.fit, useful for simple tasks like classification
- a custom tf.GradientTape loop, for more complicated training schemes (described in section 3.2 below)
As pointed out by @Leevo once again, if you were to use the second way, you won't be able to simply use the callbacks provided by Keras, hence I'd advise sticking with the first option whenever possible.
In theory, you could call the callbacks' functions manually, like on_batch_begin() and the others where needed, but it would be cumbersome and I'm not sure how that would work.
When it comes to the first option, you can use tf.data.Dataset objects directly with fit. Here it is, presented inside another module (preferably train.py):
import datetime
import pathlib

import tensorflow as tf


def train(
    model: tf.keras.Model,
    path: str,
    train: tf.data.Dataset,
    epochs: int,
    steps_per_epoch: int,
    validation: tf.data.Dataset,
    steps_per_validation: int,
    stopping_epochs: int,
    optimizer=tf.optimizers.Adam(),
):
    model.compile(
        optimizer=optimizer,
        # I used logits as output from the last layer, hence this
        loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.metrics.SparseCategoricalAccuracy()],
    )

    model.fit(
        train,
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        validation_data=validation,
        validation_steps=steps_per_validation,
        callbacks=[
            # Tensorboard logging
            tf.keras.callbacks.TensorBoard(
                pathlib.Path("logs")
                / pathlib.Path(datetime.datetime.now().strftime("%Y%m%d-%H%M%S")),
                histogram_freq=1,
            ),
            # Early stopping with best weights preserving
            tf.keras.callbacks.EarlyStopping(
                monitor="val_sparse_categorical_accuracy",
                patience=stopping_epochs,
                restore_best_weights=True,
            ),
        ],
    )
    model.save(path)
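One practical detail: because the datasets returned by get_train()/get_test() call repeat(), they are infinite, so fit needs the explicit step counts above. A common choice (my assumption here, not the only one) is one full pass over the data per epoch, which for MNIST-sized splits and a hypothetical batch of 64 works out as:

```python
import math

# One pass over the data per epoch on a repeating dataset:
# steps = ceil(number_of_examples / batch_size)
n_train, n_test, batch = 60_000, 10_000, 64  # MNIST split sizes, example batch
steps_per_epoch = math.ceil(n_train / batch)
steps_per_validation = math.ceil(n_test / batch)
print(steps_per_epoch, steps_per_validation)  # 938 157
```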
The more complicated approach is very similar (almost copy and paste) to PyTorch training loops, so if you are familiar with those, they should not pose much of a problem.
You can find examples in the tf2.0 docs, e.g. here or here.
2. Other things
2.1 Unanswered questions
- Is there anything else in the code I can optimize further?
The code above already transforms the Model into a graph, so I don't think you would benefit from calling tf.function explicitly in this case. Premature optimization is the root of all evil; remember to measure your code before doing it.
You would gain much more with proper data caching (as described at the beginning of #1.1) and a good pipeline than from those things.
- I also need a way to extract the final weights of all layers
As pointed out by @Leevo above,
weights = model.get_weights()
will get you the weights. You may transform them into np.array, plot them using seaborn or matplotlib, analyze them, inspect them, or whatever else you want.
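As a sketch of what comes back from get_weights() and how to prepare it for plotting (the arrays below are fake stand-ins matching the double-feedforward shapes above; with a real model you would pass `flat` to e.g. matplotlib's `plt.hist(flat, bins=50)`):

```python
import numpy as np

# model.get_weights() returns a list of np.ndarray: kernel, bias, kernel, ...
# Faked here with random arrays shaped like the YourDense(100) / YourDense(10) model.
weights = [np.random.randn(784, 100), np.random.randn(100), np.random.randn(100, 10)]

# Flatten everything into one long vector, ready for a histogram.
flat = np.concatenate([w.ravel() for w in weights])
print(flat.shape)  # (79500,)
```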
2.2 Taking it altogether
All in all, your main.py (or entrypoint or something similar) would consist of this (more or less):
from datasets import ImageDatasetCreator
from model import Model
from train import train

# You could use argparse for things like batch, epochs etc.
if __name__ == "__main__":
    dataloader = ImageDatasetCreator("mnist", batch=64, cache=True)
    # renamed so they don't shadow the imported train function
    train_data, test_data = dataloader.get_train(), dataloader.get_test()

    model = Model()
    model.build((None, 28, 28, 1))
    train(
        model, path, train_data, epochs, test_data, ...
    )  # provide necessary arguments appropriately
    # Do whatever you want with those
    weights = model.get_weights()
Oh, and remember: the functions above are not meant for copy-pasting and should be treated more like a guideline. Hit me up if you have any questions.
3. Questions from comments
3.1 How to initialize custom and built-in layers
3.1.1 TLDR of what you are about to read
- a custom Poisson initialization function is defined, but it takes three arguments
- the tf.keras.initializers API needs two arguments (see the last point in their docs), hence one argument is fixed via Python's lambda inside the custom layer written before
- an optional bias for the layer is added, which can be turned off with a boolean
Why is it so complex? To show that in tf2.0 you can finally use Python's functionality: no more graph hassle, if instead of tf.cond, and so on.
3.1.2 From TLDR to implementation
Keras initializers can be found here and Tensorflow's flavor here.
Please note the API inconsistencies (capital letters class-like, lowercase with underscores function-like), especially in tf2.0, but that's beside the point.
You can use them either by passing a string (as done in YourDense above) or during object creation.
To allow for custom initialization in your custom layers, simply add an additional argument to the constructor (the tf.keras.Model class is still a Python class, and its __init__ should be used the same way as any Python class's).
Before that, I will show you how to create a custom initialization:
# Poisson custom initialization because why not.
def my_dumb_init(shape, lam, dtype=None):
    return tf.squeeze(tf.random.poisson(shape, lam, dtype=dtype))
Notice that its signature takes three arguments, while it should take (shape, dtype) only. Still, one can "fix" this easily while creating their own layer, like the one below (an extended YourDense):
import typing

import tensorflow as tf


class YourDense(tf.keras.layers.Layer):
    # It's still Python, use it as Python, that's the point of tf.2.0
    @classmethod
    def register_initialization(cls, initializer):
        # Set defaults if init not provided by user
        if initializer is None:
            # let's make the signature proper for init in tf.keras
            return lambda shape, dtype: my_dumb_init(shape, 1, dtype)
        return initializer

    def __init__(
        self,
        units: int,
        bias: bool = True,
        # can be string or callable, some typing info added as well...
        kernel_initializer: typing.Union[str, typing.Callable] = None,
        bias_initializer: typing.Union[str, typing.Callable] = None,
    ):
        super().__init__()
        self.units: int = units
        self.kernel_initializer = YourDense.register_initialization(kernel_initializer)
        if bias:
            self.bias_initializer = YourDense.register_initialization(bias_initializer)
        else:
            self.bias_initializer = None

    def build(self, input_shape):
        # Simply pass your init here
        self.kernel = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer=self.kernel_initializer,
            trainable=True,
        )
        if self.bias_initializer is not None:
            self.bias = self.add_weight(
                shape=(self.units,), initializer=self.bias_initializer
            )
        else:
            self.bias = None

    def call(self, inputs):
        weights = tf.matmul(inputs, self.kernel)
        if self.bias is not None:
            return weights + self.bias
        return weights
I have added my_dumb_init as the default (used if the user does not provide an initializer) and made the bias optional with the bias argument. Note that you can use if freely as long as it is not data-dependent. If it is (or depends on tf.Tensor somehow), one has to use the @tf.function decorator, which changes Python's flow to its tensorflow counterpart (e.g. if to tf.cond).
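The arity trick inside register_initialization can be shown standalone. Below is my own NumPy stand-in for my_dumb_init (so the sketch runs without TensorFlow): an initializer taking three arguments is wrapped in a lambda that fixes `lam`, producing a proper (shape, dtype) callable of the kind tf.keras expects.

```python
import numpy as np

# Three-argument initializer, analogous to my_dumb_init above (NumPy stand-in).
def my_dumb_init(shape, lam, dtype=None):
    rng = np.random.default_rng(0)
    return rng.poisson(lam, size=shape).astype(dtype or np.float32)

# Fix lam=1 via a lambda; the result matches the (shape, dtype) signature.
initializer = lambda shape, dtype: my_dumb_init(shape, 1, dtype)

kernel = initializer((3, 4), np.float32)
print(kernel.shape)  # (3, 4)
```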
See here for more on autograph; it's very easy to follow.
If you want to incorporate the above initializer changes into your model, you have to create the appropriate objects, and that's it.
... # Previous part of Model code here
self.network = tf.keras.Sequential(
    [
        YourDense(100, bias=False, kernel_initializer="lecun_uniform"),
        tf.keras.layers.ReLU(),
        YourDense(10, bias_initializer=tf.initializers.Ones()),
    ]
)
... # and the same afterwards
You can do the same with the built-in tf.keras.layers.Dense layer (the argument names differ, but the idea holds).
3.2 Automatic Differentiation using tf.GradientTape
3.2.1 Introduction
The point of tf.GradientTape is to allow users normal Python control flow and gradient calculation of variables with respect to other variables.
The example below is taken from here but broken into separate parts:
def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = tf.multiply(output, x)
    return output
A regular Python function with for and if flow control statements.
def grad(x, y):
    with tf.GradientTape() as t:
        t.watch(x)
        out = f(x, y)
    return t.gradient(out, x)
Using a gradient tape you can record all operations on Tensors (including their intermediate states) and "play" them backwards (perform automatic backward differentiation using the chain rule).
Every Tensor within the tf.GradientTape() context manager is recorded automatically. If some Tensor is out of scope, use the watch() method as seen above.
Finally, the gradient of output with respect to x (the input) is returned.
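What grad(x, y) computes for the f above can be verified by hand, without TensorFlow: with y >= 5 the loop multiplies output by x exactly three times (i = 2, 3, 4), so f(x, y) = x**3 and the tape's gradient is 3 * x**2. A pure-Python check:

```python
# Pure-Python version of f from the tape example.
def f(x, y):
    output = 1.0
    for i in range(y):
        if i > 1 and i < 5:
            output = output * x
    return output

x = 2.0
analytic_grad = 3 * x ** 2  # derivative of x**3
# Central finite difference as a sanity check of the analytic gradient.
numeric_grad = (f(x + 1e-6, 6) - f(x - 1e-6, 6)) / 2e-6
print(round(analytic_grad, 3), round(numeric_grad, 3))  # 12.0 12.0
```

This is exactly the number t.gradient(out, x) would return for x = 2.0, y = 6.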
3.2.2 Connection with deep learning
What was described above is the backpropagation algorithm. Gradients w.r.t. (with respect to) outputs are calculated for each node in the network (or rather for every layer). Those gradients are then used by various optimizers to make corrections, and so it repeats.
Let's continue and assume you have your tf.keras.Model, an optimizer instance, a tf.data.Dataset and a loss function already set up.
One can define a Trainer class which will perform training for us. Please read the comments in the code if in doubt:
class Trainer:
    def __init__(self, model, optimizer, loss_function):
        self.model = model
        self.loss_function = loss_function
        self.optimizer = optimizer
        # You could pass custom metrics in constructor
        # and adjust train_step and test_step accordingly
        self.train_loss = tf.keras.metrics.Mean(name="train_loss")
        self.test_loss = tf.keras.metrics.Mean(name="test_loss")

    def train_step(self, x, y):
        # Setup tape
        with tf.GradientTape() as tape:
            # Get current predictions of network
            y_pred = self.model(x)
            # Calculate loss generated by predictions
            loss = self.loss_function(y, y_pred)
        # Get gradients of loss w.r.t. EVERY trainable variable (iterable returned)
        gradients = tape.gradient(loss, self.model.trainable_variables)
        # Change trainable variable values according to gradient by applying optimizer policy
        self.optimizer.apply_gradients(zip(gradients, self.model.trainable_variables))
        # Record loss of current step
        self.train_loss(loss)

    def train(self, dataset):
        # Iterate over dataset and perform a train step for each batch
        for x, y in dataset:
            self.train_step(x, y)

    def test_step(self, x, y):
        # Record test loss separately
        self.test_loss(self.loss_function(y, self.model(x)))

    def test(self, dataset):
        # Iterate over whole dataset
        for x, y in dataset:
            self.test_step(x, y)

    def __str__(self):
        # You need Python 3.7 with f-string support
        # Just return metrics
        return f"Loss: {self.train_loss.result()}, Test Loss: {self.test_loss.result()}"
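The essence of train_step can also be shown without TensorFlow. Below is my own dependency-free sketch (not the Trainer code itself): for a linear model with MSE loss, the gradient is computed by hand and applied with a plain SGD "optimizer policy", which is exactly the role tape.gradient plus optimizer.apply_gradients play above.

```python
import numpy as np

def train_step(w, x, y, lr=0.1):
    """One hand-rolled step: forward pass, gradient, SGD update."""
    y_pred = x @ w                          # model forward pass
    grad = 2 * x.T @ (y_pred - y) / len(x)  # gradient of MSE loss w.r.t. w
    return w - lr * grad                    # optimizer.apply_gradients analogue

x = np.array([[1.0], [2.0]])
y = np.array([2.0, 4.0])  # generated by the true weight 2
w = np.zeros(1)
for _ in range(100):
    w = train_step(w, x, y)
print(w.round(3))  # converges toward [2.]
```

Each Trainer.train_step does the same three things, only with autodiff supplying the gradient instead of the hand-derived formula.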
Now you can use this class in your code really simply, like this:
EPOCHS = 5

# model, optimizer, loss defined beforehand
trainer = Trainer(model, optimizer, loss)
for epoch in range(EPOCHS):
    trainer.train(train_dataset)  # Same for training and test datasets
    trainer.test(test_dataset)
    print(f"Epoch {epoch}: {trainer}")
The print would tell you the training and test loss for each epoch. You can mix training and testing any way you want (e.g. 5 epochs of training and 1 of testing), and you can add different metrics etc.
See here if you want a non-OOP-oriented approach (IMO less readable, but to each their own).