I am trying to create a custom LSTMCell in TensorFlow. My machine has 24GB of RAM and no GPU, so everything runs on the CPU. First, I wrote an LSTMCell that mirrors the default LSTMCell. The code is as follows:
import tensorflow as tf
from tensorflow.keras import backend as K

class LSTMCell(tf.keras.layers.AbstractRNNCell):
    def __init__(self, units, **kwargs):
        self.units = units
        super(LSTMCell, self).__init__(**kwargs)

    @property
    def state_size(self):
        # required by AbstractRNNCell: the LSTM state is [h, c]
        return [self.units, self.units]

    @property
    def output_size(self):
        return self.units

    def build(self, input_shape):
        input_dim = input_shape[-1]
        # one fused kernel holding the i, f, c, o gate weights side by side
        self.kernel = self.add_weight(shape=(input_dim, self.units * 4), name='kernel', initializer='uniform')
        self.recurrent_kernel = self.add_weight(shape=(self.units, self.units * 4), name='recurrent_kernel', initializer='uniform')
        self.bias = self.add_weight(shape=(self.units * 4,), name='bias', initializer='uniform')

    def _compute_carry_and_output_fused(self, z, c_tm1):
        z0, z1, z2, z3 = z
        i = K.sigmoid(z0)
        f = K.sigmoid(z1)
        c = f * c_tm1 + i * K.tanh(z2)
        o = K.sigmoid(z3)
        return c, o

    def call(self, inputs, states, training=None):
        h_tm1 = states[0]
        c_tm1 = states[1]
        z = K.dot(inputs, self.kernel)
        z += K.dot(h_tm1, self.recurrent_kernel)
        z = K.bias_add(z, self.bias)
        z = tf.split(z, num_or_size_splits=4, axis=1)
        c, o = self._compute_carry_and_output_fused(z, c_tm1)
        h = o * K.sigmoid(c)
        self.h = h
        self.c = c
        return h, [h, c]
This cell works fine and only consumes about 8GB of RAM. Then I modified the cell for my own needs, doubling the number of parameters. The code is as follows:
class LSTMCell(tf.keras.layers.AbstractRNNCell):
    def __init__(self, units, **kwargs):
        self.units = units
        super(LSTMCell, self).__init__(**kwargs)

    @property
    def state_size(self):
        return [self.units, self.units]

    @property
    def output_size(self):
        return self.units

    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.kernel = self.add_weight(shape=(input_dim, self.units * 4), name='kernel', initializer='uniform')
        self.recurrent_kernel = self.add_weight(shape=(self.units, self.units * 4), name='recurrent_kernel', initializer='uniform')
        self.bias = self.add_weight(shape=(self.units * 4,), name='bias', initializer='uniform')
        # extra trainable weights with the same shapes as the kernels;
        # they are added elementwise to the kernels in call(), doubling the parameter count
        self.kernel_bits = self.add_weight(shape=(input_dim, self.units * 4), name='_diffq_k', initializer='uniform', trainable=True)
        self.recurrent_kernel_bits = self.add_weight(shape=(self.units, self.units * 4), name='_diffq_rk', initializer='uniform', trainable=True)

    def _compute_carry_and_output_fused(self, z, c_tm1):
        z0, z1, z2, z3 = z
        i = K.sigmoid(z0)
        f = K.sigmoid(z1)
        c = f * c_tm1 + i * K.tanh(z2)
        o = K.sigmoid(z3)
        return c, o

    def call(self, inputs, states, training=None):
        h_tm1 = states[0]
        c_tm1 = states[1]
        z = K.dot(inputs, self.kernel + self.kernel_bits)
        z += K.dot(h_tm1, self.recurrent_kernel + self.recurrent_kernel_bits)
        z = K.bias_add(z, self.bias)
        z = tf.split(z, num_or_size_splits=4, axis=1)
        c, o = self._compute_carry_and_output_fused(z, c_tm1)
        h = o * K.sigmoid(c)
        self.h = h
        self.c = c
        return h, [h, c]
Now, when I try to train with this cell, it eats all of my RAM within seconds and the process is killed. The model I use looks like this (a minimal sketch of the training call follows the model code):
input_shape = (1874, 1024)   # (timesteps, features)
input = tf.keras.layers.Input(shape=input_shape, name="input_layer")
x = input
lstm = tf.keras.layers.RNN(LSTMCell(units=input_shape[1]), return_sequences=True)
x = lstm(x)
model = tf.keras.models.Model(input, x, name='my_model')
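For completeness, this is roughly how I drive the training; the optimizer, loss, and random dummy data below are placeholders for illustration rather than my real pipeline:

import numpy as np

# hypothetical stand-in for the real dataset: a few sequences of shape (1874, 1024)
dummy_x = np.random.rand(8, 1874, 1024).astype('float32')
dummy_y = np.random.rand(8, 1874, 1024).astype('float32')

model.compile(optimizer='adam', loss='mse')
model.fit(dummy_x, dummy_y, batch_size=1, epochs=1)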
For the same dataset, the RAM consumption of the two cells is drastically different. I have tried reducing the input dimensions, and the largest LSTM I can train with the modified cell is 128 units; beyond that, the RAM fills up and training fails. I implemented the same thing in PyTorch without any problems. Can someone point out the cause of the issue I am running into?
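For scale, here is my rough back-of-the-envelope parameter count for the modified cell at units = 1024 (my own arithmetic, not something printed by the model):

units = input_dim = 1024
base = input_dim * units * 4 + units * units * 4 + units * 4   # kernel + recurrent_kernel + bias
extra = input_dim * units * 4 + units * units * 4               # kernel_bits + recurrent_kernel_bits
print(base, extra)                   # 8392704 and 8388608
print(4 * (base + extra) / 2**20)    # ~64 MB as float32

So the added weights by themselves are only a few tens of megabytes, which makes the multi-gigabyte blow-up during training all the more puzzling.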