I am still new to neural networks and machine learning, and I am having trouble understanding a problem I have run into in PyTorch and how to solve it.

My dataset, once stored in inputs and outputs, is an N x 125 x 6 array representing 6 time-dependent variables at 125 time steps, for N independent sets. I am trying to model this with the code below, but I get an error when computing the gradients in the backward pass. I have seen answers that involve detach()'ing or placing model_opt.zero_grad() after model_opt.step(); however, I do not really understand what is going on well enough to know whether those are the right fixes (or how to make them work), and I am looking for more clarification and help.

Just to clarify what my code is intended to do: inside train(), I manually group the independent sets into batches of batch_size. From each batch I get a loss, add it to the running total of the losses from all batches so far in that epoch, and compute the average loss over the epoch so far. I then use this average loss to update the optimizer.

Here is a minimal reproducible example:

from pathlib import Path
import numpy as np
import h5py
import torch
from torch import nn
from torch import optim


class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, num_layers, batch_size=1):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_classes = num_classes
        self.batch_size = batch_size
        self.num_layers = num_layers
        
        self.rnn = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x, device):
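        # (re)initialize the hidden and cell states to zeros on every forward call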
        hidden = torch.zeros(self.num_layers, self.batch_size, self.hidden_size, dtype=torch.float64).to(device)
        cell_state = torch.zeros(self.num_layers, self.batch_size, self.hidden_size, dtype=torch.float64).to(device)

        output, (hidden, cell_state) = self.rnn(x, (hidden, cell_state))

        output = self.fc(output)
        return output, hidden, cell_state


def train(epochs, rnn_model, model_loss, model_opt, inputs, outputs, batch_size, device):
    for epoch in range(epochs):
        rnn_model.train()
        model_opt.zero_grad()
        total_loss = 0.0
        num_batches = np.ceil(inputs.shape[0]/batch_size)
        for batch_i in range(int(num_batches)):
            start = batch_i*batch_size
            if batch_i == num_batches - 1:
                end = inputs.shape[0]
            else:
                end = (batch_i+1)*batch_size
            inp = inputs[start:end, :, :]
            target = outputs[start:end, :, :]

            out, hidden, cell_state = rnn_model(inp, device)
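            # accumulate this batch's loss into the running total for the epoch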
            total_loss += model_loss(out, target)

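            # average loss over all samples seen so far this epoch, then backprop and step the optimizer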
            loss = total_loss/end
            loss.backward()
            model_opt.step()
    return


def main(fname, input_size, hidden_size, output_size, num_layers, batch_size, num_epochs, learning_rate):
    data_dir = Path(r'path\to\my\data')

    # load data
    train_file = data_dir / f'NN_{fname}.h5'
    f = h5py.File(train_file, 'r')
    inputs = np.swapaxes(np.array(f['series']['input']), 0, 2)
    outputs = np.swapaxes(np.array(f['series']['output']), 0, 2)
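    # after swapping axes, the arrays are (sets, time steps, variables), matching batch_first=True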

    # Define model, optimizer, and loss
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = RNN(input_size, hidden_size, output_size, num_layers, batch_size=batch_size).to(device)
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    loss_func = nn.MSELoss()

    # send data to computation device
    inputs = torch.from_numpy(inputs).to(device)
    outputs = torch.from_numpy(outputs).to(device)
    
    # pre-training
    i_temp = inputs[:, range(25), :]
    o_temp = outputs[:, range(25), :]
    train(int(num_epochs*0.01), model, loss_func, optimizer, i_temp, o_temp, batch_size, device)
    return


if __name__ == '__main__':
    torch.set_default_dtype(torch.float64)
    input_size = 6
    hidden_size = 7
    output_size = 6
    num_epochs = 2500
    batch_size = 100
    learning_rate = 0.0001
    num_layers = 3
    f_name = 'data'
    main(f_name, input_size, hidden_size, output_size, num_layers, batch_size, num_epochs, learning_rate)

The following error is raised from loss.backward() inside train():

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Edit: I have added total_loss = total_loss.detach() immediately after model_opt.step(), and it now runs without errors. However, I would still like to know whether this is correct, given my intent as stated above.

Accepted answer

As @c p clarified, adding total_loss = total_loss.detach() after model_opt.step() is indeed the solution, so that the optimizer is updated appropriately from the average loss after each batch.
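To see why this works: total_loss keeps a reference to the computation graph of every previous batch, and those graphs are freed as soon as loss.backward() runs. On the next batch, backward() would have to traverse the already-freed graphs again, which is exactly the RuntimeError above. Detaching total_loss after the optimizer step cuts that link while keeping its value for the running average. As a sketch (same names as the question's train(), with only the detach() line added):

def train(epochs, rnn_model, model_loss, model_opt, inputs, outputs, batch_size, device):
    for epoch in range(epochs):
        rnn_model.train()
        model_opt.zero_grad()
        total_loss = 0.0
        num_batches = int(np.ceil(inputs.shape[0] / batch_size))
        for batch_i in range(num_batches):
            start = batch_i * batch_size
            end = inputs.shape[0] if batch_i == num_batches - 1 else (batch_i + 1) * batch_size
            inp = inputs[start:end, :, :]
            target = outputs[start:end, :, :]

            out, hidden, cell_state = rnn_model(inp, device)
            total_loss += model_loss(out, target)

            # backward() only has to traverse the current batch's graph, because
            # total_loss was detached at the end of the previous iteration
            loss = total_loss / end
            loss.backward()
            model_opt.step()

            # drop the reference to the (now freed) graph, but keep the value
            # so the running average is unchanged
            total_loss = total_loss.detach()

Note that this keeps zero_grad() once per epoch, as in the question, so gradients still accumulate in .grad across the batches of an epoch; if you instead want each step to use only the gradient of the current average, move model_opt.zero_grad() inside the batch loop as well.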
