I applied the following preprocessing for a TensorFlow neural network:

import csv
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
import tensorflow as tf
from tensorflow.keras.layers import Input,Dense,LSTM,Flatten,GlobalAveragePooling1D,Embedding,Dropout

!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/bbc-text.csv \
    -O /tmp/bbc-text.csv



# Stopwords list from https://github.com/Yoast/YoastSEO.js/blob/develop/src/config/stopwords.js
# Convert it to a Python list and paste it here
stopwords = ["a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "as", "at",
             "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "could", "did", "do",
             "does", "doing", "down", "during", "each", "few", "for", "from", "further", "had", "has", "have", "having",
             "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself", "his", "how",
             "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "it", "it's", "its", "itself",
             "let's", "me", "more", "most", "my", "myself", "nor", "of", "on", "once", "only", "or", "other", "ought",
             "our", "ours", "ourselves", "out", "over", "own", "same", "she", "she'd", "she'll", "she's", "should",
             "so", "some", "such", "than", "that", "that's", "the", "their", "theirs", "them", "themselves", "then",
             "there", "there's", "these", "they", "they'd", "they'll", "they're", "they've", "this", "those", "through",
             "to", "too", "under", "until", "up", "very", "was", "we", "we'd", "we'll", "we're", "we've", "were",
             "what", "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why",
             "why's", "with", "would", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself",
             "yourselves"]

#----------------------------------- Read from CSV and remove the stopwords
sentences = []
labels = []
with open("/tmp/bbc-text.csv", 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    next(reader)  # skip the header row
    for row in reader:
        labels.append(row[0])
        sentence = row[1]
        for word in stopwords:
            token = " " + word + " "
            sentence = sentence.replace(token, " ")
            sentence = sentence.replace(" ", " ")
        sentences.append(sentence)


#----------------------------------  Tokenize sentences
tokenizer = Tokenizer(oov_token="<OOV>")
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)
padded = pad_sequences(sequences, padding = 'post')

#--------------------------------- Tokenize labels
label_tokenizer = Tokenizer()
label_tokenizer.fit_on_texts(labels)
# label_word_index = label_tokenizer.word_index
label_seq = label_tokenizer.texts_to_sequences(labels)
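
For reference, a quick look at what this preprocessing produces (the numbers below are what the BBC CSV above yields; treat them as illustrative) explains the figures used in the model:

print(padded.shape)                # (2225, 2441): 2225 articles, padded to the longest sequence
print(len(tokenizer.word_index))   # vocabulary size learned from the texts
print(label_tokenizer.word_index)  # 5 categories, indexed from 1 (index 0 is reserved)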

And finally the neural network, which runs on the prepared data:

train_sentence = tf.convert_to_tensor(padded,tf.int32)
train_label = tf.convert_to_tensor(label_seq,tf.int32)

input = Input(shape=(2441,))
x = Embedding(input_dim=10000,output_dim=128)(input)
x = LSTM(64,return_sequences=True)(x)
x = LSTM(64,return_sequences=True)(x)
x = LSTM(64,return_sequences=True)(x)
x = Dropout(0.2)(x)
x = LSTM(64)(x)
x = Flatten()(x)
output = Dense(5, activation='softmax')(x)
model = tf.keras.models.Model(input,output)

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(x=train_sentence,y=train_label,epochs=10)

However, it fails with the following error:

InvalidArgumentError: Graph execution error:

Accepted Answer

The input_dim of the Embedding layer must match the size of the data's vocabulary plus 1. In addition, when using the sparse_categorical_crossentropy loss function, the labels must start at zero, not at one. Here is a working example based on your code and data:

# ...
# ...
train_sentence = tf.convert_to_tensor(padded,tf.int32)
train_label = tf.convert_to_tensor(label_seq,tf.int32)
train_label = train_label - 1  # shift labels from 1..5 to 0..4 for sparse_categorical_crossentropy

input = Input(shape=(2441,))
x = Embedding(input_dim=len(tokenizer.word_index) + 1,output_dim=128)(input)
x = LSTM(64,return_sequences=True)(x)
x = LSTM(64,return_sequences=True)(x)
x = LSTM(64,return_sequences=True)(x)
x = Dropout(0.2)(x)
x = LSTM(64)(x)
x = Flatten()(x)
output = Dense(5, activation='softmax')(x)
model = tf.keras.models.Model(input,output)

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.fit(x=train_sentence,y=train_label,epochs=10)
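
As a quick sanity check (a minimal sketch using the variables defined above), both conditions the answer relies on can be verified before training:

# Labels must be zero-based for sparse_categorical_crossentropy
assert int(tf.reduce_min(train_label)) == 0 and int(tf.reduce_max(train_label)) == 4

# Every token id must be below the Embedding layer's input_dim
vocab_size = len(tokenizer.word_index) + 1
assert padded.max() < vocab_size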
