Python: different results when loading only the test dataset with keras.utils.image_dataset_from_directory

Posted on April 19

I use the following line to get my test dataset:

test_ds = keras.utils.image_dataset_from_directory(img_path, image_size=image_size, batch_size = batch_size)

When I run my model on this dataset, I get the following statistics: Accuracy = 0.5214362272240086, Precision = 0.5950113378684807, F1-score = 0.5434962717481359

However, when I load my dataset this way:

_, new_images = keras.utils.image_dataset_from_directory(img_path, shuffle=True, subset="both", seed=1, validation_split=0.9999, image_size=image_size, batch_size = batch_size)

the performance statistics are: Accuracy = 0.9635388739946381, Precision = 0.9658291457286432, F1-score = 0.96875

Why does this happen? Has anyone had a similar experience?

Edit: the code used to obtain the metrics above:

import numpy as np
import sklearn.metrics
import tensorflow as tf

predict = model.predict(new_images)
actual = tf.concat([y for x, y in new_images], axis=0).numpy().tolist()

# Get optimal threshold
fpr, tpr, thresholds = sklearn.metrics.roc_curve(actual, predict)

# Youden's index
J = tpr - fpr

# Optimal threshold
threshold = thresholds[np.argmax(J)]

# Use threshold
predicted = [1 if res > threshold else 0 for res in predict]

# Metrics
print(sklearn.metrics.accuracy_score(actual, predicted), sklearn.metrics.f1_score(actual, predicted), sklearn.metrics.precision_score(actual, predicted))
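
One possible explanation, assuming the dataset reshuffles every time it is iterated (which, as far as I know, is the default for a shuffled `tf.data` pipeline), is that `model.predict(...)` and the separate loop that builds `actual` see the samples in different orders. A minimal sketch of collecting predictions and labels in the same pass follows. `collect_in_one_pass` and `predict_fn` are hypothetical names, and plain Python lists stand in for the batches so the snippet runs without TensorFlow.

```python
def collect_in_one_pass(dataset, predict_fn):
    """Iterate the dataset once so predictions and labels stay aligned.

    dataset    -- any iterable yielding (inputs, labels) batches
    predict_fn -- hypothetical callable mapping a batch of inputs to scores
    """
    scores, labels = [], []
    for x_batch, y_batch in dataset:
        scores.extend(predict_fn(x_batch))  # scores for this batch
        labels.extend(y_batch)              # labels from the same batch
    return scores, labels


# Toy stand-in for a batched dataset: two batches of (inputs, labels).
toy_batches = [([0.1, 0.9], [0, 1]), ([0.8, 0.2], [1, 0])]

# Toy "model": the score is simply the input value.
scores, labels = collect_in_one_pass(toy_batches, lambda xs: list(xs))
print(scores, labels)
```

With a real Keras model, `predict_fn` could be something like `lambda x: model.predict_on_batch(x).ravel()`, with each label batch converted via `.numpy()`. Alternatively, passing `shuffle=False` to `image_dataset_from_directory` should keep the order fixed across passes.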

df = tf.keras.preprocessing.image_dataset_from_directory(
    r'path//to//images',
    labels=list(range(13)),  # mock labels, every image has another label
    shuffle=True,
    seed=1
)
_, df2 = tf.keras.preprocessing.image_dataset_from_directory(
    r'path//to//images',
    labels=list(range(13)),  # mock labels, every image has another label
    shuffle=True,
    seed=1,
    subset='both',
    validation_split=.999
)
print([x[1] for x in df][0].numpy())   # prints: [ 7 5 8 0 11 1 4 6 3 2 10 9 12]
print([x[1] for x in df][0].numpy())   # prints: [ 8 9 6 0 7 12 10 5 4 3 2 1 11]
print([x[1] for x in df2][0].numpy())  # prints: [ 3 4 10 1 6 0 7 12 9 8 11 5]
print([x[1] for x in df2][0].numpy())  # prints: [ 3 4 10 1 6 0 7 12 9 8 11 5]
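
The label orders printed above suggest that `df` yields a new order on each pass, while `df2` yields the same order every time. A small simulation, using only the standard library and a hypothetical `ReshufflingDataset` class as a stand-in for a shuffled dataset, sketches how this alone can pull accuracy toward chance even for a perfect classifier:

```python
import random


class ReshufflingDataset:
    """Hypothetical stand-in for a dataset that reshuffles on every pass."""

    def __init__(self, labels, seed=1):
        self._labels = list(labels)
        self._rng = random.Random(seed)

    def __iter__(self):
        order = self._labels[:]
        self._rng.shuffle(order)  # a new order each time iteration starts
        return iter(order)


def accuracy(actual, predicted):
    """Fraction of positions where the two sequences agree."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Balanced binary labels; the "model" is perfect and echoes the true label.
ds = ReshufflingDataset([0, 1] * 500)

# Two separate passes: predictions and labels come from different orders.
predicted_two_pass = [y for y in ds]
actual_two_pass = [y for y in ds]
acc_two_pass = accuracy(actual_two_pass, predicted_two_pass)

# One pass: each prediction is paired with the label it was made from.
pairs = [(y, y) for y in ds]
acc_one_pass = accuracy([a for a, _ in pairs], [p for _, p in pairs])

print(f"two passes: {acc_two_pass:.3f}, one pass: {acc_one_pass:.3f}")
```

In this toy setup the two-pass accuracy lands near 0.5, which is in the same range as the roughly 0.52 reported in the question, while the one-pass accuracy is 1.0. This is only an illustration under the stated assumption, not a confirmation of what happens inside Keras.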
