Strange behavior when passing a function to the TensorFlow Dataset map method in Python

Posted on June 9

This was working perfectly for me earlier today, but after I restarted my notebook it suddenly started behaving very strangely.

ds = ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

where read_npy_file is defined as follows:

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to convert it to a regular string
    # so that it can be passed as a parameter into np.load()
    data = np.load(data.decode())
    return data.astype(np.float32)

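As you can see, the mapping should produce another tuple of tensors, where the first tensor is the NumPy array and the second is the label, left untouched. This worked perfectly earlier, but now it shows the strangest behavior. I placed a print statement in read_npy_file() to check that the correct data was being passed in. I expected it to receive a single bytes string, but when I call print(data) inside read_npy_file() and pull one item from the dataset with ds.take(1) to trigger the mapping, it produces this output: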

b'./challengeA_data/log_spectrogram/2603ebb3-3cd3-43cc-98ef-0c128c515863.npy'b'./challengeA_data/log_spectrogram/fab6a266-e97a-4935-a0c3-444fc4426fc5.npy'b'./challengeA_data/log_spectrogram/93014682-60a2-45bd-9c9e-7f3c97b83be9.npy'b'./challengeA_data/log_spectrogram/710f2430-5da3-4822-a252-6ad3601b92d9.npy'b'./challengeA_data/log_spectrogram/e757058c-91de-4381-8184-65f001c95647.npy'


b'./challengeA_data/log_spectrogram/38b12689-04ba-422b-a972-5856b05ca868.npy'
b'./challengeA_data/log_spectrogram/7c9ccc04-a2d2-4eec-bafd-0c97b3658c26.npy'b'./challengeA_data/log_spectrogram/c7cc3520-7218-4d07-9f0a-6bd7bb90a551.npy'



b'./challengeA_data/log_spectrogram/21f6060a-9766-4810-bd7c-0437f47ccb98.npy'

I have not modified the formatting of this output in any way.
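One plausible reading of the output above, which the post itself does not confirm, is that read_npy_file is being called from several threads at once because of num_parallel_calls=tf.data.AUTOTUNE. Python's print() writes the text and the trailing newline as separate writes, so concurrent calls can glue several names onto one line and leave blank lines behind. The sketch below uses only the standard library, not TensorFlow, to show one way to keep each line intact; print_lock, log_name, and the file names are hypothetical.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor
from contextlib import redirect_stdout

# Hypothetical helper: a lock shared by every thread that prints.
print_lock = threading.Lock()

def log_name(data):
    # Holding the lock keeps the text and its newline together,
    # so two threads cannot interleave their output.
    with print_lock:
        print(data)

# Made-up file names standing in for the real .npy paths.
names = [f"./file-{i}.npy".encode() for i in range(8)]

# Capture stdout so the result can be inspected afterwards.
captured = io.StringIO()
with redirect_stdout(captured):
    with ThreadPoolExecutor(max_workers=4) as pool:
        list(pool.map(log_name, names))

lines = captured.getvalue().splitlines()
print(f"{len(lines)} intact lines")
```

The order of the lines can still vary from run to run; only the integrity of each line is protected.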

Any help would be greatly appreciated. Working with TFDS has been an absolute nightmare, haha.

Here is the full code:

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to convert it to a regular string
    # so that it can be passed as a parameter into np.load()
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))

specgram_ds = specgram_ds.map(
                    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
                    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

# iterating over one item to trigger the mapping function
for item in specgram_ds.take(1):
    pass

Thanks.

import tensorflow as tf
import numpy as np

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to convert it to a regular string
    # so that it can be passed as a parameter into np.load()
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

# Create dummy data (the paths below assume the working directory is /content)
for i in range(4):
    np.save('{}-array'.format(i), np.random.random((5, 5)))

specgram_files = ['/content/0-array.npy', '/content/1-array.npy', '/content/2-array.npy', '/content/3-array.npy']
labels = [1, 0, 0, 1]
specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))

specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(specgram_files)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

for item in specgram_ds.take(1):
    pass
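A likely reason that take(1) prints many file names rather than one is that shuffle(buffer_size=1000) sits after map: a shuffle buffer has to be filled from upstream before it can hand out its first element, so the mapped function runs for up to buffer_size files. The sketch below is a simplified pure-Python model of that behavior, not TensorFlow's implementation; shuffle_buffer, fake_load, count_loads_for_first_item, and the file names are all hypothetical.

```python
import random

_DONE = object()  # sentinel marking an exhausted source

def shuffle_buffer(source, buffer_size, seed=0):
    """Simplified model of a shuffle buffer (not TensorFlow's code)."""
    rng = random.Random(seed)
    source = iter(source)
    buffer = []
    # Fill the buffer before yielding anything.
    for item in source:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            break
    # Yield a random buffered element, then pull one replacement.
    while buffer:
        idx = rng.randrange(len(buffer))
        buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
        yield buffer.pop()
        refill = next(source, _DONE)
        if refill is not _DONE:
            buffer.append(refill)

def count_loads_for_first_item(buffer_size, num_files=10):
    """Count how often the mapped function runs to produce one element."""
    calls = []

    def fake_load(name):
        # Stand-in for read_npy_file: records the call instead of loading.
        calls.append(name)
        return name

    files = [f"file-{i}.npy" for i in range(num_files)]
    mapped = (fake_load(name) for name in files)  # lazy, like Dataset.map
    next(shuffle_buffer(mapped, buffer_size))     # like take(1)
    return len(calls)

print(count_loads_for_first_item(buffer_size=1000))  # 10: every file is loaded
print(count_loads_for_first_item(buffer_size=3))     # 3: only the buffer is filled
```

If this is what is happening, shuffling the file names before the map call, so that the buffer holds short strings rather than loaded arrays, would be one way to avoid loading many files just to read the first element.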

