This worked fine for me earlier today, but after I restarted the notebook it suddenly started behaving very strangely.

ds = ds.map(
lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
num_parallel_calls=tf.data.AUTOTUNE)

where read_npy_file is defined as follows:

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to convert it to a regular string
    # so that it can be passed as a parameter into np.load()
    data = np.load(data.decode())
    return data.astype(np.float32) 

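As you can see, the mapping should create another tuple of tensors, where the first tensor is the NumPy array and the second is the label, untouched. This worked perfectly earlier, but now it shows the strangest behavior. I put a print statement inside read_npy_file() to check whether the right data was being passed in. I expected it to receive a single bytes string, but when I call print(data) inside read_npy_file() and pull one item from the dataset with ds.take(1) to trigger the mapping, I get this output: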

b'./challengeA_data/log_spectrogram/2603ebb3-3cd3-43cc-98ef-0c128c515863.npy'b'./challengeA_data/log_spectrogram/fab6a266-e97a-4935-a0c3-444fc4426fc5.npy'b'./challengeA_data/log_spectrogram/93014682-60a2-45bd-9c9e-7f3c97b83be9.npy'b'./challengeA_data/log_spectrogram/710f2430-5da3-4822-a252-6ad3601b92d9.npy'b'./challengeA_data/log_spectrogram/e757058c-91de-4381-8184-65f001c95647.npy'


b'./challengeA_data/log_spectrogram/38b12689-04ba-422b-a972-5856b05ca868.npy'
b'./challengeA_data/log_spectrogram/7c9ccc04-a2d2-4eec-bafd-0c97b3658c26.npy'b'./challengeA_data/log_spectrogram/c7cc3520-7218-4d07-9f0a-6bd7bb90a551.npy'



b'./challengeA_data/log_spectrogram/21f6060a-9766-4810-bd7c-0437f47ccb98.npy'

I have not modified the formatting of this output in any way.

Any help would be greatly appreciated. Working with TFDS has been an absolute nightmare, haha.

Here is the full code:

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to convert it to a regular string
    # so that it can be passed as a parameter into np.load()
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))

specgram_ds = specgram_ds.map(
                    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
                    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

# iterating over one item to trigger the mapping function
for item in specgram_ds.take(1):
    pass

Thanks

Recommended answer

Your logic seems fine. You are actually just observing the behavior of tf.data.AUTOTUNE in combination with print(*). According to the docs:

If the value tf.data.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
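A likely reason the file names end up glued together or separated by blank lines is that Python's print() hands the text and the trailing newline to the output stream as separate writes, so several worker threads can slip in between them. The sketch below uses plain threading rather than TensorFlow to show this. RecordingStream and print_whole_line are hypothetical helpers introduced only for illustration, and the file names are made up.

```python
import io
import threading


class RecordingStream(io.StringIO):
    """In-memory stream that remembers every individual write() call."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def write(self, text):
        self.calls.append(text)
        return super().write(text)


# print() passes the text and the line ending to the stream separately.
single = RecordingStream()
print(b'./a.npy', file=single)
# single.calls is now ["b'./a.npy'", "\n"]

_print_lock = threading.Lock()


def print_whole_line(value, stream):
    """Write the value and its newline in one call while holding a lock."""
    with _print_lock:
        stream.write(f"{value!r}\n")


# Several threads printing at once: every line stays intact.
shared = RecordingStream()
names = [f"./file-{i}.npy".encode() for i in range(8)]
threads = [
    threading.Thread(target=print_whole_line, args=(name, shared))
    for name in names
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

lines = shared.getvalue().splitlines()
```

The order of the lines can still vary from run to run, because the threads are not synchronized with each other. Only the integrity of each individual line is protected.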

You can run the following code a few times to observe the output changing:

import tensorflow as tf
import numpy as np

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to convert it to a regular string
    # so that it can be passed as a parameter into np.load()
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

# Create dummy data
for i in range(4):
  np.save('{}-array'.format(i), np.random.random((5,5)))


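# These absolute paths assume the working directory is /content (for example, Google Colab),
# which is where np.save() above writes the files.
specgram_files = ['/content/0-array.npy', '/content/1-array.npy', '/content/2-array.npy', '/content/3-array.npy']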
labels = [1, 0, 0, 1]
specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))

specgram_ds = specgram_ds.map(
                    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
                    num_parallel_calls=tf.data.AUTOTUNE)


num_files = len(specgram_files)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

for item in specgram_ds.take(1):
    pass

Also see this. Finally, note that using tf.print instead of print should get rid of any side effects.
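If the goal is only to check what read_npy_file receives, one option is to drop num_parallel_calls while debugging, so that the map is expected to process one element at a time. The following is a minimal sketch under that assumption, not the original poster's pipeline: the temporary directory, the dummy arrays, and the `seen` list are illustrative additions.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

seen = []  # file names in the order read_npy_file receives them (illustrative)


def read_npy_file(data):
    # 'data' arrives as a bytes string holding the file name.
    seen.append(data)
    return np.load(data.decode()).astype(np.float32)


# Create dummy data in a temporary directory (assumption: any writable location works).
tmp_dir = tempfile.mkdtemp()
files = []
for i in range(4):
    path = os.path.join(tmp_dir, f"{i}-array.npy")
    np.save(path, np.random.random((5, 5)))
    files.append(path)
labels = [1, 0, 0, 1]

ds = tf.data.Dataset.from_tensor_slices((files, labels))

# No num_parallel_calls here, so no parallelism is requested for this map.
ds = ds.map(
    lambda file, label: (tf.numpy_function(read_npy_file, [file], tf.float32), label)
)

shapes = []
for features, label in ds:
    shapes.append(tuple(features.shape))
    print(features.shape, label.numpy())
```

Once the inputs look right, num_parallel_calls=tf.data.AUTOTUNE can be restored for throughput. The interleaved print output affects only what is shown on screen, not the elements the dataset yields.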
