Create a TensorFlow dataset from a local image directory


Question

I have a very large database of images stored locally, organized so that each folder contains the images of one class.

I would like to use the TensorFlow Dataset API to obtain batches of data without having all the images loaded in memory.

I have tried something like this:

def _parse_function(filename, label):
    image_string = tf.read_file(filename, "file_reader")
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    return image, label

image_list, label_list, label_map_dict = read_data()

dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
dataset = dataset.shuffle(len(image_list))
dataset = dataset.repeat(epochs).batch(batch_size)

dataset = dataset.map(_parse_function)

iterator = dataset.make_one_shot_iterator()

image_list is a list to which the path (including the file name) of each image has been appended, and label_list is a list to which the class of each image has been appended in the same order.

But _parse_function does not work. The error I receive is:

ValueError: Shape must be rank 0 but is rank 1 for 'file_reader' (op: 'ReadFile') with input shapes: [?].

I have googled the error, but nothing I found works for me.

If I do not use the map function, I just receive the paths of the images (which are stored in image_list), so I think I need the map function to read the images, but I am not able to make it work.

Thank you in advance.

For reference, this is the read_data function:

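    import os

    def read_data():
        # base_path is assumed to be defined elsewhere: the root folder
        # that holds one subfolder per class.
        image_list = []
        label_list = []
        label_map_dict = {}
        count_label = 0

        for class_name in os.listdir(base_path):
            class_path = os.path.join(base_path, class_name)
            label_map_dict[class_name]=count_label

            for image_name in os.listdir(class_path):
                image_path = os.path.join(class_path, image_name)

                label_list.append(count_label)
                image_list.append(image_path)

            count_label += 1

        # The caller unpacks three values, so return them explicitly.
        return image_list, label_list, label_map_dict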
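
One thing worth noting about the directory walk: os.listdir returns entries in arbitrary order, so the class-to-label mapping can differ between machines or runs. A minimal sketch of a deterministic variant follows. Taking base_path as a parameter, sorting the entries, and skipping non-directories are assumptions added here, not part of the original post, and the demo folder names are hypothetical.

```python
import os
import tempfile


def read_data(base_path):
    """Collect image paths and integer labels from a folder-per-class tree."""
    image_list, label_list, label_map_dict = [], [], {}
    # Sort class folders so the label assigned to each class is reproducible.
    class_names = sorted(
        name for name in os.listdir(base_path)
        if os.path.isdir(os.path.join(base_path, name))
    )
    for label, class_name in enumerate(class_names):
        label_map_dict[class_name] = label
        class_path = os.path.join(base_path, class_name)
        for image_name in sorted(os.listdir(class_path)):
            image_list.append(os.path.join(class_path, image_name))
            label_list.append(label)
    return image_list, label_list, label_map_dict


# Demo on a throwaway tree with two hypothetical classes and empty files.
root = tempfile.mkdtemp()
for class_name, files in {"cats": ["a.jpg", "b.jpg"], "dogs": ["c.jpg"]}.items():
    os.makedirs(os.path.join(root, class_name))
    for file_name in files:
        open(os.path.join(root, class_name, file_name), "wb").close()

image_list, label_list, label_map_dict = read_data(root)
print(label_map_dict)
print(label_list)
```

Only file paths are collected here, so nothing is loaded into memory until the dataset pipeline reads the files.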

Recommended answer

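The error is in this line: dataset = dataset.repeat(epochs).batch(batch_size). Batching adds the batch size as a leading dimension to each element. Because the pipeline batches before it maps, _parse_function receives a batch of filenames (a rank-1 tensor), while tf.read_file expects a single scalar filename (rank 0).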
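
The change in rank can be checked without reading any files. The sketch below assumes TensorFlow 2.x, where every dataset exposes element_spec; the file names are hypothetical and are never opened.

```python
import tensorflow as tf

paths = ["a.jpg", "b.jpg", "c.jpg"]  # hypothetical file names, never opened
labels = [0, 1, 0]

dataset = tf.data.Dataset.from_tensor_slices((paths, labels))

# element_spec describes one element: here a (filename, label) pair.
unbatched_spec = dataset.element_spec
batched_spec = dataset.batch(2).element_spec

# Rank of the filename component before and after batching.
rank_before = unbatched_spec[0].shape.rank
rank_after = batched_spec[0].shape.rank
print(rank_before, rank_after)
```

Before batching, the filename is a scalar string, which is what a file-reading op accepts. After batching, it is a vector of strings, which is what triggers the rank error.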

You need to batch your dataset after the map function, like this:

    dataset = tf.data.Dataset.from_tensor_slices((tf.constant(image_list), tf.constant(label_list)))
    dataset = dataset.shuffle(len(image_list))
    dataset = dataset.repeat(epochs)
    dataset = dataset.map(_parse_function).batch(batch_size)
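
For readers on TensorFlow 2.x, the same map-then-batch ordering might look like the sketch below. It assumes TensorFlow 2.4 or later: tf.read_file is available there as tf.io.read_file, and a dataset is iterated directly instead of through make_one_shot_iterator. The resize step and IMG_SIZE are assumptions added here, because images of different sizes cannot be stacked into one batch. The demo images are generated on the fly and are not from the original post.

```python
import os
import tempfile

import tensorflow as tf

IMG_SIZE = (32, 32)  # hypothetical target size; batching needs equal shapes


def parse_function(filename, label):
    # filename is a scalar string here because map() runs before batch().
    image_bytes = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(image_bytes, channels=3)
    image = tf.image.resize(image, IMG_SIZE)  # result is float32
    return image, label


def make_dataset(image_list, label_list, batch_size, epochs=1):
    dataset = tf.data.Dataset.from_tensor_slices((image_list, label_list))
    dataset = dataset.shuffle(len(image_list))
    dataset = dataset.repeat(epochs)
    # Decode one file at a time, then group the decoded images into batches.
    dataset = dataset.map(parse_function, num_parallel_calls=tf.data.AUTOTUNE)
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)


# Demo: write four small blank JPEGs of different sizes to a temp folder.
tmp_dir = tempfile.mkdtemp()
image_list, label_list = [], []
for i, (height, width) in enumerate([(20, 30), (40, 25), (32, 32), (10, 10)]):
    pixels = tf.zeros([height, width, 3], dtype=tf.uint8)
    path = os.path.join(tmp_dir, f"img_{i}.jpg")
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    image_list.append(path)
    label_list.append(i % 2)

batches = list(make_dataset(image_list, label_list, batch_size=2))
first_images, first_labels = batches[0]
print(first_images.shape, first_labels.shape)
```

Images are read from disk only as batches are requested, so the full image set never has to sit in memory at once.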
