How do I save images to an h5py file?

Question

10 views · July 8, 2023

Anonymous · July 9, 2023


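I have a Train folder containing 2000 images of different sizes, along with a labels.csv file. Loading and resizing these images takes a long time when training the network, so I read some material about h5py, which is supposed to solve this problem.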

I tried the following code:

import datetime as dt
import os
from glob import glob

import cv2
import h5py
import pandas as pd

PATH = os.path.abspath(os.path.join('Data'))
SOURCE_IMAGES = os.path.join(PATH, "Train")
print("[INFO] reading image paths")
images = glob(os.path.join(SOURCE_IMAGES, "*.jpg"))
images.sort()
print("[INFO] reading image labels")
labels = pd.read_csv('Data/labels.csv')
# Label is 1.0 where the "car" column equals 1.0, otherwise 0.0
train_labels = []
for i in range(len(labels["car"])):
    if labels["car"][i] == 1.0:
        train_labels.append(1.0)
    else:
        train_labels.append(0.0)
data_order = 'tf'  # 'th' = channels first, 'tf' = channels last
if data_order == 'th':
    train_shape = (len(images), 3, 224, 224)
else:
    train_shape = (len(images), 224, 224, 3)
print("[INFO] creating h5py file")
hf = h5py.File('data.hdf5', 'w')
hf.create_dataset("train_img",
                  shape=train_shape,
                  maxshape=train_shape,
                  compression="gzip",
                  compression_opts=9)
hf.create_dataset("train_labels",
                  shape=(len(train_labels),),
                  maxshape=(None,),
                  compression="gzip",
                  compression_opts=9)
hf["train_labels"][...] = train_labels
print("[INFO] reading and resizing images")
for i, addr in enumerate(images):
    s = dt.datetime.now()
    img = cv2.imread(images[i])
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    # OpenCV loads images as BGR; convert to RGB before storing
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    hf["train_img"][i, ...] = img[None]
    e = dt.datetime.now()
    print("[INFO] image", i, "saved, elapsed:", e - s)
hf.close()

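But when I run this code, it runs for hours. It is fast at first, but then it becomes very slow, especially at the line hf["train_img"][i, ...] = img[None]. The program's output shows the time per image steadily increasing. Am I doing something wrong? Thanks for any advice.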


1 Answer

Anonymous · Answer 1 · 2023-07-13T15:04:49+00:00


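When saving images to an h5py file, the compression_opts parameter controls the compression level. In the question, train_img is created with compression_opts=9, the highest gzip level, which takes the most work to compress and decompress.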

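If compressing the images is the bottleneck and you can trade some disk space for time, use a lower compression level, such as the default (4). You can also disable compression entirely.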

The following example saves image data to an h5py file with different compression settings:

import h5py
import numpy as np

# Small stand-in array in place of real image data
image = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Create the h5py file
with h5py.File('image.h5', 'w') as f:
    # Highest gzip level
    dset = f.create_dataset('image_data', data=image, compression='gzip', compression_opts=9)
    print('gzip level 9:', dset.shape, dset.compression, dset.compression_opts)

    # Default gzip level
    dset = f.create_dataset('image_data_default', data=image, compression='gzip')
    print('gzip default level:', dset.shape, dset.compression, dset.compression_opts)

    # No compression
    dset = f.create_dataset('image_data_no_compression', data=image)
    print('no compression:', dset.shape, dset.compression, dset.compression_opts)

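The code above uses the gzip compression filter and sets compression_opts to control the level. When creating a dataset, choose whichever compression option suits your needs.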

The compression level is a trade-off between time and space. Higher levels produce smaller files but can take longer to compress and decompress. Lower levels are faster but use more disk space.
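
One way to see this trade-off on your own machine is to time the same write under several settings and compare the resulting file sizes. The sketch below is only illustrative: the helper name time_write, the temporary file name, and the synthetic gradient images are assumptions made for the example, not part of the question's data, and the numbers it prints will vary by machine.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def time_write(images, compression, compression_opts):
    """Write `images` to a temporary HDF5 file; return (seconds, bytes on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bench.h5")  # hypothetical file name
        start = time.perf_counter()
        with h5py.File(path, "w") as f:
            f.create_dataset(
                "train_img",
                data=images,
                compression=compression,
                compression_opts=compression_opts,
            )
        elapsed = time.perf_counter() - start
        size = os.path.getsize(path)  # measured after the file is closed
    return elapsed, size


# Synthetic stand-in for resized photos: a smooth gradient plus mild noise,
# which compresses far better than pure random bytes would.
rng = np.random.default_rng(0)
base = np.linspace(0, 255, 224 * 224 * 3).reshape(224, 224, 3)
images = np.stack(
    [(base + rng.integers(0, 8, base.shape)) % 256 for _ in range(16)]
).astype(np.uint8)

results = {}
settings = [("none", None, None), ("gzip-4", "gzip", 4), ("gzip-9", "gzip", 9)]
for label, comp, opts in settings:
    results[label] = time_write(images, comp, opts)
    seconds, size = results[label]
    print(f"{label:7s} {seconds:8.3f} s {size / 1e6:8.2f} MB")
```

Real photographs will compress differently from this synthetic data, so it is worth rerunning the comparison on a sample of the actual images before settling on a level.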

Choosing a level that matches your requirements lets you tune how the images are saved to the h5py file.
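
As a rough sketch of how this advice might be applied to a loop that writes one image at a time, the function below creates the dataset with a configurable compression level. Several details are assumptions that do not come from the answer: the helper name write_images, the uint8 dtype (matching what cv2.imread returns for 8-bit images), and the one-image-per-chunk layout, chosen so that each assignment touches a single chunk. Random arrays stand in for the resized images.

```python
import os
import tempfile

import h5py
import numpy as np


def write_images(path, images, compression="gzip", compression_opts=4):
    """Write equally sized HxWxC uint8 images to `path`, one image per assignment.

    Pass compression=None and compression_opts=None to disable compression.
    """
    n = len(images)
    height, width, channels = images[0].shape
    with h5py.File(path, "w") as hf:
        dset = hf.create_dataset(
            "train_img",
            shape=(n, height, width, channels),
            dtype=np.uint8,                        # assumption: 8-bit pixels
            chunks=(1, height, width, channels),   # assumption: one chunk per image
            compression=compression,
            compression_opts=compression_opts,
        )
        for i, img in enumerate(images):
            dset[i] = img


# Hypothetical stand-in data: four random 224x224 RGB images.
rng = np.random.default_rng(0)
demo = [rng.integers(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(4)]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_train.h5")
    write_images(demo_path, demo)
    with h5py.File(demo_path, "r") as hf:
        restored = hf["train_img"][...]
        level = hf["train_img"].compression_opts

print(restored.shape, restored.dtype, level)
```

Calling write_images(path, images, compression=None, compression_opts=None) writes the same data uncompressed, which makes it easy to check whether compression is really the bottleneck.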