Through the torchvision package, PyTorch provides unified interfaces for data loading and processing: datasets can be loaded by writing a custom class and batched through the DataLoader interface.

A lot of the effort in solving any machine learning problem goes into preparing the data. PyTorch provides many tools to make data loading easy and, hopefully, to make your code more readable. In this tutorial, we will see how to load and preprocess/augment data from a non-trivial dataset.

To run this tutorial, please make sure the following packages are installed:

• scikit-image: For image io and transforms
• pandas: For easier csv parsing


The dataset we are going to deal with is that of facial pose. This means that a face is annotated like this:

Overall, 68 different landmark points are annotated for each face.

Download the dataset from here so that the images are in a directory named 'data/faces/'. This dataset was generated by applying dlib's excellent pose estimation to a few images from ImageNet tagged as 'face'.

The dataset comes with a CSV file of annotations that looks like this:

Let’s quickly read the CSV and get the annotations in an (N, 2) array where N is the number of landmarks.
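A sketch of that read, using a tiny inline stand-in for the annotation file (the real CSV lives with the downloaded data; the exact file name and column labels here are assumptions — an image name followed by an x,y pair per landmark):

```python
import io
import pandas as pd

# Tiny stand-in for the downloaded annotation CSV (format assumed:
# image name, then one x,y column pair per landmark).
csv_text = """image_name,part_0_x,part_0_y,part_1_x,part_1_y
person-1.jpg,32,65,33,76
person-2.jpg,30,60,31,72
"""

landmarks_frame = pd.read_csv(io.StringIO(csv_text))

n = 0  # index of the row to inspect
img_name = landmarks_frame.iloc[n, 0]
# All remaining columns are coordinates; reshape them into (N, 2)
landmarks = landmarks_frame.iloc[n, 1:].to_numpy().astype('float').reshape(-1, 2)

print('Image name:', img_name)
print('Landmarks shape:', landmarks.shape)
```

With the real file, the same two lines of indexing and reshaping apply; only the `read_csv` argument changes to the CSV's path.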

Let’s write a simple helper function to show an image and its landmarks and use it to show a sample.
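A minimal sketch of such a helper; it is demonstrated on synthetic data here (a headless-safe matplotlib backend is selected only so the sketch runs anywhere):

```python
import matplotlib
matplotlib.use('Agg')  # headless-safe backend for this sketch
import matplotlib.pyplot as plt
import numpy as np

def show_landmarks(image, landmarks):
    """Show an image with its landmark points drawn on top."""
    plt.imshow(image)
    plt.scatter(landmarks[:, 0], landmarks[:, 1], s=10, marker='.', c='r')

# Demo on synthetic data; a real sample would pair an image file
# with its rows from the annotation CSV.
image = np.random.rand(120, 160, 3)
landmarks = np.random.rand(68, 2) * 120
show_landmarks(image, landmarks)
plt.savefig('landmarks_demo.png')
```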

## Dataset Class

torch.utils.data.Dataset is an abstract class representing a dataset. Your custom dataset should inherit Dataset and override the following methods:

• __len__ so that len(dataset) returns the size of the dataset.
• __getitem__ to support indexing such that dataset[i] can be used to get the i-th sample.


Let's create a dataset class for our face landmarks dataset. We will read the CSV in __init__ but leave the reading of images to __getitem__. This is memory efficient because the images are not all stored in memory at once, but read as required.

A sample of our dataset will be a dict {'image': image, 'landmarks': landmarks}. Our dataset will take an optional argument transform so that any required processing can be applied to the sample. We will see the usefulness of transform in the next section.
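A sketch of such a class, assuming the CSV format shown earlier and scikit-image for reading (the quick check at the end uses an in-memory CSV so no image files are needed):

```python
import io
import os
import pandas as pd
from torch.utils.data import Dataset
from skimage import io as skio  # scikit-image, for reading images

class FaceLandmarksDataset(Dataset):
    """Face landmarks dataset (sketch; the CSV columns are assumed to
    be an image name followed by x,y landmark coordinate pairs)."""

    def __init__(self, csv_file, root_dir, transform=None):
        self.landmarks_frame = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.landmarks_frame)

    def __getitem__(self, idx):
        # Images are read lazily, one per call, keeping memory use low.
        img_name = os.path.join(self.root_dir,
                                self.landmarks_frame.iloc[idx, 0])
        image = skio.imread(img_name)
        landmarks = (self.landmarks_frame.iloc[idx, 1:]
                     .to_numpy().astype('float').reshape(-1, 2))
        sample = {'image': image, 'landmarks': landmarks}
        if self.transform:
            sample = self.transform(sample)
        return sample

# Quick check with an in-memory CSV (no image files touched by __len__):
demo = FaceLandmarksDataset(
    io.StringIO("image_name,x0,y0,x1,y1\nperson-1.jpg,1,2,3,4\n"),
    root_dir='data/faces')
print(len(demo))
```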

Let's instantiate this class and iterate through the data samples. We will print the sizes of the first 4 samples and show their landmarks.
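The loop looks like this. Since the downloaded images may not be on disk, this sketch iterates over a stand-in dataset of random, variable-size samples; with the files in place you would instantiate FaceLandmarksDataset instead:

```python
import numpy as np
from torch.utils.data import Dataset

class ToyFaceDataset(Dataset):
    """Stand-in with random data so the loop runs without the
    downloaded images."""
    def __len__(self):
        return 10

    def __getitem__(self, idx):
        # Varying heights mimic real photos of different sizes
        image = np.random.rand(64 + 4 * idx, 64, 3)
        landmarks = np.random.rand(68, 2) * 64
        return {'image': image, 'landmarks': landmarks}

face_dataset = ToyFaceDataset()
for i in range(4):
    sample = face_dataset[i]
    print(i, sample['image'].shape, sample['landmarks'].shape)
```

Notice that the image shapes differ from sample to sample — exactly the issue the next section's transforms address.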

## Transforms

One issue we can see from the above is that the samples are not all of the same size. Most neural networks expect images of a fixed size. Therefore, we will need to write some preprocessing code. Let's create three transforms:

• Rescale: to scale the image
• RandomCrop: to crop from the image randomly. This is data augmentation.
• ToTensor: to convert the numpy images to torch images (we need to swap axes).


We will write them as callable classes instead of simple functions so that the parameters of the transform need not be passed every time it is called. For this, we just need to implement the __call__ method and, if required, the __init__ method. We can then use a transform like this:
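A sketch of the three transforms as callable classes, exercised on synthetic data at the end (sizes chosen for illustration):

```python
import numpy as np
import torch
from skimage import transform

class Rescale(object):
    """Rescale the image in a sample. If output_size is an int, the
    smaller edge is matched to it, keeping the aspect ratio."""
    def __init__(self, output_size):
        assert isinstance(output_size, (int, tuple))
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        h, w = image.shape[:2]
        if isinstance(self.output_size, int):
            if h > w:
                new_h, new_w = self.output_size * h / w, self.output_size
            else:
                new_h, new_w = self.output_size, self.output_size * w / h
        else:
            new_h, new_w = self.output_size
        new_h, new_w = int(new_h), int(new_w)
        img = transform.resize(image, (new_h, new_w))
        # Scale landmarks along with the image; x maps to width, y to height
        landmarks = landmarks * [new_w / w, new_h / h]
        return {'image': img, 'landmarks': landmarks}

class RandomCrop(object):
    """Crop the image in a sample randomly (the augmentation step)."""
    def __init__(self, output_size):
        if isinstance(output_size, int):
            output_size = (output_size, output_size)
        self.output_size = output_size

    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        h, w = image.shape[:2]
        new_h, new_w = self.output_size
        top = np.random.randint(0, h - new_h + 1)
        left = np.random.randint(0, w - new_w + 1)
        image = image[top: top + new_h, left: left + new_w]
        # Shift landmarks into the cropped frame
        landmarks = landmarks - [left, top]
        return {'image': image, 'landmarks': landmarks}

class ToTensor(object):
    """Convert ndarrays in a sample to torch Tensors."""
    def __call__(self, sample):
        image, landmarks = sample['image'], sample['landmarks']
        # numpy image: H x W x C  ->  torch image: C x H x W
        image = image.transpose((2, 0, 1))
        return {'image': torch.from_numpy(image),
                'landmarks': torch.from_numpy(landmarks)}

# Parameters live in __init__, so the transform is configured once
# and then called like a function:
scale = Rescale(128)
sample = {'image': np.random.rand(200, 160, 3),
          'landmarks': np.random.rand(68, 2) * 160}
rescaled = scale(sample)
print(rescaled['image'].shape)
```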

Observe how these transforms have to be applied to both the image and the landmarks.

## Compose transforms

Now, we apply the transforms on a sample.

Let's say we want to rescale the shorter side of the image to 256 and then randomly crop a square of size 224 from it, i.e., we want to compose the Rescale and RandomCrop transforms. torchvision.transforms.Compose is a simple callable class which allows us to do this.

## Iterating through the dataset

Let’s put this all together to create a dataset with composed transforms. To summarize, every time this dataset is sampled:

• An image is read from the file on the fly
• Transforms are applied on the read image
• Since one of the transforms is random, the data is augmented on sampling


We can iterate over the created dataset with a for i in range loop as before.

However, we are losing a lot of features by using a simple for loop to iterate over the data. In particular, we are missing out on:

• Batching the data
• Shuffling the data
• Loading the data in parallel using multiprocessing workers.


torch.utils.data.DataLoader is an iterator which provides all these features. The parameters used below should be clear. One parameter of interest is collate_fn: you can specify exactly how samples are batched using collate_fn, though the default collate should work fine for most use cases.

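A minimal sketch on a toy fixed-size dataset (num_workers is set to 0 here so it runs anywhere; raise it to load in parallel):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Fixed-size toy samples so the default collate can batch them."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return {'image': torch.rand(3, 224, 224),
                'landmarks': torch.rand(68, 2)}

# batch_size groups samples, shuffle randomizes order each epoch,
# num_workers > 0 would load batches in parallel worker processes.
dataloader = DataLoader(ToyDataset(), batch_size=4,
                        shuffle=True, num_workers=0)

for batch in dataloader:
    # The default collate stacks each dict field along a new batch dimension
    print(batch['image'].shape, batch['landmarks'].shape)
    break
```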

## Afterword: torchvision

In this tutorial, we have seen how to write and use datasets, transforms, and dataloaders. The torchvision package provides some common datasets and transforms, so you might not even have to write custom classes. One of the more generic datasets available in torchvision is ImageFolder. It assumes that images are organized in the following way:
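One sub-directory per class under a common root (the file names below are placeholders):

```
root/ants/xxx.png
root/ants/xxy.jpeg
...
root/bees/123.jpg
root/bees/nsdf3.png
...
```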

where 'ants', 'bees', etc. are class labels. Similarly, generic transforms that operate on PIL.Image, such as RandomHorizontalFlip and Scale, are also available. You can use these to write a dataloader like this: