
Class-aware Sampling.

 

The Places2 challenge dataset has more than 8M training images in total. The number of images per class is imbalanced, ranging from 4,000 to 30,000. The large scale of the data and the non-uniform class distribution make model learning considerably harder.

 

To address this issue, we apply a sampling strategy, named “class-aware sampling”, during training. We aim to fill each mini-batch as uniformly as possible with respect to classes, and to avoid presenting the same classes and examples in a fixed order. In practice, we use two types of lists: a class list and a per-class image list. Taking the Places2 challenge dataset as an example, we have one class list and 401 per-class image lists. To build a training mini-batch in an iteration, we first take a class X from the class list, then take an image from the per-class image list of class X. When we reach the end of the per-class image list of class X, a shuffle operation reorders the images of class X. When we reach the end of the class list, a shuffle operation reorders the classes. This class-aware sampling strategy effectively tackles the non-uniform class distribution, and the accuracy gain on the validation set is about 0.6%.
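The two-list procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `ShuffledCycle`, `ClassAwareSampler`, and `sample_batch` are hypothetical, images are represented by their integer indices into a label list, and the initial shuffle of each list is an assumption the text does not state.

```python
import random
from collections import defaultdict


class ShuffledCycle:
    """Walks through a list in order and reshuffles it whenever the end is reached."""

    def __init__(self, items, rng):
        self.items = list(items)
        self.rng = rng
        self.rng.shuffle(self.items)  # assumption: lists start in random order
        self.pos = 0

    def next(self):
        if self.pos == len(self.items):
            # End of list reached: reorder and start over.
            self.rng.shuffle(self.items)
            self.pos = 0
        item = self.items[self.pos]
        self.pos += 1
        return item


class ClassAwareSampler:
    """Hypothetical sampler: one class list plus one image list per class."""

    def __init__(self, labels, seed=0):
        rng = random.Random(seed)
        per_class = defaultdict(list)
        for index, label in enumerate(labels):
            per_class[label].append(index)
        # The class list holds every distinct label once.
        self.class_cycle = ShuffledCycle(list(per_class), rng)
        # One per-class image list for each class.
        self.image_cycles = {
            label: ShuffledCycle(indices, rng)
            for label, indices in per_class.items()
        }

    def sample_batch(self, batch_size):
        batch = []
        for _ in range(batch_size):
            label = self.class_cycle.next()                # take a class X
            batch.append(self.image_cycles[label].next())  # take an image of X
        return batch


if __name__ == "__main__":
    # Toy imbalanced dataset: 90, 9, and 1 images in three classes.
    labels = [0] * 90 + [1] * 9 + [2]
    sampler = ClassAwareSampler(labels, seed=1)
    print(sampler.sample_batch(6))
```

Because the class list is traversed in full before it is reshuffled, every class appears equally often over any whole number of passes, regardless of how many images it has. The rare classes are simply revisited more often, which is the rebalancing effect the strategy relies on.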
