
1000x Faster Data Augmentation


Effect of Population Based Augmentation applied to images, which differs at different percentages into training.

In this blog post we introduce Population Based Augmentation (PBA), an
algorithm that quickly and efficiently learns a state-of-the-art approach to
augmenting data for neural network training. PBA matches the previous best
results on CIFAR and SVHN but uses one thousand times less
compute, enabling researchers and practitioners to effectively learn
new augmentation policies using a single workstation GPU. You can use PBA
broadly to improve deep learning performance on image recognition tasks.

We discuss the PBA results from our recent paper and then show how
to easily run PBA for yourself on a new dataset in the Tune framework.

Why should you care about data augmentation?

Recent advances in deep learning models have been largely attributed to the
quantity and diversity of data gathered in recent years. Data augmentation is a
strategy that enables practitioners to significantly increase the diversity of
data available for training models, without actually collecting new data. Data
augmentation techniques such as cropping, padding, and horizontal flipping are
commonly used to train large neural networks. However, most approaches used in
training neural networks only use basic types of augmentation. While neural
network architectures have been investigated in depth, less attention has been put
into discovering strong types of data augmentation and data augmentation policies that capture data invariances.

An image of the number “3” in original form and with basic augmentations
applied.
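As a rough illustration of the basic augmentations mentioned above, the sketch below pads an image, crops it back to its original size at a random offset, and flips it horizontally with some probability. It assumes NumPy and an image stored as a height x width x channels array; the function names, the 4-pixel padding, and the 0.5 flip probability are illustrative choices, not values taken from this post.

```python
import numpy as np


def pad_and_random_crop(image, pad, rng):
    """Zero-pad an HxWxC image by `pad` pixels per side, then crop back to HxW."""
    height, width = image.shape[:2]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    # Any offset in [0, 2 * pad] keeps the crop window inside the padded image.
    top = int(rng.integers(0, 2 * pad + 1))
    left = int(rng.integers(0, 2 * pad + 1))
    return padded[top:top + height, left:left + width]


def random_horizontal_flip(image, rng, prob=0.5):
    """Mirror the image left-to-right with probability `prob`."""
    if rng.random() < prob:
        return image[:, ::-1]
    return image


def basic_augment(image, rng, pad=4):
    """Apply pad-and-crop followed by a random horizontal flip."""
    return random_horizontal_flip(pad_and_random_crop(image, pad, rng), rng)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # CIFAR-sized dummy image
augmented = basic_augment(image, rng)
print(augmented.shape, augmented.dtype)
```

The output keeps the input's shape and dtype, so a function like this can sit in front of a training loop without other changes.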

Recently, Google has been able to push the state-of-the-art accuracy on
datasets such as CIFAR-10 with AutoAugment, a new automated data
augmentation technique. AutoAugment has shown that prior work that only
applies a fixed set of transformations like horizontal flipping or padding and
cropping leaves potential performance on the table. AutoAugment introduces 16
geometric and color-based transformations, and formulates an augmentation
policy that selects up to two transformations at certain magnitude levels to
apply to each batch of data. These higher performing augmentation policies are
learned by training models directly on the data using reinforcement learning.
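To make the idea of such a policy concrete, here is a minimal sketch of how "up to two transformations at certain magnitude levels" could be sampled for a batch. It is a simplification for illustration, not AutoAugment's actual search space or implementation: the operation names, probabilities, 0-9 magnitude scale, and the `sample_transforms` helper are all assumptions.

```python
import random

# Hypothetical policy entries: (operation name, probability of applying it, magnitude 0-9).
POLICY = [
    ("rotate", 0.6, 4),
    ("shear_x", 0.3, 7),
    ("color", 0.8, 2),
    ("posterize", 0.1, 9),
]


def sample_transforms(policy, rng, max_ops=2):
    """Return at most `max_ops` (name, magnitude) pairs to apply to one batch."""
    # Draw distinct candidate operations, then keep each with its own probability.
    candidates = rng.sample(policy, k=min(max_ops, len(policy)))
    chosen = []
    for name, prob, magnitude in candidates:
        if rng.random() < prob:
            chosen.append((name, magnitude))
    return chosen


rng = random.Random(0)
for step in range(3):
    print(step, sample_transforms(POLICY, rng))
```

Because each candidate is kept only with its own probability, a batch may receive zero, one, or two transformations.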

What’s the catch?

AutoAugment is a very expensive algorithm that requires training 15,000 models
to convergence to generate enough samples for a reinforcement learning based
policy. No computation is shared between samples, and it costs 15,000 NVIDIA
Tesla P100 GPU hours to learn an ImageNet augmentation policy and 5,000 GPU
hours to learn a CIFAR-10 one. For example, if using Google Cloud on-demand
P100 GPUs, it would cost about \$7,500 to discover a CIFAR policy, and \$37,500
to discover an ImageNet one! Therefore, a more common use case when training on
a new dataset is to transfer a pre-existing published policy, which the
authors show works relatively well.

Population Based Augmentation

Our formulation of data augmentation policy search, Population Based
Augmentation (PBA), reaches similar levels of test accuracy on a variety of neural network models while using three orders of magnitude less compute.
We learn an augmentation policy by training several copies of a small model on
CIFAR-10 data, which takes five hours on an NVIDIA Titan XP GPU. This policy
exhibits strong performance when used to train larger model architectures from scratch, including on CIFAR-100 data.

Relative to the several days it takes to train large CIFAR-10 networks to
convergence, the cost of running PBA beforehand is marginal and significantly improves results. For example, training a PyramidNet model on CIFAR-10 takes over 7 days on an NVIDIA V100 GPU, so learning a PBA policy adds only 2%
precompute training time overhead. This overhead would be even lower, under 1%,
for SVHN.

CIFAR-10 test set error of PBA, AutoAugment, and the baseline that only uses horizontal flipping, padding, and cropping, on WideResNet, Shake-Shake, and PyramidNet+ShakeDrop models. PBA is
significantly better than the baseline and on par with AutoAugment.

PBA leverages the Population Based Training algorithm
to generate an augmentation policy schedule that can adapt based on the current epoch of training. This is in contrast to a
fixed augmentation policy that applies the same transformations independent of
the current epoch number.
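The difference between a fixed policy and a schedule can be sketched as a lookup keyed on the epoch. The data layout below is an assumption made for illustration (it is not the format PBA writes to disk), and the epochs, operation names, probabilities, and magnitudes are made up.

```python
from bisect import bisect_right

# Hypothetical schedule: (first epoch at which the entry takes effect, policy).
# Each policy maps an operation name to (probability, magnitude).
SCHEDULE = [
    (0, {}),  # no augmentation at the start of training
    (20, {"rotate": (0.2, 2)}),
    (60, {"rotate": (0.5, 5), "color": (0.4, 3)}),
]


def policy_for_epoch(schedule, epoch):
    """Return the policy of the latest schedule entry that starts at or before `epoch`."""
    starts = [start for start, _ in schedule]
    index = bisect_right(starts, epoch) - 1
    if index < 0:
        raise ValueError("epoch precedes the first schedule entry")
    return schedule[index][1]


for epoch in (0, 19, 20, 100):
    print(epoch, policy_for_epoch(SCHEDULE, epoch))
```

A fixed policy is the special case of a schedule with a single entry starting at epoch 0.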

This allows an ordinary workstation user to easily experiment with the search
algorithm and augmentation operations. One interesting use case would be to
introduce new augmentation operations, perhaps targeted towards a particular dataset or image modality, and be able to quickly produce a tailored, high
performing augmentation schedule. Through ablation studies, we have found that
the learned hyperparameters and schedule order are important for good results.

How is the augmentation schedule learned?

We use Population Based Training with a population of 16 small WideResNet
models. Each worker in the population will learn a different candidate
hyperparameter schedule. We transfer the best performing schedule to train
larger models from scratch, from which we derive our test error metrics.

Overview of Population Based Training, which discovers hyperparameter schedules by training a population of neural networks. It combines random search
(explore) with the copying of model weights from high performing workers (exploit).

The population models are trained on the target dataset of interest, starting
with all augmentation hyperparameters set to 0 (no augmentations applied). At
frequent intervals, an “exploit-and-explore” process “exploits” high performing workers by copying their model weights to low performing workers, then
“explores” by perturbing the hyperparameters of the worker. Through this
process, we are able to share compute heavily between the workers and target
different augmentation hyperparameters at different regions of training. Thus,
PBA is able to avoid the cost of training thousands of models to convergence in order to reach high performance.
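The exploit-and-explore loop described above can be sketched with a toy population in plain Python. Everything here is a stand-in chosen for illustration: the "model" is a single float, `train_step` is a made-up objective that happens to favor a magnitude near 6, and the top/bottom 25% truncation rule and ±1 perturbation are assumptions, not the settings used in the paper.

```python
import copy
import random


def train_step(worker):
    """Toy stand-in for a few epochs of training with the worker's current hyperparameters."""
    magnitude = worker["hparams"]["magnitude"]
    worker["weights"] += 1.0 - abs(magnitude - 6) / 10.0  # pretend progress
    worker["score"] = worker["weights"]  # stand-in for validation accuracy
    worker["schedule"].append(magnitude)  # record what was used in this round


def exploit_and_explore(population, rng, fraction=0.25):
    """Overwrite the worst workers with copies of the best, then perturb the copies."""
    ranked = sorted(population, key=lambda w: w["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: inherit the winner's weights, hyperparameters, and schedule so far.
        loser["weights"] = winner["weights"]
        loser["hparams"] = copy.deepcopy(winner["hparams"])
        loser["schedule"] = list(winner["schedule"])
        # Explore: nudge the inherited hyperparameter, clipped to [0, 9].
        step = rng.choice([-1, 1])
        loser["hparams"]["magnitude"] = min(9, max(0, loser["hparams"]["magnitude"] + step))


def run_pbt(num_workers=16, num_rounds=20, interval=3, seed=0):
    """Train the population and return the best worker, including its schedule."""
    rng = random.Random(seed)
    population = [
        {"weights": 0.0, "hparams": {"magnitude": 0}, "score": 0.0, "schedule": []}
        for _ in range(num_workers)
    ]
    for round_index in range(1, num_rounds + 1):
        for worker in population:
            train_step(worker)
        if round_index % interval == 0:
            exploit_and_explore(population, rng)
    return max(population, key=lambda w: w["score"])


best = run_pbt()
print(best["schedule"])
```

Note that no worker is ever restarted from scratch: a low performing worker continues from a copy of a better worker's weights, which is how the population shares compute. The per-round record kept in `schedule` is the toy analogue of the augmentation schedule that is later reused to train a new model.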

Example and Code

We leverage Tune’s built-in implementation of PBT to make it straightforward to
use PBA.

import ray
import ray.tune.schedulers  # makes ray.tune.schedulers importable below


def explore(config):
    """Custom PBA function to perturb augmentation hyperparameters."""


ray.init()
# Perturb every 3 training iterations, ranking workers by validation accuracy.
pbt = ray.tune.schedulers.PopulationBasedTraining(
    time_attr="training_iteration",
    reward_attr="val_acc",
    perturbation_interval=3,
    custom_explore_fn=explore)
train_spec = ...  # Things like file paths, model func, compute.
ray.tune.run_experiments({"PBA": train_spec}, scheduler=pbt)

We call Tune’s implementation of PBT with our custom exploration function. This
will create 16 copies of the WideResNet model and train them time-multiplexed.
The policy schedule used by each copy is saved to disk and can be retrieved
after termination to use for training new models.
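The body of the custom exploration function is not shown in this post. The sketch below is one plausible shape for it, under stated assumptions: the `aug_params` config key, the 0-10 value range, the 20% resample probability, and the step sizes are all hypothetical and may differ from the rule used in the PBA repository. It relies only on the function receiving a config dict and returning an updated one.

```python
import random


def explore(config, rng=random, resample_prob=0.2, max_value=10):
    """Return a copy of `config` with perturbed augmentation hyperparameters.

    Assumes `config["aug_params"]` is a list of integers in [0, max_value];
    that key name is hypothetical.
    """
    new_config = dict(config)
    params = list(config["aug_params"])
    for i, value in enumerate(params):
        if rng.random() < resample_prob:
            # Occasionally resample the value uniformly from its whole range.
            params[i] = rng.randint(0, max_value)
        else:
            # Otherwise take a small random step up or down, clipped to the range.
            delta = rng.randint(0, 3)
            if rng.random() < 0.5:
                delta = -delta
            params[i] = min(max_value, max(0, value + delta))
    new_config["aug_params"] = params
    return new_config


config = {"aug_params": [0, 5, 10], "lr": 0.1}
print(explore(config, rng=random.Random(0)))
```

Returning a new dict rather than mutating the input keeps the source worker's configuration intact, which matters when one high performing configuration is copied to several workers.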

You can run PBA by following the README at: https://github.com/arcelien/pba. On
a Titan XP, it only takes one hour to learn a high performing augmentation
policy schedule on the SVHN dataset. It is also easy to use PBA on a custom
dataset: simply define a new dataloader and everything else falls into
place.

Big thanks to Daniel Rothchild, Ashwinee Panda, Aniruddha Nrusimha, Daniel
Seita, Joseph Gonzalez, and Ion Stoica for helpful feedback while writing this
post. Feel free to get in touch with us on GitHub!

This post is based on the following paper, to appear in ICML 2019 as an oral
presentation:

Population Based Augmentation: Efficient Learning of Augmentation Policy Schedules
Daniel Ho, Eric Liang, Ion Stoica, Pieter Abbeel, Xi Chen
