GAN — Some cool applications of GAN

Jonathan Hui
Jun 22, 2018

Photo by SwapnIl Dwivedi

We made impressive progress in the first few years of GAN development. No more stamp-size facial pictures like those in horror movies. In 2017, GANs produced 1024 × 1024 images that could fool a talent scout.

Larger resolution with better quality in the first 2 years.

In the coming years, we will probably see high-quality videos generated from GANs. The commercial applications will come! As part of the GAN series, we look into some cool applications and hope that they become the inspiration for your GAN application.

Create Anime characters

Game development and animation production are expensive and employ many production artists for relatively routine tasks. GANs can auto-generate and colorize anime characters.

Towards the automatic Anime characters creation with Generative Adversarial Networks

The generator and the discriminator are composed of many convolutional layers, batch normalization and ReLU with skip connections.
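To make the building block concrete, here is a minimal NumPy sketch of a residual block that combines convolution, batch normalization, ReLU and a skip connection. It is an illustration only, not the paper's architecture: the 1×1 convolution, the tensor layout (batch, height, width, channels) and the names `conv1x1`, `batch_norm` and `residual_block` are assumptions made for this example.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each channel over the batch and spatial axes (N, H, W, C).
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def conv1x1(x, w):
    # A 1x1 convolution mixes channels independently at every pixel.
    # x: (N, H, W, C_in), w: (C_in, C_out)
    return x @ w

def residual_block(x, w1, w2):
    # conv -> batch norm -> ReLU -> conv -> batch norm, then add the input back.
    h = np.maximum(batch_norm(conv1x1(x, w1)), 0.0)
    h = batch_norm(conv1x1(h, w2))
    return x + h  # skip connection

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4, 3))
w1 = rng.normal(size=(3, 3))
w2 = rng.normal(size=(3, 3))
out = residual_block(x, w1, w2)
print(out.shape)
```

The skip connection means the block only has to learn a correction to its input: with all-zero weights it passes the input through unchanged.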

Pose Guided Person Image Generation

With an additional input of the pose, we can transform an image into different poses. For example, the top right image is the ground truth while the bottom right is the generated image.

The refined-result column below shows the generated images.

Pose Guided Person Image Generation

The design is composed of a 2-stage image generator and a discriminator. The generator reconstructs an image using the metadata (pose) and the original image. The discriminator uses the original image as part of the label input in a CGAN design.
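One common way to feed a condition image to a CGAN discriminator is to stack it with the candidate image along the channel axis. The sketch below assumes that approach; the function name `discriminator_pairs` and the array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np

def discriminator_pairs(condition_img, real_target, fake_target):
    # Stack the condition (original) image with each candidate along channels.
    real_pair = np.concatenate([condition_img, real_target], axis=-1)
    fake_pair = np.concatenate([condition_img, fake_target], axis=-1)
    # Real pairs are labeled 1, generated pairs 0.
    inputs = np.concatenate([real_pair, fake_pair], axis=0)
    labels = np.concatenate([np.ones(len(real_pair)), np.zeros(len(fake_pair))])
    return inputs, labels

rng = np.random.default_rng(0)
original = rng.random((2, 8, 8, 3))      # images in the source pose
ground_truth = rng.random((2, 8, 8, 3))  # same people in the target pose
generated = rng.random((2, 8, 8, 3))     # stand-in for generator output
inputs, labels = discriminator_pairs(original, ground_truth, generated)
print(inputs.shape, labels)
```

Because the discriminator always sees the original image next to the candidate, it can penalize outputs that look realistic but no longer match the input person.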

Pose Guided Person Image Generation

CycleGAN

Cross-domain transfer GANs will likely be the first batch of commercial applications. These GANs transform images from one domain (say, real scenery) to another domain (Monet or Van Gogh paintings).

CycleGAN

For example, it can transform pictures between zebras and horses.

CycleGAN

CycleGAN builds two networks, G and F, to translate images from one domain to another and in the reverse direction. It uses discriminators D to critique how good the generated images are. For example, G converts real images into Van Gogh style paintings, and Dy distinguishes whether an image is real or generated.
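The CycleGAN paper ties G and F together with a cycle-consistency loss: translating an image to the other domain and back should recover the original. The sketch below illustrates the idea with toy stand-ins for the two generators. The lambdas are hypothetical placeholders, not trained networks, and the adversarial terms from the discriminators are left out.

```python
import numpy as np

def cycle_consistency_loss(G, F, x, y):
    # L1 distance between each image and its round-trip reconstruction.
    forward = np.mean(np.abs(F(G(x)) - x))   # x -> G(x) -> F(G(x))
    backward = np.mean(np.abs(G(F(y)) - y))  # y -> F(y) -> G(F(y))
    return forward + backward

# Toy "generators": F is the exact inverse of G, so the round trip is lossless.
G = lambda img: 2.0 * img + 1.0
F = lambda img: (img - 1.0) / 2.0

rng = np.random.default_rng(0)
x = rng.random((2, 8, 8, 3))  # batch from domain X
y = rng.random((2, 8, 8, 3))  # batch from domain Y

loss_good = cycle_consistency_loss(G, F, x, y)
loss_bad = cycle_consistency_loss(G, lambda img: img, x, y)  # F ignores G
print(loss_good, loss_bad)
```

In training, this term is added to the adversarial losses so that the generators cannot satisfy the discriminators by producing an arbitrary image from the target domain.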

Domain A ➝ Domain B:

We repeat the process in the reverse direction, Domain B ➝ Domain A:

StarGAN

StarGAN is an image-to-image translation model that maps images from one domain to another. For example, given a happy face, we want to transform it into a fearful face.

In (b), the generator generates a fake image based on an input image and a target domain label (say, angry). In (c), given this fake image and the original domain of the image (say, happy), it reconstructs the original image using the same generator. In (d), we feed real and fake images to the discriminator, which labels each one as real or fake and classifies its domain. The cost function involves the reconstruction error as well as the discriminator cost in identifying the images and their labels.
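The way these terms might combine on the generator side can be sketched as follows. This is a simplified illustration: it uses a plain log loss for the adversarial term, and the function name, argument names and the weights `lambda_cls` and `lambda_rec` are assumptions for this example rather than the authors' implementation.

```python
import numpy as np

def generator_loss(d_fake_prob, cls_probs, target_label, x_real, x_rec,
                   lambda_cls=1.0, lambda_rec=10.0):
    eps = 1e-8
    # Adversarial term: the generator wants fakes to be scored as real.
    adv = -np.mean(np.log(d_fake_prob + eps))
    # Domain classification term: fakes should be classified as the target domain.
    picked = cls_probs[np.arange(len(target_label)), target_label]
    cls = -np.mean(np.log(picked + eps))
    # Reconstruction term: translating back should recover the input image.
    rec = np.mean(np.abs(x_real - x_rec))
    return adv + lambda_cls * cls + lambda_rec * rec

x_real = np.ones((2, 4, 4, 3))
target = np.array([0, 1])  # hypothetical target domain indices

# Ideal case: fakes fool the discriminator, land in the target domain,
# and the round trip reproduces the input exactly.
perfect = generator_loss(np.array([1.0, 1.0]),
                         np.array([[1.0, 0.0], [0.0, 1.0]]),
                         target, x_real, x_real)

# Worse case: uncertain discriminator and classifier, imperfect reconstruction.
worse = generator_loss(np.array([0.5, 0.5]),
                       np.array([[0.5, 0.5], [0.5, 0.5]]),
                       target, x_real, 0.9 * x_real)
print(perfect, worse)
```

The discriminator has its own loss with the matching real/fake and domain classification terms computed on real images.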

PixelDTGAN

Suggesting merchandise based on celebrity pictures has been popular with fashion bloggers and e-commerce sites. PixelDTGAN creates clothing images and styles from an image.

PixelDTGAN

Super resolution

Create super-resolution images from lower-resolution ones. This is one area where GANs show very impressive results with immediate commercial possibilities.

SRGAN

Similar to many GAN designs, it is composed of many convolutional layers, batch normalization, advanced ReLU variants and skip connections.

SRGAN

Progressive growing of GANs

Progressive GAN is probably one of the first GANs to show commercial-grade image quality. Below are 1024 × 1024 celebrity-like images created by the GAN.

Progressive growing of GANs

It applies a divide-and-conquer strategy to make training much more feasible. Convolution layers are added and trained one phase at a time, each phase building images at 2× the previous resolution.

In 9 phases, a 1024 × 1024 image is generated.
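The arithmetic behind the nine phases is easy to check. The sketch below assumes training starts at 4 × 4 (as in the Progressive GAN paper) and doubles the resolution in each phase; the function name `resolution_schedule` is made up for this example.

```python
def resolution_schedule(start=4, final=1024):
    # Double the resolution each phase until the final size is reached.
    resolutions = [start]
    while resolutions[-1] < final:
        resolutions.append(resolutions[-1] * 2)
    return resolutions

schedule = resolution_schedule()
for phase, res in enumerate(schedule, start=1):
    print(f"phase {phase}: {res} x {res}")
```

Eight doublings take 4 × 4 to 1024 × 1024, which together with the starting resolution gives nine phases.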

Progressive growing of GANs

StyleGAN2

StyleGAN2 generates high-resolution images.

High-resolution image synthesis

This is not image segmentation! It is the reverse: generating images from a semantic map. Collecting samples is very expensive. We have been trying to supplement training datasets with generated data to lower development costs. It would be handy to generate videos for training autonomous cars rather than see them cruising in your neighborhood.

pix2pixHD

Network design:

pix2pixHD

GauGAN

GauGAN synthesizes photorealistic images given an input semantic layout.

Modified from source

Text to image (StackGAN)

Text to image is one of the earlier applications of domain-transfer GANs. We input a sentence and generate multiple images fitting the description.

StackGAN

Text to Image Synthesis

Another popular implementation:

Generative Adversarial Text to Image Synthesis

Face synthesis

Synthesize faces in different poses: with a single input image, we create faces at different viewing angles. For example, we can use this to transform images into poses that are easier for face recognition.

TP-GAN

Image inpainting

Repairing images has been an important subject for decades. GANs are used to repair images and fill in the missing parts with created “content”.
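A typical final step in inpainting is to keep the known pixels and paste generated content only into the hole. The sketch below shows that compositing step with a binary mask. The `composite` helper and the constant-valued "generated" image are illustrative assumptions, not the context encoder's actual code.

```python
import numpy as np

def composite(original, generated, mask):
    # mask is 1 where pixels are known and 0 where they are missing.
    return mask * original + (1.0 - mask) * generated

original = np.arange(16, dtype=float).reshape(4, 4)
generated = np.full((4, 4), -1.0)  # stand-in for generator output
mask = np.ones((4, 4))
mask[1:3, 1:3] = 0.0               # a 2x2 hole in the middle

repaired = composite(original, generated, mask)
print(repaired)
```

Only the four pixels inside the hole come from the generator; everything else is copied from the original image.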

Context encoder

Learn Joint Distribution

It is expensive to create separate GANs for every combination of facial characteristics, such as P(blond, female, smiling, with glasses), P(brown, male, smiling, no glasses), etc. The curse of dimensionality makes the number of GANs grow exponentially. Instead, we can learn the individual data distributions and combine them to form different distributions, i.e. different attribute combinations.
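To see how quickly the count grows, the sketch below enumerates the combinations for the four attributes mentioned above, treating each as binary for simplicity. The attribute dictionary and the helper `num_combinations` are assumptions made for illustration.

```python
from itertools import product

# Each attribute is treated as having two values (a simplifying assumption).
attributes = {
    "hair": ["blond", "brown"],
    "gender": ["female", "male"],
    "smiling": [True, False],
    "glasses": [True, False],
}

# One GAN per joint distribution would need one model per combination.
combos = list(product(*attributes.values()))
print(len(combos))

def num_combinations(n_binary_attributes):
    # Number of joint distributions for n binary attributes.
    return 2 ** n_binary_attributes

for n in (4, 10, 20):
    print(n, num_combinations(n))
```

Four binary attributes already need 16 separate models, and every extra attribute doubles that number, which is why combining individually learned distributions is attractive.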

CoGAN

DiscoGAN

DiscoGAN provides style matching, which has many potential applications. DiscoGAN learns cross-domain relationships without labels or pairing. For example, it successfully transfers style (or patterns) from one domain (handbags) to another (shoes).

DiscoGAN

DiscoGAN and CycleGAN are very similar in the network design.

DiscoGAN

Pix2Pix

Pix2Pix is an image-to-image translation model that is frequently cited in cross-domain GAN papers. For example, it converts a satellite image into a map (the bottom left).

pix2pix

DTN

Creating Emoji from pictures.

DTN

Texture synthesis

MGAN

Image editing (IcGAN)

Reconstruct or edit images with specific attributes.

IcGAN

Face aging (Age-cGAN)

Age-cGAN

DeblurGAN

DeblurGAN performs motion deblurring.

Neural Photo Editor

Content-based image editing: for example, extending the hairband.

Neural Photo Editor

Refine image

Object detection

This is one application that enhances an existing solution with a GAN.

Perceptual GAN

Image blending

Blending images together.

GP-GAN

Video generation

Create new video sequences. The model recognizes what is background and creates a new time sequence for the foreground action.

Generate 3D objects

This is an often-cited paper on creating 3D objects with GANs.

3DGAN

Music generation

GANs can be applied to non-image domains, like composing music.

MidiNet

Medical (Anomaly Detection)

GANs can also extend to other industries, for example tumor detection in medicine.

AnoGAN

Further Readings

This article shows some of the applications of GANs. For those interested in further study of GANs:

Part 1: Focus on how GANs are applied to solve deep learning problems, and an overview of why it is so hard to train GANs.

Part 2: An overview of solving the training problems in GAN.

All the articles in this series:
