Visualize Deep Network models and metrics (Part 4)

When troubleshooting a deep network, people often jump to conclusions too early. Before learning how to troubleshoot, we should spend some time on what to look for, rather than spending hours chasing dead-end leads. In Part 4 of “How to start a Deep Learning project?”, we discuss how to visualize your Deep Learning models and performance metrics.

The 6-part series for “How to start a Deep Learning project?” consists of:

· Part 1: Start a Deep Learning project.
· Part 2: Build a Deep Learning dataset.
· Part 3: Deep Learning designs.
· Part 4: Visualize Deep Network models and metrics.
· Part 5: Debug a Deep Learning Network.
· Part 6: Improve Deep Learning Models performance & network tuning.

Never shoot in the dark. Make an educated guess.

TensorBoard

It is important to track every move and to examine results at each step. With the help of a pre-built package like TensorBoard, visualizing the model and its metrics is easy, and the rewards are almost instantaneous.
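
As a minimal sketch of the workflow, assuming TensorFlow 2.x: a summary writer records values that TensorBoard can then plot (the log directory name and the dummy loss value below are placeholders):

```python
import tensorflow as tf

# Create a writer for this run; "logs/run1" is an arbitrary directory name.
writer = tf.summary.create_file_writer("logs/run1")

with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)                    # dummy value for illustration
        tf.summary.scalar("loss", loss, step=step)
    writer.flush()

# Then inspect the run with:  tensorboard --logdir logs
```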

Data visualization (input, output)

Always verify the input and the output of the model. Before feeding data into a model, save some training and validation samples for visual verification. Undo the data pre-processing, e.g. rescale the pixel values back to [0, 255], and check a few batches to verify that we are not repeating the same batch of data. The images below show some training samples on the left and a validation sample on the right.

[Figure: training samples (left) and a validation sample (right)]
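
A minimal sketch of this verification step, assuming the images were normalized to [-1, 1] during pre-processing (the iterator names in the comments are hypothetical):

```python
import numpy as np
from PIL import Image

def save_samples(batch, prefix, n=4):
    """Undo the pre-processing and save a few samples for visual inspection."""
    for i, img in enumerate(np.asarray(batch)[:n]):
        pixels = ((img + 1.0) * 127.5).clip(0, 255).astype(np.uint8)  # back to [0, 255]
        Image.fromarray(pixels).save(f"{prefix}_{i}.png")

# save_samples(next(train_iter), "train")  # check a few different training batches
# save_samples(next(val_iter), "val")      # and a validation batch
```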

Sometimes, it is also worth verifying the histogram of the input data. Ideally, it should be zero-centered and range from -1 to 1. If features are on very different scales, the gradients will either diminish or explode (depending on the learning rate).

[Figure: histograms of the input data. Source: Sung Kim]

Save the corresponding model outputs regularly for verification and error analysis. For example, we may notice that the color in the validation output is washed out.
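
One way to do this with TensorBoard, assuming the model produces images with values in [0, 1] (`model`, `val_batch`, and the logging cadence are placeholders):

```python
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/outputs")

def log_outputs(model, val_batch, step):
    preds = model(val_batch, training=False)      # rank-4 tensor: (batch, h, w, c)
    with writer.as_default():
        tf.summary.image("validation_output", preds, step=step, max_outputs=4)
```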

Metrics (Loss & accuracy)

Besides logging the loss and the accuracy to stdout regularly, we record and plot them to analyze their long-term trends. The diagram below shows the accuracy and the cross-entropy loss displayed by TensorBoard.

[Figure: accuracy and cross-entropy loss curves in TensorBoard]
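
With Keras, one convenient option is the built-in TensorBoard callback, which records the loss and accuracy (and, optionally, histograms) automatically; this sketch assumes a compiled `model` and `train_ds`/`val_ds` datasets:

```python
import tensorflow as tf

tb = tf.keras.callbacks.TensorBoard(log_dir="logs/fit", histogram_freq=1)
# model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=[tb])
```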

Plotting the cost helps us tune the learning rate. Any prolonged jump in the cost indicates that the learning rate is too high. If it is too low, learning is slow.

[Figure: cost curves at different learning rates]

Here is another real example where the learning rate is too high. We see a sudden surge in the loss (likely caused by a sudden jump in the gradient).

[Figure: a sudden surge in the loss when the learning rate is too high. From Emil Wallner]

We use the accuracy plot to tune regularization factors. If there is a large gap between the validation and the training accuracy, the model is overfitting. To reduce overfitting, we increase regularization.

[Figure: training vs. validation accuracy]
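
As a sketch of what “increasing regularization” can look like in Keras (the layer sizes and coefficients are placeholders, not recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight decay
    layers.Dropout(0.5),                                     # dropout
    layers.Dense(10, activation="softmax"),
])
```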

History summary

Weight & bias: We monitor the weights and the biases closely. Here are the Layer 1’s weights and biases distributions at different training iterations. Finding large (positive or negative) weights or bias is abnormal. A Normal distributed weight is a good sign that the training is going well (but not absolutely necessary).

[Figure: Layer 1 weight and bias distributions at different training iterations]
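
A sketch of logging these histograms to TensorBoard, assuming a Keras `model` and a summary `writer` like the one created earlier:

```python
import tensorflow as tf

def log_weight_histograms(model, writer, step):
    with writer.as_default():
        for layer in model.layers:
            for var in layer.trainable_variables:
                # One histogram per weight/bias tensor, e.g. "dense/kernel:0".
                tf.summary.histogram(var.name, var, step=step)
```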

Activation: For gradient descent to perform at its best, the nodes’ outputs before the activation functions should be normally distributed. If they are not, we may apply batch normalization to convolutional layers or layer normalization to RNN layers. We also monitor the number of dead nodes (zero activations) after the activation functions.

[Figure: activation distributions]
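
A sketch of the dead-node check, assuming `activations` holds one batch of post-ReLU outputs:

```python
import tensorflow as tf

def dead_node_fraction(activations):
    # A node is "dead" for this batch if it never activates on any example.
    dead = tf.reduce_all(tf.equal(activations, 0.0), axis=0)
    return float(tf.reduce_mean(tf.cast(dead, tf.float32)))
```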

Gradients: For each layer, we monitor the gradients to identify one of the most serious DL problems: diminishing (vanishing) or exploding gradients. If the gradients diminish quickly from the rightmost layers to the leftmost layers, we have a diminishing gradient problem.

[Figure: gradient distributions for each layer]
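
A sketch of monitoring per-layer gradient norms with `tf.GradientTape` (assumes a `model`, a `loss_fn`, an `(x, y)` batch, and a summary `writer`):

```python
import tensorflow as tf

def log_gradient_norms(model, loss_fn, x, y, writer, step):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    with writer.as_default():
        for var, grad in zip(model.trainable_variables, grads):
            if grad is not None:
                tf.summary.scalar(f"grad_norm/{var.name}", tf.norm(grad), step=step)
```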

Less commonly, we visualize the CNN filters. This identifies the types of features that the model is extracting. As shown below, the first couple of convolutional layers are detecting edges and colors.

[Figure: filters from the first convolutional layers]
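
A minimal sketch for plotting first-layer filters, assuming a Keras `model` whose first layer is a Conv2D over RGB input:

```python
import matplotlib.pyplot as plt

kernels = model.layers[0].get_weights()[0]             # shape (h, w, in_ch, out_ch)
kernels = (kernels - kernels.min()) / (kernels.max() - kernels.min())  # scale to [0, 1]

for i in range(min(16, kernels.shape[-1])):
    plt.subplot(4, 4, i + 1)
    plt.imshow(kernels[:, :, :, i])                    # works when in_ch == 3
    plt.axis("off")
plt.show()
```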

For CNNs, we can also visualize what a feature map is learning. The following picture shows the top 9 images (on the right side) with the highest activations in a particular feature map. It also applies a deconvolution network to reconstruct the spatial image (on the left) from the feature map.

[Figure: top-activating images (right) and deconvolution reconstructions (left) for a feature map. Source: Visualizing and Understanding Convolutional Networks, Matthew D. Zeiler et al.]
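
The deconvolution-based reconstruction from the paper is fairly involved, but simply reading out an intermediate feature map is easy with the Keras functional API (the layer name "conv3" and `x_batch` are placeholders):

```python
import tensorflow as tf

feature_model = tf.keras.Model(inputs=model.input,
                               outputs=model.get_layer("conv3").output)
feature_maps = feature_model(x_batch)    # shape (batch, h, w, channels)
```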

This kind of image reconstruction is rarely done. But in a generative model, we often vary just one latent factor while holding the others constant, to verify whether the model is learning anything smart.

[Figure: varying one latent factor while holding the others constant. Source: Dynamic Routing Between Capsules, Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton]
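
A sketch of this latent-factor sweep, with a hypothetical `decoder` and a 16-dimensional latent code:

```python
import numpy as np

z = np.zeros((1, 16), dtype=np.float32)   # base latent code
for value in np.linspace(-2.0, 2.0, 7):
    z[0, 3] = value                        # sweep a single factor (dimension 3 here)
    image = decoder(z)                     # inspect how the output changes
```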

Part 5

Visualizing models can be done easily with TensorBoard. TensorBoard is available for TensorFlow, and for other frameworks like PyTorch through third-party extensions. Spend some time visualizing your model and you will save far more time in troubleshooting. Equipped with the runtime information of the model, we can start talking about troubleshooting in Part 5: Debug a Deep Learning Network.
