Early versions of TensorFlow had a major identity crisis: many overlapping APIs were proposed, each with its own personality. Thankfully, the adoption of the Keras API in TensorFlow 2 (TF 2) has changed the picture. More significantly, the API has become more programmer-friendly, easier to trace, and less tedious. In this article, we show how to use this API to build deep learning models.

Sequential Model with CIFAR10

The Keras API provides a high-level built-in class, “models.Sequential”, to create common deep learning (DL) models. In this section, we will train a CIFAR10 classifier and make predictions with it. The first part of the code prepares the CIFAR10 dataset for training. Then, we build a CNN classifier and configure the training to use an Adam optimizer with the cross-entropy loss function (model.compile). We also configure the training to collect accuracy data. Finally, we call “model.fit” to train the model; it returns a history object that contains the performance metrics. The code is pretty simple, so I will let it speak for itself.
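The steps above can be sketched roughly as follows. This is a minimal sketch, not the original code: to keep it runnable offline it uses random arrays with CIFAR10's shape (an assumption; swap in tf.keras.datasets.cifar10.load_data() for the real data), and the layer sizes are illustrative.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumption: random stand-in data shaped like CIFAR10 (32x32 RGB, 10 classes).
rng = np.random.default_rng(0)
train_images = rng.random((64, 32, 32, 3)).astype("float32")  # values in [0, 1)
train_labels = rng.integers(0, 10, size=(64,))
test_images = rng.random((16, 32, 32, 3)).astype("float32")
test_labels = rng.integers(0, 10, size=(16,))

# A small CNN: each layer consumes the output of the previous one.
model = models.Sequential()
model.add(tf.keras.Input(shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), activation="relu"))  # output (None, 30, 30, 32)
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10))  # one logit per class

# Adam optimizer, cross-entropy loss on logits, and accuracy as the metric.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# fit returns a History object; history.history maps metric names to per-epoch values.
history = model.fit(
    train_images, train_labels,
    epochs=1,
    validation_data=(test_images, test_labels),
    verbose=0,
)
model.summary()
print(history.history)
```

With real data, we would train for more epochs and expect the accuracy to rise well above the chance level seen with random inputs.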

models.Sequential is a stacked model in which each layer takes the output of the previous layer as input, and each layer has exactly one input tensor and one output tensor. (A tensor is a TF class that holds a scalar, a vector, or an n-dimensional array.) Even with this limitation, this model covers most basic deep learning models. To double-check the code, we can use model.summary to display a summary of the model architecture. For example, the first layer in our model is a conv2d layer that outputs a tensor with a shape of (None, 30, 30, 32). None stands for any batch size, and this layer has an output spatial dimension of 30×30 with 32 output channels.

We can also print the model summary before the whole model is coded (say, by calling summary before adding the Dense layers). This allows us to troubleshoot a model incrementally.

To verify the training progress, we can plot the training and validation accuracy using the history returned from model.fit.

We can also evaluate the model on the test images and print its accuracy.

Finally, the code below demonstrates how to make predictions. We simply call model(images), where images contains 25 test images.

Sometimes, we want the outputs to be probability values instead of logit scores. Here, we create another Sequential model whose first layer is the original model and whose second layer (the output) is a softmax layer.
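A minimal sketch of this wrapping, assuming a small untrained stand-in for the classifier (the stand-in model and the random images are hypothetical, used only so the example is self-contained):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical stand-in for the trained classifier: any model that outputs logits.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    layers.Flatten(),
    layers.Dense(10),
])

# Wrap the logit model and append a softmax layer to obtain probabilities.
probability_model = keras.Sequential([model, layers.Softmax()])

images = np.random.rand(25, 32, 32, 3).astype("float32")  # 25 fake test images
probabilities = probability_model(images)                  # shape (25, 10)
predicted_classes = tf.argmax(probabilities, axis=-1)      # most likely class per image
```

Each row of probabilities sums to 1, so it can be read directly as a distribution over the 10 classes.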

As a final step in our demonstration code, we visualize the images and their predictions.

Callable

As shown before, predictions can be made by calling the model with image(s) as input.

That is, a model is callable: we invoke the model instance directly, as in model(some_input).

Layers are callable too. We can fabricate test input and call a layer directly to quickly trace the result of its operation.
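For instance, a quick trace might look like the sketch below. The input tensor and layer sizes are made up for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Fabricated input: a batch of one 8x8 image with 3 channels.
x = tf.ones((1, 8, 8, 3))

# Calling a layer runs its computation and returns the output tensor.
conv = layers.Conv2D(4, 3, activation="relu")
y = conv(x)
print(y.shape)  # a 3x3 "valid" convolution shrinks 8x8 to 6x6

pooled = layers.MaxPooling2D(2)(y)
print(pooled.shape)
```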

MNIST classifier with Sequential model (using condensed syntax)

The code below is an MNIST classifier. It uses a more condensed and simpler syntax to compose a Sequential model, but it is less flexible for tracing problems during early development. This model includes a layer that rescales the input values to the range 0 to 1.0 and a layer that reshapes the input from (28, 28) to (28, 28, 1), the format expected by the CNN layer.

We can test the model quickly without loading the real data. In the example below, we generate random data to fit a model and make predictions. In TF, we can pass NumPy ndarrays or TF tensors as data.
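A minimal sketch of both ideas, the condensed list syntax and a smoke test on random data. The layer sizes are assumptions, and layers.Rescaling is assumed to be available under that name (it is in recent TF 2 releases).

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Condensed syntax: pass the whole stack of layers as a list.
model = keras.Sequential([
    keras.Input(shape=(28, 28)),
    layers.Rescaling(1.0 / 255),      # map pixel values 0..255 to 0..1
    layers.Reshape((28, 28, 1)),      # add the channel axis the CNN layer expects
    layers.Conv2D(8, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Random data with MNIST's shape is enough to check that the model trains and predicts.
x = np.random.randint(0, 256, size=(32, 28, 28)).astype("float32")
y = np.random.randint(0, 10, size=(32,))
model.fit(x, y, epochs=1, batch_size=16, verbose=0)
logits = model.predict(x[:4], verbose=0)
```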

Avoid overfitting

To avoid overfitting, we can add regularization and dropout into the model.
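As a rough sketch, L2 weight regularization and dropout can be added like this. The regularization strength, dropout rate, and layer sizes are illustrative assumptions.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    keras.Input(shape=(20,)),
    # L2 penalty on the kernel is added to the training loss.
    layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(1e-3)),
    # Dropout randomly zeroes half of the activations, but only during training.
    layers.Dropout(0.5),
    layers.Dense(10),
])

x = np.random.rand(4, 20).astype("float32")
train_out = model(x, training=True)     # dropout active
infer_out_1 = model(x, training=False)  # dropout disabled
infer_out_2 = model(x, training=False)
print(model.losses)  # the regularization term(s) collected from the layers
```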

model.compile options

In this section, we demonstrate options for configuring the training.

Optimizer learning rate

For some optimizers, we can create a custom schedule for the learning rate decay as:
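One possible schedule is an exponential decay, sketched below. The initial rate, decay steps, and decay rate are assumed values chosen for illustration.

```python
import tensorflow as tf

# The learning rate is multiplied by 0.9 every 1000 optimizer steps (staircase decay).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-2,
    decay_steps=1000,
    decay_rate=0.9,
    staircase=True,
)

# The schedule is passed to the optimizer in place of a fixed learning rate.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

# A schedule is callable with the step number, which is handy for inspection.
lr_at_start = float(lr_schedule(0))
lr_after_decay = float(lr_schedule(1000))
```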

Metrics

The metrics argument in compile holds a list of metrics that model.fit returns when the training is done.

Here are some available built-in options.

Note: Instead of using the built-in classes, we can also pass many of these options as strings (for example, “sparse_categorical_accuracy”).
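The sketch below passes a metric as a built-in class instance; the string form is shown in a comment. The tiny model and random data are placeholders used only to make the example run.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([keras.Input(shape=(8,)), layers.Dense(3)])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    # Equivalent string form: metrics=["sparse_categorical_accuracy"]
    metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 3, size=(32,))
history = model.fit(x, y, epochs=2, verbose=0)

# One list of per-epoch values for the loss and for each metric.
print(history.history.keys())
```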

Custom cost function

Alternatively, we can define a custom loss function, such as the MSE (mean squared error) below.
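A custom loss is just a function of (y_true, y_pred). Here is a minimal sketch; the function name custom_mse and the toy regression model are hypothetical.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def custom_mse(y_true, y_pred):
    """Mean squared error per sample, averaged over the last axis."""
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

model = keras.Sequential([keras.Input(shape=(4,)), layers.Dense(1)])
model.compile(optimizer="adam", loss=custom_mse)  # pass the function itself

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 1).astype("float32")
history = model.fit(x, y, epochs=1, verbose=0)

# Sanity check on known values: errors of 1 and 2 give (1 + 4) / 2 = 2.5.
sample_loss = custom_mse(tf.constant([[1.0, 2.0]]), tf.constant([[2.0, 4.0]]))
```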

model.fit

Next, we examine other options in training a model.

validation_split

We can reserve a percentage of the training samples for validation, say the last x% of the training samples. To use this option, the training data must be an ndarray or a tensor.
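A minimal sketch with random placeholder data, holding out the last 20% of the samples:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 3, size=(100,))

model = keras.Sequential([keras.Input(shape=(8,)), layers.Dense(3)])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# The last 20% of x and y are set aside (before any shuffling) and used for validation only.
history = model.fit(x, y, validation_split=0.2, epochs=1, verbose=0)
print(history.history["val_loss"])
```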

Alternatively, we can prepare and supply a separate validation dataset for model.fit.

To create a validation dataset from the training dataset, we can “take” part of the training data for validation and use “skip” to use the rest of the data for training.
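With tf.data, the split might be sketched as follows. The data is random and the 20/80 split sizes are assumptions for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(100, 8).astype("float32")
y = np.random.randint(0, 3, size=(100,))
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# First 20 samples for validation, the remaining 80 for training.
val_dataset = dataset.take(20).batch(10)
train_dataset = dataset.skip(20).shuffle(80).batch(10)

num_val = sum(int(batch_x.shape[0]) for batch_x, _ in val_dataset)
num_train = sum(int(batch_x.shape[0]) for batch_x, _ in train_dataset)

model = keras.Sequential([keras.Input(shape=(8,)), layers.Dense(3)])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# A separate validation dataset is supplied through validation_data.
history = model.fit(train_dataset, validation_data=val_dataset, epochs=1, verbose=0)
```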

class weights/sample weights (optional)

In classification, there are cases where we want to be more accurate on specific classes. The code below associates each class with a weight that is used in the loss function. In this example, class 5 has a larger weight, so misclassifying its samples costs more and the model is pushed to classify them more accurately.

In some situations, we want weights to be associated with samples rather than classes. This lets the training pay more attention to specific sets of samples.
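Both kinds of weighting can be sketched as below. The weight value 2.0, the toy model, and the random data are assumptions; the labels are arranged so that every class from 0 to 9 appears.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x = np.random.rand(100, 8).astype("float32")
y = np.arange(100) % 10  # labels 0..9, each class present

model = keras.Sequential([keras.Input(shape=(8,)), layers.Dense(10)])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Class weights: every class counts once, except class 5, which counts double.
class_weight = {i: 1.0 for i in range(10)}
class_weight[5] = 2.0
history_cw = model.fit(x, y, class_weight=class_weight, epochs=1, verbose=0)

# Sample weights: one weight per training sample (here, again favoring class 5).
sample_weight = np.ones(len(y), dtype="float32")
sample_weight[y == 5] = 2.0
history_sw = model.fit(x, y, sample_weight=sample_weight, epochs=1, verbose=0)
```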

Composing Models

Often, we don’t build a completely new model. Instead, we may compose a model with existing models or pre-trained models and later add new layers. For example, the code below appends an FC-layer after a predefined model.
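A minimal sketch of appending a fully connected layer to an existing model. The base model here is a small hypothetical stand-in for a predefined one.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical predefined model.
base_model = keras.Sequential(
    [
        keras.Input(shape=(16,)),
        layers.Dense(32, activation="relu"),
        layers.Dense(8, activation="relu"),
    ],
    name="base",
)

# A model can be used as a layer inside another model.
model = keras.Sequential([base_model, layers.Dense(3)])

out = model(np.zeros((2, 16), dtype="float32"))
print(out.shape)
```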

Our second example uses a pre-trained MobileNet model from TensorFlow Hub (TF Hub) for feature extraction. Using out-of-the-box models purely for feature extraction is common practice. We then add new dense layers as the head for classification.

input, layer, and output

In object detection, we often take features from multiple layers at different resolutions to make predictions. So it is common for a model to have multiple outputs, in particular for the feature extractor used in object detection. But a Sequential model has only one output tensor. Fortunately, the input(s) and the layers of a model are accessible through model.inputs and model.layers. In the code below, we construct a new model using keras.Model instead of keras.Sequential. It uses the input of initial_model as its input and the outputs of all three layers in initial_model as its outputs. Hence, “features” is a list containing 3 tensors, each holding the output of a convolution layer.
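A sketch of this construction, assuming a three-layer convolutional initial_model with made-up sizes. It also shows the more selective variant that keeps only some layers.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

initial_model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),
    layers.Conv2D(32, 5, strides=2, activation="relu"),
    layers.Conv2D(32, 3, activation="relu"),
    layers.Conv2D(32, 3, activation="relu"),
])

# New model: same input, but it returns the output of every layer.
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers],
)

x = tf.ones((1, 64, 64, 3))
features = feature_extractor(x)  # a list with one tensor per layer

# Selective variant: output only the last two convolution layers.
selected_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers[1:]],
)
selected_features = selected_extractor(x)
```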

Or we can be more selective by outputting selected convolution layers only.

We will come back later on how to use these outputs.

Sequential Models Recap

Sequential models allow us to build common but less complex models. A Sequential model is a stacked model in which one layer follows another: the output of one layer is the input to the next layer, and each layer has exactly one input tensor and one output tensor.

Disclaimer

Readers may find it irritating that I use screenshots for the code, so you cannot cut and paste it. Indeed, this is my intention. One of the weaknesses of TensorFlow (TF) is the lack of good architectural oversight. Many TF libraries and APIs come and go. Fortunately, TF 2 endorses Keras as its API, which makes it much more stable. Since the TF API changes pretty frequently, keeping our code updated can be challenging. Indeed, since TF 0.7, I have written many TF articles that I never published. So, for the latest code, you should always refer to the TF guide. Most of the code in this article series originates from there. In addition, I will not answer troubleshooting questions here. Troubleshooting is never a show-and-tell business. Indeed, you will find the answer much faster and better with a Google search. Let’s get back to the fun part.

Functional API

Next, we will explore the Functional API, which provides more flexibility in building a model than the Sequential model. For example, it allows multiple inputs and outputs, shareable parameters, and skip connections. It also promotes the reusability of layers.

Let’s recreate the MNIST classifier using the functional API. We start with the keras.Input which defines the input shape. Then we create the necessary data flow between layers. In the last step, we create a classifier model using keras.Model with the input and the output defined.
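These steps might look like the sketch below. The layer sizes are assumptions, and random data stands in for MNIST so the example runs offline; compile, fit, and evaluate are used exactly as with a Sequential model.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# 1. Define the input shape.
inputs = keras.Input(shape=(28, 28))

# 2. Create the data flow by calling each layer on the previous output.
x = layers.Rescaling(1.0 / 255)(inputs)
x = layers.Reshape((28, 28, 1))(x)
x = layers.Conv2D(8, 3, activation="relu")(x)
x = layers.MaxPooling2D(2)(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10)(x)

# 3. Create the model from its input and output.
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_classifier")

model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
x_data = np.random.randint(0, 256, size=(32, 28, 28)).astype("float32")
y_data = np.random.randint(0, 10, size=(32,))
model.fit(x_data, y_data, epochs=1, verbose=0)
loss, accuracy = model.evaluate(x_data, y_data, verbose=0)
```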

Here is another example, this time extracting features.

The model.compile, model.fit, model.evaluate and model.predict methods work the same way for the functional API as for the Sequential model. For completeness, here is the code to train and evaluate the classifier model; it is the same as for a Sequential model.

The functional API allows us to share layers easily. Below, we first use the functional API to build an encoder model. An autoencoder is an encoder followed by a decoder. With the encoder flow already programmed, we can simply continue building the flow from the encoder output (encoder_output), adding more layers to build the autoencoder.
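A sketch of this pattern, modeled on the autoencoder example in the TF guide; the exact layers in the original screenshot may differ.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Encoder flow.
encoder_input = keras.Input(shape=(28, 28, 1), name="img")
x = layers.Conv2D(16, 3, activation="relu")(encoder_input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)  # a 16-dimensional code
encoder = keras.Model(encoder_input, encoder_output, name="encoder")

# Decoder flow, continued directly from encoder_output.
x = layers.Reshape((4, 4, 1))(encoder_output)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)

# The autoencoder reuses the encoder's layers (and therefore its weights).
autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")

image = np.zeros((1, 28, 28, 1), dtype="float32")
code = encoder(image)
reconstruction = autoencoder(image)
```

Because both models are built on the same layer objects, training the autoencoder also updates the encoder.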

Sometimes, we want to compose existing models to create a new model. In the code below, the autoencoder is composed of an encoder model and a decoder model that are built separately.

Since models and layers are callable, we can define a new model by composing the data flow from different models: we call the encoder on the input and then call the decoder on the encoder's output. If we wish, we can even add new layers between these calls.
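A compact sketch of composing separately built models; the dense encoder and decoder here are simplified stand-ins.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Two models built independently of each other.
encoder = keras.Sequential(
    [keras.Input(shape=(784,)), layers.Dense(32, activation="relu")], name="encoder"
)
decoder = keras.Sequential(
    [keras.Input(shape=(32,)), layers.Dense(784, activation="sigmoid")], name="decoder"
)

# Compose them by calling one model on the output of the other.
autoencoder_input = keras.Input(shape=(784,))
encoded = encoder(autoencoder_input)
# New layers could be inserted here, between the two calls.
decoded = decoder(encoded)
autoencoder = keras.Model(autoencoder_input, decoded, name="autoencoder")

reconstruction = autoencoder(np.zeros((2, 784), dtype="float32"))
```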

Just as a thought exercise, let’s see how to add a classifier head to a feature extractor that outputs multiple tensors. First, we create an extractor with two outputs: the feature_3 layer and the feature_4 layer.

Then, we extract the features for “input” and use two separate branches to process the two feature tensors. Finally, we concatenate the branches and add a dense layer to classify the input.

Model Class

We can use the functional API to create a custom model class. This puts all the layer code into an object-oriented design. In the previous sections, it is not easy to trace the output of a layer. But in the custom model below, we can put breakpoints in the callable to trace the outputs. (Note: the @tf.function decorator will disable this capability.) This is one major benefit of wrapping layers and models in a custom model class.
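One way to write such a class is to subclass keras.Model, as in this sketch. The class name MLPClassifier and its layers are hypothetical.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class MLPClassifier(keras.Model):
    """A small classifier whose layers live inside one object."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.hidden = layers.Dense(32, activation="relu")
        self.classifier = layers.Dense(num_classes)

    def call(self, inputs, training=False):
        x = self.hidden(inputs)
        # In eager mode, a breakpoint or print placed here can inspect x.
        return self.classifier(x)

model = MLPClassifier()
logits = model(tf.ones((2, 8)))
print(logits.shape)
```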

Complex models

Let’s build more complex models that are commonly used in deep learning with the functional API.

Multiple Input and/or Output

In this model, we take 3 input tensors: one for the title, one for the body, and one for the tags. The model predicts 2 values: the department category and the priority.

With 2 outputs, we also configure two loss functions, one for the priority and one for the department. We can also use the loss_weights parameter to set how much each loss contributes to the total loss that is backpropagated.
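A sketch of such a model, loosely following the ticket-ranking example in the TF guide. The vocabulary size, layer sizes, and loss weights are assumptions, and the data is random.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

num_tags, num_words, num_departments = 12, 1000, 4

# Three inputs: two variable-length word sequences and a fixed-size tag vector.
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")

title_features = layers.LSTM(8)(layers.Embedding(num_words, 16)(title_input))
body_features = layers.LSTM(8)(layers.Embedding(num_words, 16)(body_input))
x = layers.concatenate([title_features, body_features, tags_input])

# Two outputs, named so that losses and targets can be matched by key.
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_departments, name="department")(x)
model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs={"priority": priority_pred, "department": department_pred},
)

# One loss per output, plus the weight of each loss in the total loss.
model.compile(
    optimizer="adam",
    loss={
        "priority": keras.losses.BinaryCrossentropy(from_logits=True),
        "department": keras.losses.CategoricalCrossentropy(from_logits=True),
    },
    loss_weights={"priority": 1.0, "department": 0.2},
)

n = 32
title_data = np.random.randint(num_words, size=(n, 10))
body_data = np.random.randint(num_words, size=(n, 20))
tags_data = np.random.randint(2, size=(n, num_tags)).astype("float32")
priority_targets = np.random.randint(2, size=(n, 1)).astype("float32")
department_targets = np.eye(num_departments, dtype="float32")[
    np.random.randint(num_departments, size=n)
]

model.fit(
    [title_data, body_data, tags_data],
    {"priority": priority_targets, "department": department_targets},
    epochs=1, batch_size=16, verbose=0,
)
predictions = model.predict([title_data, body_data, tags_data], verbose=0)
```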

We can also apply different metrics to different outputs. In the example below, the outputs named “score_output” and “class_output” use different losses and metrics.

Skip Connection

Skip connections are a must-have feature of many state-of-the-art models. With a skip connection, the input of a layer comes from the previous layer and from one or more earlier layers. In this example, we use layers.add to create skip connections.
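A minimal sketch of a residual-style block, with made-up layer sizes. The two tensors passed to layers.add must have the same shape, which is why the inner convolutions use "same" padding.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(16, 3, activation="relu")(inputs)
block_1_output = layers.MaxPooling2D(3)(x)

x = layers.Conv2D(16, 3, activation="relu", padding="same")(block_1_output)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
# Skip connection: add the block input back to the block output.
block_2_output = layers.add([x, block_1_output])

x = layers.GlobalAveragePooling2D()(block_2_output)
outputs = layers.Dense(10)(x)
model = keras.Model(inputs, outputs, name="toy_resnet")

logits = model(np.zeros((2, 32, 32, 3), dtype="float32"))
model.summary()
```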

Here is a skip connection followed by 2 CNN layers and another skip connection. For illustration, we call summary to examine the connections. For example, the conv2d_4 layer takes inputs from the conv2d_3 and max_pooling2d layers.

Shared layers

A model may have layers that share the same design and weights. For example, the two feature extractors below have the same architecture and share the same weights.

In the code below, the embedding layer is shared by two different input tensors.
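A sketch of a shared embedding, with an assumed vocabulary size and embedding dimension:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# One embedding layer instance, created once.
shared_embedding = layers.Embedding(1000, 16)

text_input_a = keras.Input(shape=(None,), dtype="int32")
text_input_b = keras.Input(shape=(None,), dtype="int32")

# Calling the same instance twice reuses the same weights for both inputs.
encoded_a = shared_embedding(text_input_a)
encoded_b = shared_embedding(text_input_b)

model = keras.Model([text_input_a, text_input_b], [encoded_a, encoded_b])

a = np.random.randint(0, 1000, size=(2, 5)).astype("int32")
b = np.random.randint(0, 1000, size=(2, 7)).astype("int32")
out_a, out_b = model([a, b])
```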

Feature Extraction Using Pre-trained Models & Multiple Layers

As mentioned before, in object detection, we use features from multiple layers at different resolutions in predicting objects. Just for demonstration, let’s use the vgg19 model to extract features. The example below uses all the layers in vgg19 as extracted features.

But in practice, we use selected layers only.

Let’s elaborate on the idea with an image segmentation network that generates a mask to segment an object.

The diagram below is part of an FPN model (Feature Pyramid Networks). It contains a downsampling network (the yellow one) with progressively reduced spatial dimensions and an upsampling network to reconstruct the spatial information. Skip connections connect the downsampling layers to the upsampling layers at the corresponding spatial resolutions. This allows the upsampler to construct spatial information from its upsampled data and the content features extracted by the downsampler.

The object segmentation here uses a similar concept. It uses a pre-trained MobileNet v2 as a downsampler to extract content features. We will export 5 layers to be connected to the upsampler.

Then we create a U-Net model that contains this downsampler and 5 upsampling layers (4 are implemented as pix2pix.upsample and 1 as a transposed convolution). It also has 4 skip connections that concatenate the downsampler and upsampler outputs at the matching spatial resolution.

Here is the model design for your reference.

pix2pix.upsample is composed of a transposed convolution, batch normalization, dropout, and ReLU.

Callbacks

The methods model.fit, model.evaluate and model.predict generate lifecycle events that our application can act on, for example at the beginning or end of a training batch or epoch.

For example, at the end of an epoch, ModelCheckpoint determines whether to save a model. The code below uses out-of-the-box callbacks to save information for TensorBoard, to save checkpoints, and to do early stopping. Keras ships other built-in callbacks as well, and we can also develop custom callbacks.
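A sketch that combines the three built-in callbacks with a small custom one. The EpochCounter class, the temporary directory, and the toy model are hypothetical; the checkpoint stores weights only, with a file name ending in .weights.h5.

```python
import os
import tempfile
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

class EpochCounter(keras.callbacks.Callback):
    """Custom callback: counts how many epochs have finished."""

    def __init__(self):
        super().__init__()
        self.epochs_seen = 0

    def on_epoch_end(self, epoch, logs=None):
        self.epochs_seen += 1

work_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(work_dir, "ckpt.weights.h5")
counter = EpochCounter()

callbacks = [
    # Stop when val_loss has not improved for 2 consecutive epochs.
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
    # Save the weights whenever val_loss improves.
    keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path,
        save_weights_only=True,
        save_best_only=True,
        monitor="val_loss",
    ),
    # Write logs that TensorBoard can display.
    keras.callbacks.TensorBoard(log_dir=os.path.join(work_dir, "logs")),
    counter,
]

model = keras.Sequential([keras.Input(shape=(8,)), layers.Dense(3)])
model.compile(
    optimizer="adam",
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
x = np.random.rand(40, 8).astype("float32")
y = np.random.randint(0, 3, size=(40,))
history = model.fit(
    x, y, epochs=3, validation_split=0.25, callbacks=callbacks, verbose=0
)
```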

TensorBoard

We can collect statistical metrics with tf.keras.metrics. “update_state” below accumulates loss metrics to be logged later with tf.summary.scalar. These metrics can then be reset with reset_states (named reset_state in recent releases) every “log_freq” steps.
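A sketch of this pattern with a hand-written list of loss values in place of a real training loop. The values, log_freq, and the temporary log directory are assumptions; reset_state is the method name used by recent TF releases.

```python
import tempfile
import tensorflow as tf

log_dir = tempfile.mkdtemp()
summary_writer = tf.summary.create_file_writer(log_dir)

avg_loss = tf.keras.metrics.Mean(name="loss")
log_freq = 2
fake_losses = [4.0, 2.0, 1.0, 3.0]  # stand-in for per-step training losses
logged = []

with summary_writer.as_default():
    for step, loss in enumerate(fake_losses, start=1):
        avg_loss.update_state(loss)          # accumulate
        if step % log_freq == 0:
            mean_value = float(avg_loss.result())
            tf.summary.scalar("loss", mean_value, step=step)  # write to the log
            logged.append(mean_value)
            avg_loss.reset_state()           # start a fresh average

print(logged)
```

The log directory can then be inspected with the command `tensorboard --logdir <log_dir>`.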

Then, we can launch TensorBoard to analyze the log directory.

Tensor & NumPy ndarray

Eager execution works with NumPy. NumPy operations accept tf.Tensor as arguments, and many TF operations accept NumPy ndarrays and convert them to tf.Tensor first. We can also use tf.Tensor.numpy to convert a tensor to a NumPy ndarray. The code example demonstrates ways to create tensors for simple testing and how they can be indexed.
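A short sketch of these conversions and of tensor indexing; the values are arbitrary.

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])

a = t.numpy()                              # tf.Tensor -> ndarray
summed = np.add(t, 1)                      # NumPy op on a tensor, returns an ndarray
doubled = tf.multiply(np.ones((2, 3)), 2)  # TF op on an ndarray, returns a tensor

# Tensors support NumPy-style indexing and slicing.
first_row = t[0]        # [1, 2, 3]
element = t[1, 2]       # 6
last_cols = t[:, 1:]    # [[2, 3], [5, 6]]
```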

Sparse Tensor (optional)
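A sparse tensor stores only the non-zero entries together with their positions. A minimal sketch with made-up values:

```python
import tensorflow as tf

# Two non-zero values in a 3x4 matrix: 1 at (0, 0) and 2 at (1, 2).
sparse = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=[1, 2],
    dense_shape=[3, 4],
)

# Convert to an ordinary dense tensor when needed.
dense = tf.sparse.to_dense(sparse)
print(dense.numpy())
```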

TF operations

For the final section, we will quickly demonstrate some common APIs used in TensorFlow applications. The code is pretty straightforward, and we will explain it through the code comments.

tf.expand_dims
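A small sketch with an arbitrary 28×28 tensor:

```python
import tensorflow as tf

x = tf.zeros((28, 28))
with_batch = tf.expand_dims(x, axis=0)     # add a leading axis: (1, 28, 28)
with_channel = tf.expand_dims(x, axis=-1)  # add a trailing axis: (28, 28, 1)
```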

tf.reshape
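A small sketch with arbitrary values:

```python
import tensorflow as tf

x = tf.range(6)                   # [0, 1, 2, 3, 4, 5]
matrix = tf.reshape(x, (2, 3))    # [[0, 1, 2], [3, 4, 5]]
flat = tf.reshape(matrix, (-1,))  # -1 lets TF infer the size: back to 6 elements
```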

tf.squeeze
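A small sketch with an arbitrary shape:

```python
import tensorflow as tf

x = tf.zeros((1, 3, 1))
all_squeezed = tf.squeeze(x)            # remove every axis of size 1: (3,)
first_squeezed = tf.squeeze(x, axis=0)  # remove only axis 0: (3, 1)
```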

tf.cast
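A small sketch with arbitrary values:

```python
import tensorflow as tf

x = tf.constant([1.8, 2.2])
as_int = tf.cast(x, tf.int32)                            # fractional part is dropped
as_float = tf.cast(tf.constant([1, 0, 1]), tf.float32)   # integers to floats
```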

tf.stack
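A small sketch with arbitrary values; stacking creates a new axis:

```python
import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])
rows = tf.stack([a, b])          # new axis 0: shape (2, 3)
cols = tf.stack([a, b], axis=1)  # new axis 1: shape (3, 2)
```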

tf.concat
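A small sketch with arbitrary values; unlike tf.stack, tf.concat joins along an existing axis:

```python
import tensorflow as tf

a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
vertical = tf.concat([a, b], axis=0)    # shape (4, 2)
horizontal = tf.concat([a, b], axis=1)  # shape (2, 4)
```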

tf.split
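A small sketch with arbitrary values:

```python
import tensorflow as tf

x = tf.range(6)                   # [0, 1, 2, 3, 4, 5]
equal_parts = tf.split(x, 3)      # three equal pieces
sized_parts = tf.split(x, [2, 4]) # pieces of size 2 and 4
```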

tf.reduce_sum
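A small sketch with arbitrary values:

```python
import tensorflow as tf

x = tf.constant([[1, 2], [3, 4]])
total = tf.reduce_sum(x)             # sum of all elements
col_sums = tf.reduce_sum(x, axis=0)  # sum down each column
row_sums = tf.reduce_sum(x, axis=1)  # sum across each row
```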

tf.tile
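A small sketch with arbitrary values:

```python
import tensorflow as tf

x = tf.constant([[1, 2]])              # shape (1, 2)
tiled = tf.tile(x, multiples=[2, 2])   # repeat twice along each axis: shape (2, 4)
```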

tf.random.uniform
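A small sketch; the shapes and ranges are arbitrary:

```python
import tensorflow as tf

# Floats drawn uniformly from [0, 1).
floats = tf.random.uniform((2, 3), minval=0.0, maxval=1.0)
# Integers drawn uniformly from [0, 10).
ints = tf.random.uniform((5,), minval=0, maxval=10, dtype=tf.int32)
```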

tf.random.normal
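A small sketch; the shape, mean, and standard deviation are arbitrary:

```python
import tensorflow as tf

# Samples from a normal distribution with mean 0 and standard deviation 1.
samples = tf.random.normal((1000,), mean=0.0, stddev=1.0)
print(float(tf.reduce_mean(samples)), float(tf.math.reduce_std(samples)))
```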

Credit & References

The code in this article series originates from the TensorFlow guide.
