Early versions of TensorFlow had a major identity crisis: many APIs were proposed, often with split personalities. Thankfully, the adoption of the Keras API in TensorFlow 2 (TF 2) has changed the picture. More significantly, the API has become more programmer-friendly, easier to trace, and less tedious. In this article, we show how to use this API to build deep learning models.

Sequential Model with CIFAR10

The Keras API provides a high-level built-in class, “models.Sequential”, for creating common deep learning (DL) models. In this section, we train a CIFAR10 classifier and make predictions with it. The first part of the code below prepares the CIFAR10 dataset for training. Then, we build a CNN classifier and configure the training to use the Adam optimizer with the cross-entropy loss function (model.compile). We also configure the training to collect accuracy data. Finally, we call “model.fit” to train the model; it returns a history that contains the performance metrics. The code is pretty simple, so I will let it speak for itself.

models.Sequential is a stacked model in which each layer takes the output of the previous layer as its input. Each layer has exactly one input tensor and one output tensor. (A tensor is a TF object that holds a scalar, a vector, or an n-dimensional array.) Even with this limitation, this model covers most of the basic deep learning models. To double-check the code, we can use model.summary to display a summary of the model architecture. For example, the first layer in our model is a conv2d layer that outputs a tensor with a shape of (None, 30, 30, 32). None stands for an arbitrary batch size; the layer has an output spatial dimension of 30×30 with 32 output channels.
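The classifier described above could be sketched roughly as follows. This is only a minimal illustration, not the original code: it assumes a small CNN of our own choosing and uses random arrays shaped like CIFAR10 images instead of downloading the real dataset (which would normally come from tf.keras.datasets.cifar10.load_data()).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumption: synthetic stand-in for CIFAR10 (32x32 RGB images, labels 0..9).
rng = np.random.default_rng(0)
train_images = rng.random((64, 32, 32, 3)).astype("float32")
train_labels = rng.integers(0, 10, size=(64,))

# Stack the layers one by one; each layer feeds the next.
model = models.Sequential()
model.add(tf.keras.Input(shape=(32, 32, 3)))
model.add(layers.Conv2D(32, (3, 3), activation="relu"))  # -> (None, 30, 30, 32)
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation="relu"))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation="relu"))
model.add(layers.Dense(10))  # one logit per class

# Adam optimizer, cross-entropy loss on logits, and accuracy tracking.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# fit returns a History object; history.history maps metric names to lists.
history = model.fit(train_images, train_labels, epochs=1,
                    validation_split=0.25, verbose=0)
model.summary()
```

Because the data is random, the accuracy values are meaningless here; the point is only the build, compile, fit, and summary workflow.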

Sometimes, we can print out the model summary without completing the coding of the whole model (say, calling summary before adding the Dense layers). This allows us to troubleshoot a model incrementally.

To verify the training progress, we can plot the training and validation accuracy using the history returned from model.fit.

We can also evaluate the model accuracy using testing images and print out its accuracy.

Finally, the code below demonstrates how to make predictions. We simply call model(images), where images contains 25 testing images.

Sometimes, we want the outputs to be probability values instead of logit scores. Here, we create another Sequential model whose first layer is the original model and whose second layer (the output) is a softmax layer.
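A minimal sketch of this wrapping idea is shown here. The inner model is a hypothetical untrained stand-in for the trained classifier, and the 25 images are random; only the Sequential([model, Softmax()]) pattern is the point.

```python
import numpy as np
import tensorflow as tf

# Assumption: an untrained stand-in for the trained classifier (outputs logits).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])

# Wrap the original model and append a softmax layer as the new output.
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])

images = np.random.rand(25, 32, 32, 3).astype("float32")
probabilities = probability_model(images)            # shape (25, 10)
predicted_classes = tf.argmax(probabilities, axis=1)  # most likely class per image
```

Each row of probabilities sums to 1, so the values can be read as class probabilities.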

As a final step in our demonstration code, we visualize the images and predictions with


As shown before, predictions can be made by calling the model with image(s) as input.

That is, a model is callable: we call the model instance directly, as in model(some_input).

Layers are callable as well. We can fabricate test input and call a layer directly to quickly trace the result of its operation.
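As a small illustration of calling a layer directly, the sketch below feeds a hand-made tensor into a Dense layer. The all-ones kernel and the missing bias are our own choices, made only so that the result is easy to verify by hand.

```python
import tensorflow as tf

# A Dense layer with an all-ones kernel and no bias: each output unit
# is simply the sum of the input features.
layer = tf.keras.layers.Dense(3, kernel_initializer="ones", use_bias=False)

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])  # fabricated test input
y = layer(x)                               # the layer is called like a function
print(y.numpy())
```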

MNIST classifier with Sequential model (using condensed syntax)

The code below is an MNIST classifier. It uses a more condensed and simpler syntax for composing a Sequential model, but it is less flexible for tracing problems during early development. This model includes a layer that rescales the input values to the range of 0 to 1.0 and a layer that reshapes the input from (28, 28) to (28, 28, 1), the format expected by the CNN layer.

We can test the model quickly without loading the real data. In the example below, we generate random data for fitting the model and making predictions. In TF, we can pass NumPy ndarrays or TF tensors as data.
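The condensed syntax and the random-data smoke test might look like the sketch below. The layer sizes are assumptions, and layers.Rescaling assumes a reasonably recent TF 2 release (older releases expose it under an experimental preprocessing namespace).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Condensed syntax: pass the whole layer stack as a list.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    layers.Rescaling(1.0 / 255),    # pixel values 0..255 -> 0..1
    layers.Reshape((28, 28, 1)),    # add the channel axis expected by Conv2D
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Random data shaped like MNIST, used only to check that the model runs.
x = np.random.randint(0, 256, size=(32, 28, 28)).astype("float32")
y = np.random.randint(0, 10, size=(32,))
model.fit(x, y, epochs=1, batch_size=8, verbose=0)
logits = model.predict(x[:4], verbose=0)
```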

Avoid overfitting

To avoid overfitting, we can add regularization and dropout into the model.
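As a hedged sketch, L2 weight regularization and dropout can be added as follows. The regularization factor 1e-4 and the dropout rate 0.5 are illustrative values, not tuned recommendations.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    # The L2 penalty on the kernel weights is added to the training loss.
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.L2(1e-4)),
    # Randomly zero half of the activations during training only.
    layers.Dropout(0.5),
    layers.Dense(10),
])

# The regularization term shows up in model.losses.
print(len(model.losses))
```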

model.compile options

In this section, we demonstrate options for configuring the training.

Optimizer learning rate

For some optimizers, we can create a custom schedule for the learning rate decay as:
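One built-in way to do this is a schedule object such as ExponentialDecay, passed in place of a fixed learning rate. The numbers below are arbitrary illustrative values.

```python
import tensorflow as tf

# Halve the learning rate every 1000 optimizer steps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=1000,
    decay_rate=0.5,
    staircase=True,  # decay in discrete jumps instead of continuously
)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)

# A schedule is callable with the step number.
lr_at_start = float(lr_schedule(0))
lr_after_1000 = float(lr_schedule(1000))
```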


The metrics argument in compile holds a list of metrics that model.fit tracks during training and returns in its history when the training is done.

Here are some available built-in options.

Note: Instead of using the built-in classes, we can also pass many of these options as strings (for example, “sparse_categorical_accuracy”).

Custom cost function

Alternatively, we can define a custom loss function, like the MSE (mean squared error) below.
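A custom loss is just a function of (y_true, y_pred). The sketch below is one possible way to write an MSE loss by hand; the function name custom_mse and the tiny model are our own placeholders.

```python
import tensorflow as tf

def custom_mse(y_true, y_pred):
    """Mean squared error per sample, averaged over the last axis."""
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.reduce_mean(tf.square(y_true - y_pred), axis=-1)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
# Pass the function itself as the loss.
model.compile(optimizer="adam", loss=custom_mse)

# Quick check on hand-made values: errors are 1 and 2, squares are 1 and 4.
example = custom_mse(tf.constant([[1.0, 2.0]]), tf.constant([[2.0, 4.0]]))
```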


Next, we examine other options in training a model.


We can reserve a percentage of training samples for validation, say the last x% of the training samples. To use this option, the training data must be NumPy ndarrays or tensors.

Alternatively, we can prepare and supply a separate validation dataset for model.fit.

To create a validation dataset from the training dataset, we can “take” part of the training data for validation and use “skip” to use the rest of the data for training.
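The take/skip split can be sketched as follows, assuming a tf.data.Dataset built from random arrays and a placeholder regression model.

```python
import numpy as np
import tensorflow as tf

# Assumption: random features and targets standing in for real data.
x = np.random.rand(100, 8).astype("float32")
y = np.random.rand(100, 1).astype("float32")
dataset = tf.data.Dataset.from_tensor_slices((x, y))

# The first 20 samples become the validation set, the rest the training set.
val_ds = dataset.take(20).batch(10)
train_ds = dataset.skip(20).batch(10)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Supply the separate validation dataset through validation_data.
history = model.fit(train_ds, validation_data=val_ds, epochs=1, verbose=0)
```

With NumPy arrays instead of a dataset, passing validation_split=0.2 to model.fit reserves the last 20% of the samples in a similar way.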

class weights/sample weights (optional)

In classification, there are cases where we want to be more accurate on specific classes. The code below associates each class with a weight that is applied in the loss function. In this example, class 5 has a larger weight, so its samples contribute more to the loss and the model is pushed to classify them more accurately.

In some situations, we want weights to be associated with individual samples rather than classes. This lets us pay more attention to specific sets of samples.
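Both options are arguments of model.fit. The sketch below uses random data and a placeholder model; the weight of 2.0 for class 5 is an arbitrary illustrative value.

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(64, 8).astype("float32")
y = np.random.randint(0, 10, size=(64,))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Class weights: the loss of class-5 samples counts twice as much.
class_weight = {i: 1.0 for i in range(10)}
class_weight[5] = 2.0
history_cw = model.fit(x, y, class_weight=class_weight, epochs=1, verbose=0)

# Sample weights: one weight per training sample. Here we reproduce the same
# emphasis on class 5, but any per-sample weighting would work.
sample_weight = np.ones(len(y), dtype="float32")
sample_weight[y == 5] = 2.0
history_sw = model.fit(x, y, sample_weight=sample_weight, epochs=1, verbose=0)
```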

Composing Models

Often, we don’t build a completely new model. Instead, we may compose a model from existing or pre-trained models and later add new layers. For example, the code below appends a fully connected (FC) layer after a predefined model.

Our second example uses a pre-trained MobileNet model from TensorFlow Hub (TF Hub) for feature extraction. Using out-of-the-box models just for feature extraction is a common practice. We then add new dense layers as the classification head.
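Appending a layer to an existing model can be as simple as the sketch below. Here base_model is a hypothetical small network standing in for any predefined model.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Assumption: a tiny placeholder for a predefined feature extractor.
base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    layers.Conv2D(8, 3, activation="relu"),
    layers.GlobalAveragePooling2D(),
])

# A model can be used as a layer inside another Sequential model.
model = tf.keras.Sequential([
    base_model,
    layers.Dense(10),  # the new fully connected head
])

outputs = model(np.zeros((2, 32, 32, 3), dtype="float32"))
```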

input, layer, and output

In object detection, we often take features from multiple layers at different resolutions to make predictions. So, it is common for a model to have multiple outputs, in particular the feature extractor used in object detection. But a Sequential model has only one output tensor. Fortunately, the input(s) and the layers of a model are accessible through model.inputs and model.layers. In the code below, we construct a new model using keras.Model instead of keras.Sequential. It uses initial_model.input as its input and the outputs of all three layers in initial_model as its outputs. Hence, the “features” returned by the new model is a list of 3 tensors, each holding the output of one convolution layer.

Or we can be more selective by outputting selected convolution layers only.
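Both variants can be sketched as follows. The three-layer initial_model and its layer sizes are illustrative assumptions in the spirit of the TensorFlow guide.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

initial_model = keras.Sequential([
    keras.Input(shape=(250, 250, 3)),
    layers.Conv2D(32, 5, strides=2, activation="relu"),
    layers.Conv2D(32, 3, activation="relu"),
    layers.Conv2D(32, 3, activation="relu"),
])

# Variant 1: expose the output of every layer.
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers],
)
features = feature_extractor(tf.ones((1, 250, 250, 3)))  # list of 3 tensors

# Variant 2: expose only selected layers (here the first and the last).
selected_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[initial_model.layers[0].output, initial_model.layers[-1].output],
)
selected = selected_extractor(tf.ones((1, 250, 250, 3)))
```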

We will come back later on how to use these outputs.

Sequential Models Recap

Sequential models allow us to build common but less complex models. A Sequential model is a stacked model in which one layer follows another: the output of one layer is the input to the next. Each layer has exactly one input tensor and one output tensor.


Readers may find it irritating that I use screenshots for the code and that you cannot cut and paste it. Indeed, this is my intention. One of the weaknesses of TensorFlow (TF) is the lack of good architectural oversight. Many TF libraries or APIs come and go. Fortunately, TF 2 endorses Keras as its API, which makes it much more stable. Since the TF API changes pretty frequently, keeping our code updated can be challenging. Indeed, since TF 0.7, I have written many TF articles that I never published. So, for the latest code, you should always refer to the TF guide. Most of the code in this article series originates from there. In addition, I will not answer troubleshooting questions here. Troubleshooting is never a show-and-tell business. Indeed, you will find the answer much faster and better with a Google search. Let’s get back to the fun part.

Functional API

Next, we explore the Functional API, which provides more flexibility in building a model than a Sequential model does. For example, it allows multiple inputs and outputs, shareable parameters, and skip connections. It also promotes the reusability of layers.

Let’s recreate the MNIST classifier using the functional API. We start with the keras.Input which defines the input shape. Then we create the necessary data flow between layers. In the last step, we create a classifier model using keras.Model with the input and the output defined.
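A functional version of a small MNIST classifier might look like this sketch. The layer sizes are our own assumptions; the pattern of keras.Input, chained layer calls, and keras.Model is what matters.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# 1. Define the input shape.
inputs = keras.Input(shape=(28, 28, 1), name="image")

# 2. Create the data flow by calling each layer on the previous output.
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
outputs = layers.Dense(10, name="logits")(x)

# 3. Tie the input and the output together into a model.
model = keras.Model(inputs=inputs, outputs=outputs, name="mnist_classifier")
model.summary()
```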

Here is another example, this time for extracting features.

The model.compile, model.fit, model.evaluate, and model.predict methods work the same way for the functional API as for a Sequential model. For completeness, here is the code to train and evaluate the classifier model. It is the same as for a Sequential model.

The functional API allows us to share layers easily. Below, we first use the functional API to build an encoder model. An autoencoder is an encoder followed by a decoder. With the encoder flow already programmed, we can just continue building the flow from the encoder output (encoder_output). We add more layers after it and build the autoencoder.

Sometimes, we want to compose existing models to create a new model. In the code below, the autoencoder is composed of an encoder model and a decoder model that are built separately.

Since models and layers are callable, we can define a new model by composing the data flow from different models, calling the encoder and then the decoder. If desired, we can even add new layers to this flow.
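The sketch below follows this idea, with layer sizes borrowed loosely from the TensorFlow guide and otherwise assumed. The encoder and the autoencoder are two models over the same layer objects, so they share weights.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Encoder flow: (28, 28, 1) image -> 16-dimensional code.
encoder_input = keras.Input(shape=(28, 28, 1), name="img")
x = layers.Conv2D(16, 3, activation="relu", name="enc_conv1")(encoder_input)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(16, 3, activation="relu")(x)
encoder_output = layers.GlobalMaxPooling2D()(x)
encoder = keras.Model(encoder_input, encoder_output, name="encoder")

# Decoder flow: continue from encoder_output back to a (28, 28, 1) image.
x = layers.Reshape((4, 4, 1))(encoder_output)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
x = layers.Conv2DTranspose(32, 3, activation="relu")(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation="relu")(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation="relu")(x)

# The autoencoder reuses the encoder layers instead of copying them.
autoencoder = keras.Model(encoder_input, decoder_output, name="autoencoder")
```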
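A minimal sketch of composing separately built models is shown below. For brevity it uses small dense encoder and decoder models, which are assumptions for illustration only.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Two models built independently of each other.
encoder = keras.Sequential(
    [keras.Input(shape=(784,)), layers.Dense(32, activation="relu")],
    name="encoder",
)
decoder = keras.Sequential(
    [keras.Input(shape=(32,)), layers.Dense(784, activation="sigmoid")],
    name="decoder",
)

# Compose them by calling one model on the output of the other.
autoencoder_input = keras.Input(shape=(784,), name="img")
encoded = encoder(autoencoder_input)
# Extra layers could be inserted here, between the two calls.
decoded = decoder(encoded)
autoencoder = keras.Model(autoencoder_input, decoded, name="autoencoder")
```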

Just as a thought exercise, let’s see how to add a classifier head to a feature extractor that outputs multiple tensors. First, we create an extractor with two outputs: the feature_3 and feature_4 layers.

Then, we extract the features for “input”. Next, we use two separate branches to process these two feature tensors. Finally, we concatenate the branch outputs and add a dense layer to classify the input.
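One possible shape of this exercise is sketched below. The extractor architecture and the pooling used in each branch are our own assumptions; only the names feature_3 and feature_4 come from the text.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Extractor with two outputs at different resolutions.
inputs = keras.Input(shape=(64, 64, 3))
x = layers.Conv2D(8, 3, strides=2, padding="same", activation="relu")(inputs)
feature_3 = layers.Conv2D(16, 3, strides=2, padding="same",
                          activation="relu", name="feature_3")(x)
feature_4 = layers.Conv2D(32, 3, strides=2, padding="same",
                          activation="relu", name="feature_4")(feature_3)
extractor = keras.Model(inputs, [feature_3, feature_4], name="extractor")

# Classifier head: one branch per feature tensor, then concatenate.
image = keras.Input(shape=(64, 64, 3), name="image")
f3, f4 = extractor(image)
branch_3 = layers.GlobalAveragePooling2D()(f3)
branch_4 = layers.GlobalAveragePooling2D()(f4)
merged = layers.Concatenate()([branch_3, branch_4])  # 16 + 32 features
outputs = layers.Dense(10)(merged)
classifier = keras.Model(image, outputs, name="classifier")
```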

Model Class

We can also wrap the layers in a custom model class by subclassing keras.Model. This puts all the layer code into an object-oriented design. In the previous sections, it is not easy to trace the output of a layer. But in the custom model below, we can put breakpoints in the call method to trace the outputs. (Note: the @tf.function decorator disables this capability.) This is one major benefit of wrapping layers and models in a custom model class.
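A minimal sketch of such a class follows. The class name MnistClassifier and its layer sizes are hypothetical; the layers are created in the constructor and wired together in call.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class MnistClassifier(keras.Model):
    """Hypothetical custom model: layers live in __init__, the flow in call."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.flatten = layers.Flatten()
        self.hidden = layers.Dense(32, activation="relu")
        self.classifier = layers.Dense(num_classes)

    def call(self, inputs, training=False):
        x = self.flatten(inputs)
        x = self.hidden(x)  # in eager mode, a breakpoint or print works here
        return self.classifier(x)

model = MnistClassifier()
logits = model(tf.zeros((2, 28, 28)))  # calling the model runs call()
```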

Complex models

Let’s build more complex models that are commonly used in deep learning with the functional API.

Multiple Input and/or Output

In this model, we take 3 input tensors: one for the title, one for the body, and one for the tags. This model predicts 2 values: the department category and the priority.

With 2 outputs, we also configure two loss functions, one for the priority and one for the department. We can also use the loss_weights parameter to set how much each loss contributes to the total loss used in backpropagation.

We can also apply different metrics to different outputs. In the example below, the outputs named “score_output” and “class_output” use different losses and metrics.
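The sketch below is modeled on the ticket-ranking example in the TensorFlow guide. The vocabulary size, layer widths, and loss weights are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_words, num_tags, num_departments = 1000, 12, 4  # assumed sizes

# Three inputs: two variable-length word sequences and a tag vector.
title_input = keras.Input(shape=(None,), name="title")
body_input = keras.Input(shape=(None,), name="body")
tags_input = keras.Input(shape=(num_tags,), name="tags")

title_features = layers.LSTM(16)(layers.Embedding(num_words, 8)(title_input))
body_features = layers.LSTM(8)(layers.Embedding(num_words, 8)(body_input))
x = layers.Concatenate()([title_features, body_features, tags_input])

# Two outputs: a priority score and department logits.
priority_pred = layers.Dense(1, name="priority")(x)
department_pred = layers.Dense(num_departments, name="department")(x)

model = keras.Model(
    inputs=[title_input, body_input, tags_input],
    outputs=[priority_pred, department_pred],
)

# One loss per output, in output order, plus the weight of each loss.
model.compile(
    optimizer=keras.optimizers.RMSprop(1e-3),
    loss=[
        keras.losses.BinaryCrossentropy(from_logits=True),
        keras.losses.CategoricalCrossentropy(from_logits=True),
    ],
    loss_weights=[1.0, 0.2],
)
```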

Skip Connection

Skip connections are a must-have feature for many state-of-the-art models. With a skip connection, the input of a layer comes from the previous layer and from some earlier layer(s). In this example, we use layers.add to create skip connections.

Here is a skip connection, followed by 2 CNN layers and another skip connection. For illustration, we call model.summary to examine the connections. For example, the conv2d_4 layer takes inputs from the conv2d_3 and max_pooling2d layers.
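A residual-style sketch with two skip connections follows. It is a simplified variant in the spirit of the toy ResNet from the TensorFlow guide, so the automatically generated layer names in its summary may differ from the ones quoted above.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(32, 32, 3), name="img")
x = layers.Conv2D(32, 3, activation="relu")(inputs)
x = layers.Conv2D(64, 3, activation="relu")(x)
block_1_output = layers.MaxPooling2D(3)(x)

# First skip connection: add the block input to the block output.
x = layers.Conv2D(64, 3, activation="relu", padding="same")(block_1_output)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
block_2_output = layers.add([x, block_1_output])

# Second skip connection after two more CNN layers.
x = layers.Conv2D(64, 3, activation="relu", padding="same")(block_2_output)
x = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
block_3_output = layers.add([x, block_2_output])

x = layers.GlobalAveragePooling2D()(block_3_output)
outputs = layers.Dense(10)(x)
model = keras.Model(inputs, outputs, name="toy_resnet")
model.summary()  # the summary lists each layer and its output shape
```

The padding="same" argument keeps the spatial size unchanged inside each block, which is what makes the element-wise addition possible.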

Shared layers

A model may have layers that share the same design and weights. For example, the two feature extractors below have the same architecture and share the same weights.

In the code below, the embedding layer is shared by two different input tensors.
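Sharing a layer simply means calling the same layer object more than once, as in this sketch (the vocabulary size and embedding width are assumed values):

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# One embedding layer: 1000 possible words mapped to 128-dimensional vectors.
shared_embedding = layers.Embedding(1000, 128)

# Two variable-length sequences of integer word ids.
text_input_a = keras.Input(shape=(None,), dtype="int32")
text_input_b = keras.Input(shape=(None,), dtype="int32")

# The same layer instance (and therefore the same weights) encodes both.
encoded_input_a = shared_embedding(text_input_a)
encoded_input_b = shared_embedding(text_input_b)

model = keras.Model([text_input_a, text_input_b],
                    [encoded_input_a, encoded_input_b])
```

Even though the model has two branches, it holds a single embedding matrix.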

Features Extraction Using Pre-trained Models & Multiple Layers

As mentioned before, in object detection, we use features from multiple layers at different resolutions in predicting objects. Just for demonstration, let’s use the vgg19 model to extract features. The example below uses all the layers in vgg19 as extracted features.

But in practice, we use selected layers only.
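A hedged sketch of selecting a few VGG19 layers as features is shown below. To stay self-contained it builds VGG19 with weights=None (random weights, no download); in practice one would load the pre-trained ImageNet weights instead. The choice of these three layers is arbitrary.

```python
import tensorflow as tf

# Assumption: random weights to avoid a download; use weights="imagenet"
# for real feature extraction.
vgg19 = tf.keras.applications.VGG19(weights=None, include_top=False)

# Pick a few convolution layers at different resolutions.
layer_names = ["block1_conv1", "block3_conv1", "block5_conv1"]
outputs = [vgg19.get_layer(name).output for name in layer_names]
extractor = tf.keras.Model(inputs=vgg19.input, outputs=outputs)

features = extractor(tf.zeros((1, 64, 64, 3)))
for name, feature in zip(layer_names, features):
    print(name, feature.shape)
```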

Let’s elaborate on the idea with an image segmentation network that generates a mask to segment an object.


The diagram below is part of an FPN model (Feature Pyramid Networks). It contains a downsampling network (the yellow one) with progressively reduced spatial dimensions and an upsampling network to reconstruct the spatial information. Skip connections connect the downsampling layers to the upsampling layers at the corresponding spatial resolutions. This allows the upsampler to construct spatial information from its upsampled data and the content features extracted by the downsampler.

The object segmentation here uses a similar concept. It uses a pre-trained MobileNet v2 as a downsampler to extract content features. We will export 5 layers to be connected to the upsampler.

Then we create a U-Net model that contains this downsampler and 5 upsampling layers (4 are implemented as pix2pix.upsample and 1 as a transpose convolution). It also has 4 skip connections, which concatenate the downsampler and upsampler outputs at the respective spatial resolutions.

Here is the model design for your reference.

Modified from source

pix2pix.upsample is composed of a transpose convolution, batch normalization, dropout, and ReLU.


The model.fit, model.evaluate, and model.predict methods generate lifecycle events that our application can act on, for example at the beginning or end of a training batch or epoch.

For example, at the end of an epoch, ModelCheckpoint determines whether to save a model. The code below uses out-of-the-box callbacks to save information for TensorBoard, to save checkpoints, and to do early stopping. Here is a list of built-in callbacks. We can also develop custom callbacks.
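The sketch below wires the three built-in callbacks into model.fit and adds a tiny custom one. The EpochCounter class, the random data, the file names, and the temporary directory are all hypothetical choices for illustration.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

class EpochCounter(tf.keras.callbacks.Callback):
    """Hypothetical custom callback that counts finished epochs."""

    def __init__(self):
        super().__init__()
        self.epochs_seen = 0

    def on_epoch_end(self, epoch, logs=None):
        self.epochs_seen += 1

x = np.random.rand(64, 8).astype("float32")
y = np.random.rand(64, 1).astype("float32")
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

work_dir = tempfile.mkdtemp()
checkpoint_path = os.path.join(work_dir, "best.weights.h5")
counter = EpochCounter()
callbacks = [
    # Stop when the validation loss has not improved for 2 epochs.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2),
    # Save the weights whenever the validation loss improves.
    tf.keras.callbacks.ModelCheckpoint(
        filepath=checkpoint_path, monitor="val_loss",
        save_best_only=True, save_weights_only=True),
    # Write logs that TensorBoard can display.
    tf.keras.callbacks.TensorBoard(log_dir=os.path.join(work_dir, "logs")),
    counter,
]
model.fit(x, y, validation_split=0.25, epochs=3,
          callbacks=callbacks, verbose=0)
```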


We can collect statistical metrics with tf.keras.metrics. “update_state” below accumulates the loss metric, which is logged later with tf.summary.scalar. The metric can then be reset with reset_states every “log_freq” steps.


Then, we can launch TensorBoard to analyze the log directory.
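The accumulate, log, and reset cycle can be sketched as below. The loss values are made up, the log directory is temporary, and the sketch calls reset_state, the name used by recent releases for the reset method (an assumption about the installed version).

```python
import tempfile

import tensorflow as tf

log_dir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(log_dir)

avg_loss = tf.keras.metrics.Mean(name="loss")
log_freq = 2
logged = []  # kept only so we can inspect what was written

with writer.as_default():
    # Made-up per-step loss values standing in for a training loop.
    for step, loss_value in enumerate([4.0, 2.0, 3.0, 1.0], start=1):
        avg_loss.update_state(loss_value)       # accumulate
        if step % log_freq == 0:
            mean_loss = float(avg_loss.result())
            tf.summary.scalar("loss", mean_loss, step=step)  # log
            logged.append(mean_loss)
            avg_loss.reset_state()              # start a fresh average
```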

Tensor & NumPy ndarray

Eager execution works with NumPy. NumPy operations accept tf.Tensor as arguments, and many TF operations accept NumPy ndarrays and convert them to tf.Tensor first. We can also use tf.Tensor.numpy to convert a tensor to a NumPy ndarray. The code example demonstrates ways to create tensors for simple testing and how they can be indexed.
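A few of these conversions and indexing operations, as a small sketch with made-up values:

```python
import numpy as np
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6]])  # create a tensor from a Python list

row = t[0]      # first row
col = t[:, 1]   # second column

arr = t.numpy()                            # tensor -> ndarray
summed = np.add(t, 1)                      # NumPy op on a tensor -> ndarray
product = tf.multiply(np.ones((2, 3)), 2)  # TF op on an ndarray -> tensor
```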

Sparse Tensor (optional)
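As a brief illustration of the idea, a sparse tensor stores only the non-zero entries together with their positions. The values below are arbitrary.

```python
import tensorflow as tf

# Two non-zero values in a 3x4 matrix: 1 at (0, 0) and 2 at (1, 2).
sparse = tf.sparse.SparseTensor(
    indices=[[0, 0], [1, 2]],
    values=[1, 2],
    dense_shape=[3, 4],
)

# Convert to an ordinary dense tensor when needed.
dense = tf.sparse.to_dense(sparse)
print(dense.numpy())
```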


TF operations

For the final section, we will quickly demonstrate some common APIs used in TensorFlow applications. The code is pretty straightforward, and we will explain it through the code comments.
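As a small sample of such operations (our own selection, not the original listing), the sketch below shows a few frequently used calls:

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])

total = tf.reduce_sum(x)             # sum of all elements
col_sum = tf.reduce_sum(x, axis=0)   # sum over rows, one value per column
flat = tf.reshape(x, [-1])           # flatten to a vector
prod = tf.matmul(x, x)               # matrix multiplication
stacked = tf.concat([x, x], axis=0)  # stack along the first axis
as_int = tf.cast(x, tf.int32)        # change the data type
```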












Credit & References

The code in this article series originates from the TensorFlow guide.
