Welcome to my articles on Deep Learning, Reinforcement Learning, Meta-Learning, and Machine Learning. The purpose of this article is to index the series and articles I have written so far. If you want to reach me or have non-technical comments regarding the articles, please read the F.A.Q. first for the most common questions and contact methods.

**NLP**

We can learn the value function and the *Q*-value function iteratively. However, this approach does not scale well to a large state space: in practice, we don’t have enough memory to store a value for every state. Instead, we can approximate the function through model fitting. The most common method is to use a deep network as a function approximator.
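To make the iterative update concrete, here is a minimal sketch of a single tabular *Q*-learning step. The state and action names, the learning rate `alpha`, and the discount `gamma` are assumptions chosen for illustration. The table is exactly the part that stops scaling once the state space grows.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new Q-value."""
    # Bootstrap from the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a small step toward the TD target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; every Q-value starts at 0.
q_table = defaultdict(float)
actions = ["left", "right"]
new_q = q_learning_update(q_table, state=0, action="right", reward=1.0,
                          next_state=1, actions=actions)
```

Each visited state-action pair needs its own entry in `q_table`, which is why we replace the table with a fitted model for large problems.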

If the state space is continuous or large, it is not feasible to keep a memory table that records *V(S)* for every state. However, as in other deep learning methods, we can create a function estimator to approximate it.
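As a rough sketch of what a function estimator looks like, the example below replaces the table with a linear model trained by semi-gradient TD(0). A linear model is used here only as a simple stand-in for a deep network. The feature map, step size, and discount are assumptions for illustration.

```python
def features(state):
    """Hypothetical feature map: a bias term plus the raw scalar state."""
    return [1.0, float(state)]


def predict(weights, state):
    """Linear estimate of V(state)."""
    return sum(w * x for w, x in zip(weights, features(state)))


def td0_update(weights, state, reward, next_state, done,
               alpha=0.05, gamma=0.9):
    """One semi-gradient TD(0) step; returns the updated weights."""
    target = reward if done else reward + gamma * predict(weights, next_state)
    error = target - predict(weights, state)
    # For a linear model, the gradient of V(state) w.r.t. the weights
    # is simply the feature vector.
    return [w + alpha * error * x for w, x in zip(weights, features(state))]


weights = [0.0, 0.0]
weights = td0_update(weights, state=0.5, reward=1.0, next_state=0.7, done=False)
```

The memory cost is now the size of `weights`, not the number of states, so the same code works for a continuous state.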

Exploration is an art. In the high-tech world, we value bold ideas, yet we are risk-averse in reality. Releases are filled with low-hanging fruit. We keep repeating our past successes without exploring what the next success should be.

We focus on judging ideas rather than developing them. We demand answers to all unknowns. But without exploration, those questions will never be answered. Our experience is often our worst enemy. New ideas will have flaws. People didn’t want to wait for a DVD movie to arrive. Yet the Netflix model killed Blockbuster in 2010. …

Value-learning, including *Q*-learning, is believed to be unstable when a large, non-linear function approximator such as a deep network is used. This is a curse for much reinforcement learning (RL) training. To address these problems, DQN applies experience replay and a target network. However, DQN handles only discrete, low-dimensional action spaces. For high-dimensional or continuous action spaces, finding the action with the maximum *Q*-value requires a complex optimization that seems impractical.
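The two DQN stabilizers can be sketched in a few lines. This is a minimal illustration, not DQN itself: `ReplayBuffer` and `sync_target` are hypothetical names, and the "networks" are plain dictionaries of parameters standing in for real models.

```python
import copy
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sync_target(online_params, target_params):
    """Hard update: copy the online parameters into the target network."""
    target_params.clear()
    target_params.update(copy.deepcopy(online_params))


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.add(step, 0, 1.0, step + 1, False)
batch = buffer.sample(2)

online = {"w": [0.1, 0.2]}
target = {}
sync_target(online, target)
```

In training, we would compute TD targets from `target` and call `sync_target` only every few thousand steps, so the targets stay fixed between syncs.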

In the two previous articles on GCN and GNN, we presented a design overview of Graph Neural Networks. In our final article, we will focus on GNN applications. Since the basic GNN theory is already covered in those articles, we will not repeat it here. For the design details of each application, please refer to the original research papers.

**Medical Diagnosis & Electronic Health Records Modeling**

A medical ontology can be described by a graph. For example, the following diagram represents the ontology using a DAG (directed acyclic graph).
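To make the DAG idea concrete, here is a minimal sketch that stores each concept with its parent concepts and collects all ancestors of a node. The concept names are made up for illustration and are not taken from any real medical ontology.

```python
def ancestors(node, parents):
    """Return every concept reachable by following parent links from node."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


# Hypothetical ontology: child concept -> list of parent concepts.
# A node may have several parents, which makes this a DAG and not a tree.
parents = {
    "disease": [],
    "endocrine_disorder": ["disease"],
    "kidney_disorder": ["disease"],
    "diabetes": ["endocrine_disorder"],
    "diabetic_kidney_disease": ["diabetes", "kidney_disorder"],
}

result = ancestors("diabetic_kidney_disease", parents)
```

A model can use the ancestor set to share information between a rare, specific concept and its more common, general parents.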

In general, Graph Neural Networks (GNN) refer to the general concept of applying neural networks (NNs) to graphs. In a previous article, we covered GCN, which is one of the popular approaches in GNN. But in some literature, GNN may refer to a more specific approach in which the hidden state of a node depends on its previous state and on its neighbors. We can view this as message passing between a node and its neighboring nodes. Mathematically, it has the form of:
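As a rough sketch of this message-passing view only (an illustration under our own assumptions, not the exact formulation), each node below mixes its previous hidden state with the mean of its neighbors' states. The scalar mixing weights and the `tanh` non-linearity are assumptions. A real GNN would use learned functions instead.

```python
import math


def message_passing_step(hidden, neighbors, self_weight=0.5, neighbor_weight=0.5):
    """One synchronous update of all node states from the previous states."""
    new_hidden = {}
    for node, state in hidden.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            # Aggregate incoming messages as the mean of the neighbor states.
            message = [sum(hidden[n][i] for n in nbrs) / len(nbrs)
                       for i in range(len(state))]
        else:
            message = [0.0] * len(state)
        # Combine the node's previous state with the aggregated message.
        new_hidden[node] = [math.tanh(self_weight * s + neighbor_weight * m)
                            for s, m in zip(state, message)]
    return new_hidden


# Hypothetical three-node graph with 2-dimensional hidden states.
hidden = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
updated = message_passing_step(hidden, neighbors)
```

Repeating the step lets information travel one hop further each time, so after *k* steps a node's state reflects its *k*-hop neighborhood.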

You know, who you choose to be around you, lets you know who you are. — The Fast and the Furious: Tokyo Drift.

In social networks, friend connections can be realized by a social graph.

In this article, we cover TensorFlow generative models, including:

- DCGAN
- CycleGAN
- Pix2Pix
- Neural style transfer
- Adversarial FGSM
- Autoencoder
- Autoencoder Denoising
- Autoencoder Anomaly detection
- Convolutional Variational Autoencoder (VAE)
- DeepDream

DCGAN is one popular design for GAN. It is composed of convolutional and transposed convolutional layers, without max pooling or fully connected layers. The figure below is the network design for the generator. This example will be trained to generate MNIST digits. We will start the article with this most basic GAN model.
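To see how transposed convolutions upsample a small feature map to a 28×28 MNIST image, the sketch below applies the standard output-size formula for a transposed convolution (dilation of 1). The starting size of 7 and the kernel, stride, and padding values are assumptions for illustration, not necessarily the layers used in the example.

```python
def transposed_conv_output_size(size, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def generator_sizes(start, layers):
    """Track the feature-map size through a stack of transposed convolutions."""
    sizes = [start]
    for kernel, stride, padding in layers:
        sizes.append(transposed_conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes


# Two hypothetical layers: kernel 4, stride 2, padding 1 doubles the size each time.
sizes = generator_sizes(7, [(4, 2, 1), (4, 2, 1)])
print(sizes)  # [7, 14, 28]
```

With a stride of 2, each layer doubles the resolution, which is how the generator grows an image without any pooling or fully connected upsampling.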

As part of the TensorFlow series, this article focuses on coding examples for BERT and the Transformer. These examples are:

- IMDB files: Sentiment analysis with a pre-trained TF Hub BERT model and AdamW,
- GLUE/MRPC BERT finetuning,
- Transformer for language translation.

In this example, we use a pre-trained TensorFlow Hub model for BERT and an AdamW optimizer. Because most of the heavy work is done by the TF Hub model, we will keep the explanation simple for this example.
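For readers unfamiliar with AdamW, here is a minimal sketch of one update step for a single scalar parameter. It shows the key idea of decoupled weight decay: the decay term is applied to the parameter directly and is not folded into the gradient. The hyperparameter values are assumptions, and the function name `adamw_step` is hypothetical. The optimizer used with the TF Hub model may differ in details such as learning-rate scheduling.

```python
import math


def adamw_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar parameter; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialized averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: added next to the Adam step, not to the gradient.
    param = param - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v


param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

Because the decay is not scaled by the adaptive denominator, every weight shrinks at the same relative rate regardless of its gradient history.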

First, we download and prepare IMDB files.

Sequence-to-sequence models are particularly popular in NLP. This article, as part of the TensorFlow series, covers examples of sequence-to-sequence models.