Deep Reinforcement Learning is about making the best decisions for what we see and what we hear. It sounds simple, but making a decision is never easy. This subject is one of the hardest and one of the most rewarding. I try to explain things from an easy-to-understand angle. I don’t want to fill my readers with fancy talk that feels good but teaches nothing. In fact, keeping things simple helps me see the subject with better clarity. But I don’t want to skip the equations either. They just need to be introduced in the proper manner. Understanding them helps us go deeper.
While many articles still need to be reviewed before publishing, the published ones should give you enough detail to start your journey. I will try to release the remaining articles in Spring 2019. So stay tuned.
RL— Introduction to Deep Reinforcement Learning
Deep reinforcement learning is about taking the best actions from what we see and hear. Unfortunately, reinforcement…
RL — Deep Reinforcement Learning (Learn effectively like a human)
A human learns much more efficiently than RL. In this article, we will study other methods that may narrow this gap.
RL — Transfer Learning (Learn from the Past)
Humans are explorers and we do it smartly. In reinforcement learning (RL), model-free methods search the solution space…
RL — Value Learning
Value learning is a fundamental concept in reinforcement learning. It is as basic as the fully connected network in…
RL — Value Fitting & Q-Learning
We can learn the value function and the Q-value function iteratively. However, it cannot scale well to large state…
Monte Carlo Tree Search (MCTS) in AlphaGo Zero
AlphaGo Zero uses MCTS to select the next move in a Go game.
RL — DQN Deep Q-network
Can computers play video games like a human? In 2017, a professional team beat a DeepMind AI program in StarCraft 2…
RL — Policy Gradients Explained
Policy Gradient Methods (PG) are frequently used algorithms in reinforcement learning (RL). The principle is very…
RL — Policy Gradients Explained (Part 2)
In the first part of the Policy Gradients article, we covered the basics. In the second part, we continue on the Temporal…
RL — Natural Policy Gradient Explained
Policy Gradient (PG) methods are popular in reinforcement learning (RL). PG increases the chance of taking actions that…
RL — Trust Region Policy Optimization (TRPO) Explained
TRPO, one of the most popular Policy Gradient methods (PG), addresses the convergence problem by introducing the…
RL — Trust Region Policy Optimization (TRPO) Part 2
After discussing the basic concepts, we will discuss the details of TRPO in Part 2. (TRPO is one of the most popular…
RL — Actor-Critic using Kronecker-Factored Trust Region (ACKTR) Explained
In a previous article, we explained how Natural Policy Gradient allows the Policy Gradient methods to converge better by…
RL — The Math behind TRPO & PPO
Trust Region Policy Optimization (TRPO) & Proximal Policy Optimization (PPO) are based on the Minorize-Maximization (MM)…
RL — LQR & iLQR Linear Quadratic Regulator
Reinforcement learning can be divided into Model-free and Model-based learning. Model-free learning emphasizes heavily…
RL — Model-based Reinforcement Learning
In reinforcement learning (RL), we maximize the rewards for our actions, which depend on the policy and the system…
RL — Guided Policy Search (GPS)
With Guided Policy Search (GPS), a robot learns each skill in the video in 20 minutes. If it is trained by the Policy…
RL — Guided Policy Search (A walkthrough)
In the previous article, we discussed the concept of Guided Policy Search. Now we look into how it is trained.
RL — Model-Based Learning with Raw Videos
Vision is a critical part of intelligence and the decision-making process. Many toy experiments avoid raw image…
RL — Imitation learning
Imitation is a key part of human learning. In the high-tech world, if you are not an innovator, you want to be a…
RL — Transfer Learning
We learn from past experiences. We apply learned knowledge to solve new tasks. In Deep Learning, training a deep…
RL — Inverse Reinforcement Learning
It is a major challenge for reinforcement learning (RL) to process sparse and long-delayed rewards. It is difficult to…
RL — Prediction
How can we learn better? This is something we struggle with in real life also. Besides meta-learning, humans make…
RL — PLATO Policy Learning using Adaptive Trajectory Optimization
Imitation plays a major role in learning. In RL, it reduces the amount of time in searching for solutions and it is…
Comparison & Tips
RL — Reinforcement Learning Algorithms Overview
We have examined many Reinforcement Learning (RL) algorithms in this series, for instance, Policy Gradient methods for…
RL — Reinforcement Learning Algorithms Comparison
Choosing an RL algorithm can be confusing. In this article, we will focus on different decision factors in choosing…
Meta-Learning (Learn how to Learn)
People learn continuously. We recall relevant skills and adjust them accordingly in handling new tasks. Overall…
Meta-Learning (Bayesian Meta-Learning & Weak Supervision)
In part 2 of our Meta-Learning article, we will discuss Bayesian Meta-Learning, Unsupervised Learning, and Weak…
RL — Meta-Learning
Many deep learning classifiers demonstrate superhuman performance, but humans still learn far more efficiently than deep…
Neural Turing Machines: a fundamental approach to access memory in deep learning
Memory is a crucial part of the brain and the computer. In some areas of deep learning, we extend the capabilities of…
RL — Reinforcement Learning Algorithms Quick Overview
This article overviews the major algorithms in reinforcement learning. Each algorithm will be explained briefly in a…
RL — Reinforcement Learning Terms
Reinforcement learning observes the environment and takes actions to maximize the rewards. It deals with exploration…
AlphaGo Zero — a game changer. (How it works?)
Even though AlphaGo is impressive, it requires bootstrapping the training with human games and knowledge. This changed when…
AlphaGo: How it works technically?
How does reinforcement learning join forces with deep learning to beat the Go master? Since it sounds implausible, the…
RL — Optimization Algorithms
This article contains the optimization algorithms often mentioned in RL.
RL — Importance Sampling
In RL, Importance Sampling estimates the value functions for a policy π with samples collected previously from an older…
RL — Conjugate Gradient
We use the Conjugate Gradient (CG) method to solve a linear equation or to optimize a quadratic equation. It is more…
Credit and references
Reinforcement learning is a huge topic, and I owe a great debt to many professors, researchers, and bloggers. It is impossible to cite all the videos, classes, research papers, and blogs that I have read. In fact, other university courses helped me a lot, but I can no longer recall the institutions.
Here, I want to list a few that have had the biggest impact on me.
But I want to single out the UC Berkeley Reinforcement Learning course, which is currently offered every year. I started watching it in 2015. It is a tough course. The lesson on LQR almost made me give up on RL. But with some perseverance, it made the biggest impact on me. I hope it can have the same impact on you too.