RL — Exploration with Deep Learning

Regret

Optimistic exploration with UCB1
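
As a rough illustration of optimistic exploration, here is a minimal sketch of UCB1 on a Bernoulli multi-armed bandit. It assumes the standard UCB1 index (empirical mean plus sqrt(2 ln t / n) for an arm pulled n times by step t). The function names `ucb1_select` and `run_ucb1` and the simulated arm means are hypothetical, chosen only for this example.

```python
import math
import random


def ucb1_select(counts, values, t):
    """Return the arm with the highest UCB1 index at time step t (t >= 1)."""
    # Pull every arm once first so that no count is zero.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Index = empirical mean + exploration bonus that shrinks as the arm is pulled.
    scores = [
        values[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(true_means, steps, seed=0):
    """Simulate UCB1 on Bernoulli arms (assumed setup, for illustration only)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k    # number of pulls per arm
    values = [0.0] * k  # running mean reward per arm
    for t in range(1, steps + 1):
        arm = ucb1_select(counts, values, t)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

Calling `run_ucb1([0.2, 0.5, 0.8], 2000)` should concentrate most pulls on the last arm, while the logarithmic bonus keeps the other arms from being abandoned entirely.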

Bayesian UCB

Thompson sampling (Posterior sampling)
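
A minimal sketch of Thompson sampling for a Bernoulli bandit may help make the idea concrete. It assumes a Beta(1, 1) prior per arm, which is a common choice but not necessarily the one used in the original figures. The names `thompson_select` and `run_thompson` are hypothetical.

```python
import random


def thompson_select(alpha, beta, rng):
    """Sample a success probability per arm from its Beta posterior, pick the best."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)


def run_thompson(true_means, steps, seed=0):
    """Simulate Thompson sampling on Bernoulli arms (assumed setup)."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # 1 + number of observed successes per arm
    beta = [1.0] * k   # 1 + number of observed failures per arm
    for _ in range(steps):
        arm = thompson_select(alpha, beta, rng)
        if rng.random() < true_means[arm]:
            alpha[arm] += 1.0
        else:
            beta[arm] += 1.0
    return alpha, beta
```

Exploration here comes from posterior uncertainty: an arm with few pulls has a wide Beta posterior, so it occasionally produces the largest sample and gets tried again.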

Information Gain

Figure (modified from source): z is the reward, y is the observation, and a is the action.

Recap

  • Optimistic exploration.
  • Thompson sampling (posterior sampling).
  • Information gain.

Contextual Bandits

Exploration in RL

Exploring with pseudo-counts

Counting with Hash
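
One way to count visits in a large or continuous state space is to hash states into discrete codes and count the codes. The sketch below assumes a SimHash-style code (the signs of a fixed random Gaussian projection) and a bonus proportional to 1/sqrt(n). The class name `SimHashCounter` and the defaults `n_bits=16` and `bonus_scale=0.1` are hypothetical choices for illustration, not values taken from the source.

```python
import random
from collections import defaultdict


class SimHashCounter:
    """Count state visits through a sign-of-random-projection hash (a sketch)."""

    def __init__(self, state_dim, n_bits=16, bonus_scale=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed random Gaussian projection: one row per hash bit.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(n_bits)
        ]
        self.bonus_scale = bonus_scale
        self.counts = defaultdict(int)

    def _hash(self, state):
        # Each bit is the sign of one projection; nearby states tend to share codes.
        return tuple(
            sum(w * x for w, x in zip(row, state)) >= 0.0
            for row in self.projection
        )

    def update(self, state):
        """Record a visit and return the exploration bonus for this state."""
        key = self._hash(state)
        self.counts[key] += 1
        return self.bonus_scale / (self.counts[key] ** 0.5)
```

The returned bonus would typically be added to the environment reward. Fewer bits merge more states into one bucket, so `n_bits` trades generalization against count resolution.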

Exemplar models

Exploration by random network distillation

Posterior sampling in deep RL

Bootstrapped DQN

Information Gain in RL

