Speech Recognition — Deep Speech, CTC, Listen, Attend, and Spell

Feature Extraction

Hinton et al (2012)
Peddinti (2015)

Connectionist temporal classification (CTC)

  • stay on the same character for the next time step (the pink node),
  • transition to the next character in Z, the blank ε (the green node), or
  • skip the blank ε and transition to the character after it (the purple node).
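
The three moves above (stay, advance, skip a blank) are what the CTC forward recursion sums over. Below is a minimal sketch in plain Python, not the article's own code. The names `ctc_forward`, `probs`, `labels`, and `blank` are illustrative assumptions. It also assumes blank has index 0 and that `probs[t][k]` is the softmax output for symbol `k` at time step `t`. Following standard CTC, the skip is only allowed when it lands on a non-blank character that differs from the one two positions back, so repeated characters must be separated by a blank.

```python
def ctc_forward(probs, labels, blank=0):
    """Sum the probabilities of all valid CTC alignments of `labels`.

    probs  : list of T rows, each a list of per-symbol probabilities.
    labels : target label ids, without blanks.
    """
    T = len(probs)

    # Extended sequence Z: a blank before, between, and after the labels.
    z = [blank]
    for c in labels:
        z += [c, blank]
    S = len(z)

    # alpha[t][s]: probability of all partial alignments ending at z[s] at time t.
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][z[0]]          # start on the leading blank ...
    if S > 1:
        alpha[0][1] = probs[0][z[1]]      # ... or on the first character

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1][s]                       # stay on the same node
            if s >= 1:
                total += alpha[t - 1][s - 1]              # advance by one node
            if s >= 2 and z[s] != blank and z[s] != z[s - 2]:
                total += alpha[t - 1][s - 2]              # skip the blank in between
            alpha[t][s] = total * probs[t][z[s]]

    # A valid alignment ends on the last character or the trailing blank.
    result = alpha[T - 1][S - 1]
    if S > 1:
        result += alpha[T - 1][S - 2]
    return result


if __name__ == "__main__":
    # Two time steps, two symbols: index 0 is the blank, index 1 is "a".
    toy = [[0.4, 0.6], [0.3, 0.7]]
    print(ctc_forward(toy, [1]))
```

For the toy input, the alignments "aa", "aε", and "εa" all collapse to "a", so the result is about 0.6·0.7 + 0.6·0.3 + 0.4·0.7 = 0.88. A real implementation would work in log space to avoid underflow on long utterances.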
Case 1

Deep Speech

RNN Transducer

Attention

Listen, Attend and Spell (Encoder-Decoder)

Credits & References
