Speech Recognition Series

Jonathan Hui
2 min read · Dec 20, 2019

In this speech recognition series, we will cover the basics, like phonetics, and the machine learning models used in speech recognition. Later, we will apply deep learning to speech recognition. In the first article, we look at the core principles behind speech recognition.

As with any machine learning (ML) problem, the first challenge is feature extraction. How will vocal information be extracted and represented?

Before developing models for speech recognition, we study two ML algorithms that are frequently used in speech recognition.

Now, let’s start developing the acoustic, lexicon, and language models for speech recognition.

The next two articles develop the models and methods used to transcribe an audio recording.

This will involve the development of a state machine.

Next, we detail how these models are trained.

To make the discussion concrete, we will use the Kaldi platform to demonstrate a training process.

Finally, we will move into the deep learning era and apply its techniques to the speech recognition problem.