In this speech recognition series, we will cover the basics, like phonetics, and the machine learning models used in speech recognition. Later, we will apply deep learning to speech recognition. In the first article, we look at the core principles behind speech recognition.
Speech Recognition — Phonetics
Finding the core principle and focus is unexpectedly hard for new inventions. In deep learning (DL), many early efforts…
Like any machine learning (ML) problem, the first challenge is feature extraction. How will vocal information be extracted and represented?
Speech Recognition — Feature Extraction MFCC & PLP
Machine learning (ML) extracts features from raw data and creates a dense representation of the content. This forces us…
Before developing models for speech recognition, we study two ML algorithms that are frequently used in speech recognition.
Speech Recognition — GMM, HMM
Before the Deep Learning (DL) era for speech recognition, HMM and GMM were two must-learn technologies for speech…
Now, let's start developing the acoustic model, the pronunciation lexicon, and the language model for speech recognition.
Speech Recognition — Acoustic, Lexicon & Language Model
Speech recognition can be viewed as finding the best sequence of words (W) according to the acoustic, the pronunciation…
The next two articles develop models and methods to transcribe an audio recording.
Speech Recognition — ASR Decoding
With the acoustic, pronunciation lexicon and language model built and discussed in the previous article, we are ready…
This will involve the development of a state machine.
Speech Recognition — Weighted Finite-State Transducers (WFST)
Previously, we developed all the necessary Lego blocks in modeling our ASR problem. They include the HMM models for the…
Next, we detail how these models are trained.
Speech Recognition — ASR Model Training
Now, we come to the last part of the puzzle in training an ASR. In this article, we will dig deeper to learn how to…
To make the discussion concrete, we will use the Kaldi platform to demonstrate a training process.
Speech Recognition — Kaldi
Kaldi is a toolkit for speech recognition targeted for researchers. We can use Kaldi to train speech recognition models…
Finally, we will move into the deep learning era and apply its techniques to the speech recognition problem.
Speech Recognition — Deep Speech, CTC, Listen, Attend, and Spell
Deep Learning (DL) changes many Machine Learning (ML) fields that heavily depend on domain knowledge. Decades of…