Speech Recognition — Maximum Mutual Information Estimation (MMIE)

MLE

Maximum mutual information estimation (MMIE)

[Figure: diagonal-covariance Gaussians with MLE training (middle) and MMIE training (right)]
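For reference, the two training criteria compared in this section are commonly written as below. This is a sketch in standard textbook notation, not necessarily the notation the original article used: the symbols $O_u$ (acoustic observations of utterance $u$), $W_u$ (its reference transcript), $\lambda$ (acoustic model parameters), $P(W)$ (language model probability), and $\kappa$ (acoustic scale) are assumptions introduced here.

```latex
% MLE: maximize the likelihood of the observations given the reference transcript.
\mathcal{F}_{\mathrm{MLE}}(\lambda) = \sum_{u} \log p_{\lambda}(O_u \mid W_u)

% MMIE: maximize the posterior of the reference transcript,
% so the reference (numerator) competes against all hypotheses W (denominator).
\mathcal{F}_{\mathrm{MMIE}}(\lambda) = \sum_{u} \log
  \frac{p_{\lambda}(O_u \mid W_u)^{\kappa} \, P(W_u)}
       {\sum_{W} p_{\lambda}(O_u \mid W)^{\kappa} \, P(W)}
```

The denominator sum runs over competing word sequences, which is why the later sections approximate it with a lattice or, in LF-MMI, with a compact phone-level graph.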

Lattice-based MMI


Gradient Descent

Minimum phone error (MPE)

HMM/DNN systems

Lattice-Free MMI (LF-MMI)

  • A phone-level language model (LM) replaces the word-level LM (typically a 4-gram phone LM).
  • No LM backoff (LM smoothing), because backoff introduces many extra states.
  • The network output is evaluated at a 30 ms frame rate instead of the usual 10 ms.
  • Instead of the conventional three states per phone, each phone uses only one state.
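To get a feel for how much the last two simplifications shrink the problem, here is a minimal Python sketch. It is illustrative only: the helper names `subsample` and `hmm_state_count`, the one-second input, and the phone sequence are hypothetical and do not come from any toolkit.

```python
def subsample(frames, factor=3):
    """Keep every `factor`-th frame (10 ms -> 30 ms when factor is 3)."""
    return frames[::factor]


def hmm_state_count(phones, states_per_phone):
    """Number of HMM states needed to spell out a phone sequence."""
    return len(phones) * states_per_phone


# Hypothetical input: one second of audio at a 10 ms frame shift.
frames_10ms = list(range(100))
frames_30ms = subsample(frames_10ms)

# Hypothetical phone sequence for the word "cat".
phones = ["k", "ae", "t"]
three_state = hmm_state_count(phones, 3)  # conventional topology
one_state = hmm_state_count(phones, 1)    # single-state topology

print(len(frames_10ms), len(frames_30ms))
print(three_state, one_state)
```

Both the number of time steps and the number of states drop by roughly a factor of three, which reduces the work needed for the forward-backward pass over the denominator.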
