Speech Recognition — Maximum Mutual Information Estimation (MMIE)


Maximum mutual information estimation (MMIE)
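
For reference, the MMIE objective is commonly written in the form below. The notation is an assumption and is not taken from this article: $O_r$ is the observation sequence of training utterance $r$, $w_r$ is its reference transcript, $M_w$ is the composite HMM for word sequence $w$, $P(w)$ is the language-model probability, and $\lambda$ denotes the acoustic model parameters.

```latex
F_{\mathrm{MMIE}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_{\lambda}(O_r \mid M_{w_r})\, P(w_r)}
         {\sum_{w} p_{\lambda}(O_r \mid M_{w})\, P(w)}
```

The numerator scores the reference transcript and the denominator sums over all competing word sequences, so maximizing the objective raises the posterior of the reference relative to its competitors.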

Source: diagonal-covariance Gaussian with MLE training (middle) and MMIE training (right).

Lattice-based MMI


Gradient Descent

Minimum phone error (MPE)

HMM/DNN systems

Lattice-Free MMI (LF-MMI)

  • A phone-level language model (LM), typically a 4-gram, replaces the word-level LM.
  • The LM uses no backoff (smoothing), because backoff introduces many additional states.
  • A 30 ms frame rate replaces the usual 10 ms frame rate.
  • Each phone is modeled with a single HMM state instead of three.
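
Two of the points above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the recipe of any particular toolkit: the helper names `train_phone_ngram` and `subsample_frames`, the `<s>`/`</s>` padding symbols, and the example phone sequences are all hypothetical. The n-gram model is a plain maximum-likelihood estimate with no backoff, so its only states are the phone histories actually seen in training, and the subsampling helper simply keeps every third frame as a stand-in for moving from a 10 ms to a 30 ms rate.

```python
from collections import defaultdict


def train_phone_ngram(sequences, order=4):
    """Maximum-likelihood phone n-gram with no backoff or smoothing.

    Returns a dict mapping each observed history (a tuple of order-1 phones)
    to a dict of next-phone probabilities. Unseen histories are simply absent.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for phones in sequences:
        # Pad so the first real phone has a full-length history.
        padded = ["<s>"] * (order - 1) + list(phones) + ["</s>"]
        for i in range(order - 1, len(padded)):
            history = tuple(padded[i - order + 1:i])
            counts[history][padded[i]] += 1

    # Normalise counts into conditional probabilities P(phone | history).
    model = {}
    for history, next_counts in counts.items():
        total = sum(next_counts.values())
        model[history] = {p: c / total for p, c in next_counts.items()}
    return model


def subsample_frames(frames, factor=3):
    """Keep every `factor`-th frame, e.g. 10 ms frames -> 30 ms frames."""
    return frames[::factor]


if __name__ == "__main__":
    # Hypothetical phone transcripts, used only for illustration.
    data = [["k", "ae", "t"], ["k", "ae", "b"]]
    lm = train_phone_ngram(data, order=2)
    print("number of LM states (histories):", len(lm))
    print("P(next | 'ae'):", lm[("ae",)])
    print("frames kept:", subsample_frames(list(range(10))))
```

Because there is no backoff, a history that never occurred in training has no entry at all, which is what keeps the number of states small; the price is that such histories receive zero probability.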
