Improve Deep Learning Model Performance & Deep Network Tuning (Part 6)

  • Analyze errors (bad predictions) in the validation dataset.
  • Monitor the activations. Consider batch or layer normalization if they are not zero-centered or normally distributed (see the monitoring sketch after this list).
  • Monitor the percentage of dead nodes.
  • Apply gradient clipping (in particular for NLP) to control exploding gradients.
  • Shuffle the dataset (manually or programmatically).
  • Balance the dataset (each class has a similar number of samples).
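The sketch below illustrates the monitoring and clipping points above in PyTorch. The two-layer model, the dummy batch, and the clipping threshold (max_norm=1.0) are illustrative assumptions, not values from the article:

```python
import torch
import torch.nn as nn

# Hypothetical model and dummy batch, just to make the sketch runnable.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))

activations = {}

def capture(name):
    # Forward hook: stash each ReLU's output for inspection.
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(capture(name))

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss = nn.functional.cross_entropy(model(x), y)

# Activation statistics: a large mean offset or tiny variance is a hint to
# add batch/layer normalization; "dead" counts units that output zero for
# every sample in the batch.
for name, act in activations.items():
    dead = (act <= 0).all(dim=0).float().mean().item()
    print(f"{name}: mean={act.mean():.3f} std={act.std():.3f} dead={dead:.1%}")

optimizer.zero_grad()
loss.backward()
# Gradient clipping (especially useful in NLP) to control exploding
# gradients; max_norm=1.0 is an illustrative threshold.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```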

Tuning

Source: Emil Wallner

Monitor the ratio of the weight-update magnitude to the weight magnitude; a common rule of thumb is to keep it around 1e-3:

  • If the ratio is > 1e-3, consider lowering the learning rate.
  • If the ratio is < 1e-3, consider increasing the learning rate.
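A minimal sketch of how that ratio can be computed after loss.backward(), assuming a PyTorch model and plain SGD-style updates (update ≈ lr × gradient):

```python
import torch

def update_to_weight_ratio(model, lr):
    """Mean ||lr * grad|| / ||w|| over parameters; call after loss.backward()."""
    ratios = []
    for p in model.parameters():
        if p.grad is not None and p.data.norm() > 0:
            ratios.append((lr * p.grad).norm().item() / p.data.norm().item())
    return sum(ratios) / max(len(ratios), 1)

# ratio = update_to_weight_ratio(model, lr=1e-2)
# > 1e-3: consider lowering the learning rate; < 1e-3: consider raising it.
```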
Hyperparameters worth tuning include:

  • Mini-batch size
  • Learning rate
  • Regularization factors
  • Layer-specific hyperparameters (like dropout)
  • Sparsity
  • Activation functions
  • Learning rate decay schedule
  • Momentum
  • Early stopping
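Several of these knobs map directly onto standard PyTorch APIs. Below is a sketch assuming an existing model plus two hypothetical helpers, train_one_epoch and evaluate; the momentum, weight decay, decay schedule, and patience values are illustrative:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9,         # momentum
                            weight_decay=1e-4)    # regularization factor
# Learning rate decay schedule: halve the LR every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    train_one_epoch(model, optimizer)   # hypothetical training helper
    val_loss = evaluate(model)          # hypothetical validation helper
    scheduler.step()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:      # early stopping
            break
```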

Grid search for hyperparameters

Search values on a log scale, for example:

  • learning rates in (1e-1, 1e-2, … and 1e-8), and
  • regularization factors in (1e-3, 1e-4, … and 1e-6).
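A sketch of that search, assuming a hypothetical train_and_eval(lr, weight_decay) helper that trains briefly with the given settings and returns validation accuracy:

```python
import itertools

learning_rates = [10 ** -k for k in range(1, 9)]   # 1e-1 ... 1e-8
reg_factors    = [10 ** -k for k in range(3, 7)]   # 1e-3 ... 1e-6

best = None
for lr, reg in itertools.product(learning_rates, reg_factors):
    acc = train_and_eval(lr=lr, weight_decay=reg)  # hypothetical helper
    if best is None or acc > best[0]:
        best = (acc, lr, reg)

print(f"best accuracy {best[0]:.3f} at lr={best[1]:g}, reg={best[2]:g}")
```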

Model ensembles

Combine the predictions from multiple models with either:

  • one vote per model, or
  • votes weighted by the confidence level of each model's prediction.
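A sketch of both schemes, assuming `models` is a list of trained classifiers that return per-class logits:

```python
import torch

def majority_vote(models, x):
    # One vote per model: each model's top class, then the most common class.
    preds = torch.stack([m(x).argmax(dim=1) for m in models])  # (n_models, batch)
    return preds.mode(dim=0).values

def confidence_weighted_vote(models, x):
    # Weight each vote by the model's predicted class probabilities.
    probs = torch.stack([torch.softmax(m(x), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)
```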

Model improvement


Kaggle

Experiment framework

Conclusion
