There is only one prediction per default box. If a layer has k × k spatial locations and each location has, for example, 6 default boxes, that layer makes at most k × k × 6 predictions.
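As a quick sanity check on that count, here is a minimal sketch. The function name `num_predictions` and the example size k = 19 are only illustrative assumptions, not values taken from the text above.

```python
def num_predictions(k, boxes_per_location):
    """Upper bound on predictions for one k x k feature map layer."""
    # One prediction per default box, boxes_per_location boxes at each of k * k cells.
    return k * k * boxes_per_location


# Hypothetical layer: 19 x 19 spatial locations, 6 default boxes each.
layer_total = num_predictions(19, 6)
print(layer_total)
```

Summing this quantity over every prediction layer gives the total number of default boxes the detector scores per image.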

For the localization loss, we count positive matches only.

But for the confidence loss (classification error), we count both positive and negative matches.

To keep the classes balanced, we maintain a fixed ratio between the negative and positive examples (the SSD paper uses at most 3:1). Among the predictions that should not contain an object, we sort by the confidence score for class 0 (the background class) and train on those with the lowest scores, since these are the negatives the model gets most wrong.
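The selection step can be sketched in plain Python. This is an illustration under stated assumptions rather than the exact implementation: the function name `select_hard_negatives`, the default ratio of 3 negatives per positive, and the sample scores are all hypothetical.

```python
def select_hard_negatives(background_scores, is_positive, neg_pos_ratio=3):
    """Return the indices of the negative default boxes kept for the confidence loss.

    background_scores[i]: predicted score for class 0 (background) of default box i.
    is_positive[i]: True if default box i was matched to a ground-truth object.
    neg_pos_ratio: assumed cap on negatives per positive (3 is an assumption here).
    """
    num_pos = sum(is_positive)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    # Lowest background score first: the model is least sure these are background.
    negatives.sort(key=lambda i: background_scores[i])
    num_neg = min(len(negatives), neg_pos_ratio * num_pos)
    return negatives[:num_neg]


# Made-up example: 8 default boxes, only box 1 is a positive match.
scores = [0.9, 0.2, 0.6, 0.1, 0.95, 0.4, 0.8, 0.3]
matched = [False, True, False, False, False, False, False, False]
kept = select_hard_negatives(scores, matched)
print(kept)
```

With one positive and a ratio of 3, only the three negatives with the lowest background scores contribute to the confidence loss; the remaining negatives are ignored for that training step.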
