The default boxes are determined before training even starts. During training, the network extracts features and learns to predict the boundary box relative to a default box. After training, the network should be able to produce this mapping from the features alone. So at inference, the network extracts features and predicts the boundary box as an offset from the default box. As for the default boxes themselves, you choose them, for example giving them different aspect ratios to match the objects you expect. There are other methods, but that is the essence of it.
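To make "learning the boundary box relative to the default box" concrete, here is a minimal sketch in Python. It assumes an SSD-style parameterization (center offsets scaled by the default box size, log-scaled width and height), which is one common choice and not the only one. The function names `make_default_boxes`, `encode`, and `decode` are hypothetical, and the grid size, scale, and aspect ratios are illustrative values.

```python
import math


def make_default_boxes(grid_size, scale, aspect_ratios):
    """Build default boxes before training, as (cx, cy, w, h) in [0, 1] units.

    One box per aspect ratio is centered on every cell of a
    grid_size x grid_size feature map. Nothing here is learned.
    """
    boxes = []
    for row in range(grid_size):
        for col in range(grid_size):
            cx = (col + 0.5) / grid_size
            cy = (row + 0.5) / grid_size
            for ar in aspect_ratios:
                # ar = width / height; the box area stays scale**2
                w = scale * math.sqrt(ar)
                h = scale / math.sqrt(ar)
                boxes.append((cx, cy, w, h))
    return boxes


def encode(gt_box, default_box):
    """Training target: the ground-truth box expressed relative to a default box."""
    gcx, gcy, gw, gh = gt_box
    dcx, dcy, dw, dh = default_box
    return (
        (gcx - dcx) / dw,
        (gcy - dcy) / dh,
        math.log(gw / dw),
        math.log(gh / dh),
    )


def decode(offsets, default_box):
    """Inference: turn predicted offsets back into a boundary box."""
    tx, ty, tw, th = offsets
    dcx, dcy, dw, dh = default_box
    return (
        dcx + tx * dw,
        dcy + ty * dh,
        dw * math.exp(tw),
        dh * math.exp(th),
    )


# Assumed example values: a 2x2 grid with square, wide, and tall boxes.
default_boxes = make_default_boxes(grid_size=2, scale=0.5, aspect_ratios=[1.0, 2.0, 0.5])
ground_truth = (0.3, 0.4, 0.2, 0.5)

target = encode(ground_truth, default_boxes[0])   # what the network trains to predict
recovered = decode(target, default_boxes[0])      # what inference does with a prediction
```

In this sketch the network would only ever output the four offsets. The default boxes stay fixed, which is why you can pick their aspect ratios ahead of time.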
