Not exactly. I would avoid the term “sliding window” because it means something different in older detection methods. Each cell has several default boxes, say 6, with different aspect ratios and scales. We make one prediction relative to each default box. You may ask why the prediction is not made directly relative to the center of the cell instead of the default box. The short answer is that it makes training more stable, at least empirically.
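
To make "one prediction relative to each default box" concrete, here is a minimal sketch. The grid size, scales, and aspect ratios are made-up values for illustration, and the helper names (`default_boxes_for_cell`, `decode`) are hypothetical. The offset parametrization follows the usual center/size form (center shift scaled by the box size, log-scale width and height), without the variance terms that real implementations often add.

```python
import math

def default_boxes_for_cell(row, col, grid_size, scales, aspect_ratios):
    """Return (cx, cy, w, h) default boxes centered on one cell, in [0, 1] image coordinates."""
    cx = (col + 0.5) / grid_size
    cy = (row + 0.5) / grid_size
    boxes = []
    for s in scales:
        for ar in aspect_ratios:
            # Same area for a given scale, different shape per aspect ratio (w / h = ar).
            boxes.append((cx, cy, s * math.sqrt(ar), s / math.sqrt(ar)))
    return boxes

def decode(default_box, offsets):
    """Turn predicted offsets (t_x, t_y, t_w, t_h) into a box, relative to one default box."""
    d_cx, d_cy, d_w, d_h = default_box
    t_x, t_y, t_w, t_h = offsets
    return (
        d_cx + t_x * d_w,     # center shift measured in units of the default box width
        d_cy + t_y * d_h,     # center shift measured in units of the default box height
        d_w * math.exp(t_w),  # width as a multiplicative change of the default width
        d_h * math.exp(t_h),  # height as a multiplicative change of the default height
    )

# Assumed setup: an 8x8 feature map, 2 scales x 3 aspect ratios = 6 default boxes per cell.
boxes = default_boxes_for_cell(row=2, col=3, grid_size=8,
                               scales=(0.2, 0.3), aspect_ratios=(1.0, 2.0, 0.5))

# The network would output one offset vector per default box; zeros mean "keep the default box".
predictions = [decode(box, (0.0, 0.0, 0.0, 0.0)) for box in boxes]
```

With all-zero offsets each prediction is exactly its default box, so the network only has to learn small corrections to a box that already has roughly the right shape. That is one plausible reading of why this is more stable than regressing from the cell center alone.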
