When Redmon details the implementation later in the paper, he writes:

As discussed above, after our initial training on images at 224 × 224 we fine tune our network at a larger size, 448. For this fine tuning we train with the above parameters but for only 10 epochs and starting at a learning rate of 10⁻³. At this higher resolution our network achieves a top-1 accuracy of 76.5% and a top-5 accuracy of 93.3%.

So this differs slightly from his claim earlier in the paper. It is an implementation detail, though, and probably not very significant for understanding the idea. Still, Redmon is very good at detailing his improvements, which I really appreciate. Many other papers leave out their implementation details, which makes their results hard to replicate.
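
To make the quoted fine-tuning step concrete, here is a minimal sketch of how the two stages could be written down as configuration. This is not Redmon's code: `TrainStage` and `fine_tune_stage` are hypothetical names I made up, and the base-stage epoch count and learning rate are placeholder values assumed for illustration, since the quoted passage only gives the 224 × 224 input size for that stage. Only the 448 input size, the 10 epochs, and the starting learning rate of 10⁻³ come from the quote.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TrainStage:
    """Hypothetical container for one training stage (not from the paper's code)."""
    input_size: int   # side length of the square input image, in pixels
    epochs: int       # number of passes over the training set
    start_lr: float   # learning rate at the start of the stage


def fine_tune_stage(base: TrainStage) -> TrainStage:
    """Derive the high-resolution fine-tuning stage from the base stage.

    Per the quoted passage: same parameters as before, except the input
    size becomes 448, training lasts only 10 epochs, and the starting
    learning rate is 1e-3.
    """
    return replace(base, input_size=448, epochs=10, start_lr=1e-3)


# Base stage: 224 is from the quote; epochs and start_lr are assumed placeholders.
base = TrainStage(input_size=224, epochs=160, start_lr=0.1)
fine_tune = fine_tune_stage(base)
```

Keeping the fine-tuning stage as a copy of the base stage with three fields overridden mirrors the wording "we train with the above parameters but...", so anything not mentioned in the quote stays whatever the base stage used.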
