1. That is changed in V2. Every guess can output a different class.
  2. The layers used for object detection has low spatial resolution and too hard to locate small objects. Then, you may ask why not using higher resolution layer. This will give you some answers:

medium.com/@jonathan_hui/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c

Written by

Deep Learning

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store