Thanks for writing this up. Actually, I had the same questions when I looked into YOLOv3, because the description in the original paper is a little ambiguous. I will do my best to answer, but the most reliable way is to ask the researcher directly.
The concept has some similarity with Faster R-CNN. Some quotes from the Faster R-CNN paper:
“We assign a positive label to an anchor that has an IoU overlap higher than 0.7 with any ground-truth box. …
We assign a negative label to a non-positive anchor if its IoU ratio is lower than 0.3 for all ground-truth boxes. Anchors that are neither positive nor negative do not contribute to the training objective.”
Faster R-CNN computes the confidence loss from these 2 groups (positive and negative anchors) only.
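To make the quoted rule concrete, here is a rough sketch of how I would label anchors from it. It covers only the two rules quoted above (the part elided by "…" is not implemented), and the function names `iou_matrix` and `label_anchors`, the `[x1, y1, x2, y2]` box format, and the use of -1 for ignored anchors are my own assumptions for illustration, not anything from the paper's code.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between anchors (N, 4) and ground-truth boxes (M, 4).

    Boxes are assumed to be [x1, y1, x2, y2] (an assumption of this sketch).
    """
    a = anchors[:, None, :]   # shape (N, 1, 4)
    g = gt_boxes[None, :, :]  # shape (1, M, 4)
    inter_w = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (positive), 0 (negative), or -1 (ignored) for each anchor."""
    max_iou = iou_matrix(anchors, gt_boxes).max(axis=1)  # best IoU per anchor
    labels = np.full(len(anchors), -1, dtype=np.int64)   # default: ignored
    labels[max_iou < neg_thresh] = 0  # IoU below 0.3 for all ground-truth boxes
    labels[max_iou > pos_thresh] = 1  # IoU above 0.7 with any ground-truth box
    return labels

anchors = np.array([[0, 0, 10, 10], [0, 0, 10, 8], [20, 20, 30, 30], [0, 0, 10, 5]], dtype=float)
gt_boxes = np.array([[0, 0, 10, 10]], dtype=float)
labels = label_anchors(anchors, gt_boxes)
```

Only anchors labeled 0 or 1 would enter the confidence loss; the ones labeled -1 are the "neither positive nor negative" anchors that are skipped.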
Faster R-CNN can have multiple positive anchors per ground-truth box, but YOLOv3 uses just the top match. So the confusion comes from this sentence in the YOLOv3 paper:
“If a bounding box prior is not assigned to a ground truth object it incurs no loss for coordinate or class predictions, only objectness.”
I interpret it as follows: for the group with a low confidence score, the assigned label is 0, and therefore we compute a penalty for its confidence score not being 0.
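Here is a minimal sketch of that interpretation, not the author's actual implementation. It assumes a precomputed IoU matrix between priors and ground-truth boxes with at least one ground-truth box. The function name `objectness_targets` and the returned `mask` are hypothetical, and the 0.5 ignore threshold is my reading of the YOLOv3 paper, which says priors that overlap a ground truth by more than a threshold but are not the best match are ignored.

```python
import numpy as np

def objectness_targets(iou, ignore_thresh=0.5):
    """Build objectness targets from an IoU matrix of shape (num_priors, num_gt).

    Returns (target, mask): target is the 0/1 objectness label per prior, and
    mask is True where the prior contributes to the objectness loss.
    Assumes num_gt >= 1 (an assumption of this sketch).
    """
    num_priors = iou.shape[0]
    target = np.zeros(num_priors)           # default label: 0 (no object)
    mask = np.ones(num_priors, dtype=bool)  # default: contributes to the loss

    # Priors overlapping some ground truth above the threshold are ignored...
    mask[iou.max(axis=1) > ignore_thresh] = False

    # ...unless they are the top match for a ground-truth box: label 1.
    best = iou.argmax(axis=0)  # index of the best prior for each ground truth
    target[best] = 1.0
    mask[best] = True
    return target, mask

iou = np.array([[0.9], [0.6], [0.1]])  # three priors, one ground-truth box
target, mask = objectness_targets(iou)
```

In this toy case the first prior is the top match (label 1), the second is ignored, and the third gets label 0, so it is penalized for any objectness score above 0.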