The official reason given in the paper is:
“Our method introduces small extra cost by the extra layers in the FPN, but has a lighter weight head. Overall our system is faster than the ResNet-based Faster R-CNN counterpart”
But there are always other factors that come from implementation details. What the FPN paper wants to demonstrate is that those extra layers do not hurt speed in practice.
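To make the quoted trade-off a bit more concrete, here is a rough back-of-the-envelope sketch that counts multiply-accumulates (MACs) in the per-RoI box head only. It assumes a ResNet-50 "C4"-style baseline where the res5 stage runs on every RoI (14x14x1024 pooled feature, stride 2 in the first block so the output is 7x7). It also assumes the FPN head is the 2-layer 1024-d fc head on a 7x7x256 pooled feature. The function names and the proposal count of 300 are hypothetical, and MACs are only a proxy for wall-clock time:

```python
# Rough MAC counts for the per-RoI box head (a sketch, not a benchmark).
# Assumptions: C4 baseline runs ResNet-50 res5 on a 14x14x1024 RoI feature with
# stride 2 in the first 1x1 conv (output 7x7); the FPN head is two 1024-d fc
# layers on a 7x7x256 RoI feature. BN, ReLU, pooling and the final cls/box
# layers are ignored.

def conv_macs(out_hw, c_in, c_out, k=1):
    """MACs of a k x k convolution producing an out_hw x out_hw output map."""
    return out_hw * out_hw * c_in * c_out * k * k


def fc_macs(n_in, n_out):
    """MACs of a fully-connected layer."""
    return n_in * n_out


def fpn_2fc_head():
    """Hypothetical helper: two hidden 1024-d fc layers on a 7x7x256 RoI."""
    return fc_macs(7 * 7 * 256, 1024) + fc_macs(1024, 1024)


def res5_head():
    """Hypothetical helper: three ResNet-50 res5 bottleneck blocks per RoI."""
    out = 7  # 14x14 input, downsampled by stride 2 in the first block
    first = (conv_macs(out, 1024, 512)          # 1x1 reduce (stride 2)
             + conv_macs(out, 512, 512, k=3)    # 3x3
             + conv_macs(out, 512, 2048)        # 1x1 expand
             + conv_macs(out, 1024, 2048))      # projection shortcut (stride 2)
    rest = (conv_macs(out, 2048, 512)           # 1x1 reduce
            + conv_macs(out, 512, 512, k=3)     # 3x3
            + conv_macs(out, 512, 2048))        # 1x1 expand
    return first + 2 * rest


if __name__ == "__main__":
    num_rois = 300  # hypothetical number of proposals per image
    fpn, res5 = fpn_2fc_head(), res5_head()
    print(f"per RoI:   FPN 2fc {fpn / 1e6:.1f}M MACs, res5 {res5 / 1e6:.1f}M MACs")
    print(f"per image: FPN 2fc {fpn * num_rois / 1e9:.1f}G, "
          f"res5 {res5 * num_rois / 1e9:.1f}G")
```

The head cost is paid once per RoI, so it scales with the number of proposals. The extra FPN layers (lateral 1x1 and output 3x3 convs) run once per image and are not counted here. This sketch therefore only shows why the head side can get much cheaper. It does not show the full end-to-end speed.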