I believe the ROI is not done on each sub-layer, to avoid the performance overhead.
The paper does not specify the downsampling technique, so it was probably either not experimented with or not considered significant. I try not to comment on specific implementations; otherwise I would open a flood of implementation questions. But it is something you can experiment with if you feel it could make a difference.