I believe the ROI is not done on each sub-layers to avoid performance overhead.

The paper does not specify the downsampling techniques. So it can be something not experimented or not significant. I try not to comment on any specific implementations otherwise I will open a flood of implementation questions. But it is something you can experiment if you feel it can make a difference.

Written by

Deep Learning

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store