As often, I'm answering questions right after waking up, so I'm not sure my answer will make sense to you.
An ROI (region of interest) is a region of the feature map (a sub-window). We use ROI pooling to convert the features in this sub-region to a fixed size, so we can process them with a fully connected network regardless of the shape and size of the ROI (FC layers assume a fixed input dimension).
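To make that concrete, here is a minimal NumPy sketch of the idea, not the implementation from any particular framework. The function name `roi_max_pool`, the `(y0, x0, y1, x1)` end-exclusive ROI convention, the single-channel feature map, and the 2x2 output size are all my own assumptions for illustration. It splits the ROI into a fixed grid of bins and takes the max in each bin, so ROIs of different sizes come out with the same shape.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool one ROI of a 2D feature map down to a fixed output size.

    feature_map: (H, W) array (a single channel, for simplicity).
    roi: (y0, x0, y1, x1) in feature-map coordinates, end-exclusive
         (a convention assumed for this sketch).
    """
    y0, x0, y1, x1 = roi
    region = feature_map[y0:y1, x0:x1]
    out_h, out_w = output_size

    # Bin edges that split the region into out_h x out_w roughly equal cells.
    ys = np.linspace(0, region.shape[0], out_h + 1).astype(int)
    xs = np.linspace(0, region.shape[1], out_w + 1).astype(int)

    pooled = np.empty(output_size, dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # max(...) keeps a bin at least one element wide when the
            # region is smaller than the output grid.
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1)]
            pooled[i, j] = cell.max()
    return pooled

feature_map = np.arange(36).reshape(6, 6)
a = roi_max_pool(feature_map, (0, 0, 4, 4))  # a 4x4 ROI
b = roi_max_pool(feature_map, (1, 2, 6, 5))  # a 5x3 ROI
print(a.shape, b.shape)  # both are (2, 2)
```

Both outputs can be flattened to a vector of length 4 and fed to the same FC layer, even though the two ROIs have different sizes.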