Usually, the image is resized before object detection, so some resampling and padding happens both in training and at inference. Object detectors also augment the training data with cropping, resizing, etc., so the accuracy depends on how well the training data is augmented.
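To make the resize-and-pad step concrete, here is a minimal sketch of a "letterbox" style preprocessing function. It is only an illustration, not the preprocessing of any particular detector: the function name `letterbox`, the 416x416 target size, the gray padding value 114, and the nearest-neighbour resampling are all assumptions chosen for the example. Real pipelines typically use a library resize with bilinear interpolation.

```python
import numpy as np


def letterbox(image, target=(416, 416), pad_value=114):
    """Resize an image while keeping its aspect ratio, then pad to `target`.

    `target` is (height, width). Returns the padded image, the scale factor
    applied, and the (top, left) offset of the resized image in the canvas.
    The target size and pad value are illustrative assumptions.
    """
    h, w = image.shape[:2]
    th, tw = target

    # One scale factor for both axes so the aspect ratio is preserved.
    scale = min(th / h, tw / w)
    new_h = min(th, max(1, round(h * scale)))
    new_w = min(tw, max(1, round(w * scale)))

    # Nearest-neighbour resampling: map each output pixel to a source pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    resized = image[rows][:, cols]

    # Centre the resized image on a constant-colour canvas (the padding).
    top = (th - new_h) // 2
    left = (tw - new_w) // 2
    canvas = np.full((th, tw) + image.shape[2:], pad_value, dtype=image.dtype)
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, scale, (top, left)


if __name__ == "__main__":
    img = np.zeros((100, 200, 3), dtype=np.uint8)  # a wide dummy image
    padded, scale, offset = letterbox(img)
    print(padded.shape, scale, offset)
```

Boxes predicted on the padded image can be mapped back to the original image by subtracting the offset and dividing by the scale factor.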