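Deformable convolution can replace any traditional convolution layer. So is that doable? Yes, but it adds extra work to every layer you replace. An extra convolution has to predict a 2D offset for each kernel sampling position (2·k·k output channels, e.g. 18 for a 3x3 kernel). Every sample is then read at a fractional location via bilinear interpolation, and the irregular memory access tends to be slow on GPUs. So you need to verify the speed cost against the mAP improvement. It may not be too bad in YOLOv3, since you may replace only the 3 layers used for object detection (YOLOv3 has 50+ layers).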
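A rough way to size the extra work is to count multiply-accumulates (MACs). This is a back-of-the-envelope cost model, not a benchmark. It assumes DCNv1-style layers, where the offset branch is a conv with the same kernel size applied to the same input. It also charges 4 MACs per channel for each bilinear sample. The function names and the example layer (13x13 map, 512 -> 1024 channels) are hypothetical.

```python
def conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """MACs for a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def deform_conv_macs(h: int, w: int, c_in: int, c_out: int, k: int) -> int:
    """Rough MAC estimate for a DCNv1-style deformable convolution (assumed cost model)."""
    # Main convolution over the resampled input: same cost as a regular conv.
    main = conv_macs(h, w, c_in, c_out, k)
    # Offset branch: a k x k conv on the same input predicting a (dx, dy)
    # pair for each of the k*k sampling positions -> 2*k*k output channels.
    offsets = conv_macs(h, w, c_in, 2 * k * k, k)
    # Bilinear sampling (assumption): 4 MACs, one per neighbouring pixel, for
    # every sampled value, i.e. per output location, kernel position and channel.
    interp = h * w * k * k * c_in * 4
    return main + offsets + interp


if __name__ == "__main__":
    # Hypothetical layer: 13x13 map, 512 -> 1024 channels, 3x3 kernel.
    regular = conv_macs(13, 13, 512, 1024, 3)
    deform = deform_conv_macs(13, 13, 512, 1024, 3)
    print(f"regular:    {regular:,} MACs")
    print(f"deformable: {deform:,} MACs ({deform / regular:.3f}x)")
```

Under these assumptions the MAC count only grows by about 2% for the example layer (the ratio works out to 1 + (2·k·k + 4) / c_out). This is exactly why the real cost has to be measured: the gather and interpolation steps are memory-bound and can make the layer noticeably slower than its MAC count suggests.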