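Section 3.3 in https://arxiv.org/pdf/1801.01973.pdf shows that the log of the IS formula we used equals the mutual information between the image x and its predicted label y, I(y; x) = H(y) - H(y|x). That matches the objective we want on the entropy: high entropy for the marginal label distribution (diverse images) and low entropy for each conditional label distribution (confident predictions). So this is mostly an answer to why the KL divergence is used.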
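To make that identity concrete, here is a minimal sketch in plain Python. It assumes you already have the classifier's softmax outputs p(y|x) as lists of probabilities; the function names (`inception_score`, `mutual_information`) are hypothetical and not taken from any library. It is an illustration of the formula, not a reference implementation.

```python
import math

def _marginal(probs):
    # p(y): average of the per-image distributions p(y|x).
    n, k = len(probs), len(probs[0])
    return [sum(p[j] for p in probs) / n for j in range(k)]

def inception_score(probs):
    """exp of the mean KL(p(y|x) || p(y)) over all images."""
    marginal = _marginal(probs)
    # Terms with p(y|x) = 0 contribute 0 to the KL sum, so skip them.
    mean_kl = sum(
        p[j] * math.log(p[j] / marginal[j])
        for p in probs
        for j in range(len(p))
        if p[j] > 0
    ) / len(probs)
    return math.exp(mean_kl)

def entropy(dist):
    return -sum(q * math.log(q) for q in dist if q > 0)

def mutual_information(probs):
    """I(y; x) = H(y) - H(y|x), with H(y|x) averaged over images."""
    h_y = entropy(_marginal(probs))
    h_y_given_x = sum(entropy(p) for p in probs) / len(probs)
    return h_y - h_y_given_x
```

With these definitions, `math.log(inception_score(probs))` and `mutual_information(probs)` agree up to floating-point error for any valid `probs`, which is the equality the paper derives.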
A pretrained Inception-v3 model, trained on ImageNet data. One major purpose of IS/FID is to compare algorithms. For that purpose, we fix which network and which data are used for the comparison. That avoids cherry-picking datasets to claim you have the state-of-the-art algorithm. If you just want to monitor the performance of your GAN during training, FID may be a better fit because it compares features, which are quite transferable from ImageNet to other domains.
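For the monitoring case, a rough sketch of the FID computation on feature vectors may help. It assumes the features have already been extracted (for example from a pretrained Inception-v3 pooling layer) into two NumPy arrays of shape (num_images, feature_dim); the function name `fid_from_features` is hypothetical. It also assumes the trace of the matrix square root can be taken as the sum of the square roots of the eigenvalues of the covariance product, which avoids a SciPy dependency; many implementations call `scipy.linalg.sqrtm` instead.

```python
import numpy as np

def fid_from_features(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) as the sum of sqrt(eigenvalues).
    # Small negative or imaginary parts are numerical noise, so clip them.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

During training you could call this every few epochs on features of a fixed set of real images and a fresh batch of generated ones; lower values mean the two feature distributions are closer.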