First, the discriminator usually wins, at least early in training. At that point the generator's loss sits in the flat region of the curve shown below (red line), so the gradient it receives shrinks toward zero.
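To see the flat region numerically, here is a minimal sketch. It assumes the original minimax generator loss log(1 - D(G(z))) with a sigmoid output on the discriminator, which may not be exactly the curve in the figure; the function names are made up for illustration. The derivative of that loss with respect to the discriminator's logit is -sigmoid(logit), so it goes to zero as the discriminator becomes confident the sample is fake.

```python
import math


def sigmoid(a):
    """Logistic function, used here as the discriminator's output D."""
    return 1.0 / (1.0 + math.exp(-a))


def saturating_generator_loss(logit):
    """Assumed generator loss: log(1 - D), where D = sigmoid(logit)."""
    return math.log(1.0 - sigmoid(logit))


def saturating_generator_grad(logit):
    """Derivative of log(1 - sigmoid(logit)) w.r.t. logit, i.e. -sigmoid(logit)."""
    return -sigmoid(logit)


# A very negative logit means the discriminator is sure the sample is fake.
for logit in (0.0, -2.0, -5.0, -10.0):
    d = sigmoid(logit)
    g = saturating_generator_grad(logit)
    print(f"logit={logit:6.1f}  D(G(z))={d:.5f}  grad={g:.5f}")
```

The more decisively the discriminator wins (the more negative the logit), the smaller the gradient magnitude becomes, which is the diminishing gradient described above.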

For more details, here is another article.
On the second question: we cannot add noise to the generated images without also adding noise to the real images. Otherwise, the discriminator could use the noise itself to tell them apart. So we add noise to both before they reach the discriminator. Put simply, the noise is added at the discriminator's input.
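A rough sketch of "noise at the discriminator's input" might look like the following. It uses plain Python lists instead of a tensor library, and the helper names, the Gaussian noise, and the `sigma` parameter are assumptions for illustration, not details taken from the paper.

```python
import random


def add_instance_noise(batch, sigma, rng):
    """Return a copy of batch with Gaussian noise of std sigma added to each value."""
    return [[x + rng.gauss(0.0, sigma) for x in sample] for sample in batch]


def discriminator_inputs(real_batch, fake_batch, sigma, seed=0):
    """Apply the same noise distribution to real and generated samples alike."""
    rng = random.Random(seed)
    noisy_real = add_instance_noise(real_batch, sigma, rng)
    noisy_fake = add_instance_noise(fake_batch, sigma, rng)
    return noisy_real, noisy_fake


# Toy "images" with two pixels each (hypothetical data).
real = [[0.0, 1.0], [1.0, 0.0]]
fake = [[0.4, 0.6], [0.5, 0.5]]
noisy_real, noisy_fake = discriminator_inputs(real, fake, sigma=0.1)
print(noisy_real)
print(noisy_fake)
```

Because both batches pass through the same noise function with the same `sigma`, the presence of noise alone gives the discriminator no clue about which batch is which.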
Arjovsky argues all of this mathematically; I have tried to explain it more intuitively. If you have doubts, follow the original paper.