
Very thoughtful questions.

1. In your first question, you assume the weights should not be normally distributed in order to learn real-life problems. This is a strong assumption; in particular, the central limit theorem gives hints on why the normal distribution is so important. Can you initialize the parameters with a random linear function? Yes, you can. So it comes back to the domain of the problem: which distribution gives you more stable training? Let's assume the weights can follow any distribution. It turns out that a normal distribution for the initial weights is a good start, though that does not imply the final weights are necessarily normally distributed.
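To make this concrete, here is a minimal sketch (assuming PyTorch, which the discussion above does not name, and hypothetical layer sizes) of drawing a layer's initial weights from a zero-mean normal distribution; the commented-out line shows a uniform alternative you could just as well try:

```python
import torch.nn as nn

# Hypothetical layer sizes, chosen for illustration only.
layer = nn.Linear(128, 64)

# A common starting point: zero-mean normal, with the standard
# deviation scaled by fan-in (a He/Kaiming-style choice).
nn.init.normal_(layer.weight, mean=0.0, std=(2.0 / 128) ** 0.5)
nn.init.zeros_(layer.bias)

# Nothing forbids another distribution; training stability is what
# ultimately decides. For example, a uniform draw instead:
# nn.init.uniform_(layer.weight, a=-0.1, b=0.1)
```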

2. The activation function creates an information bottleneck, which is necessary for ML. Applying normalization before it shapes the output nicely, so that it works better with the typical strategy for learning the parameters. That is the reasoning I can think of. But can the normalization go after the activation function? I would say never say never in ML, so you can try it; I am just giving you a possible reason in case it does not work.
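Here is a minimal sketch (again assuming PyTorch, with hypothetical layer sizes) of the two orderings side by side, so you can try both and compare training stability:

```python
import torch.nn as nn

# Common pattern: normalize before the activation.
pre_act = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),  # normalized output feeds the activation
    nn.ReLU(),
)

# The alternative you ask about: normalize after the activation.
post_act = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.BatchNorm1d(64),  # normalization sees only non-negative inputs
)
```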
