- See Richard's earlier comment on question 1.
- Usually, this choice is made based on experimental results; in this case, it is based on computation speed.
- Applying a 1x1 convolution is a very common way to build a more complex model (a deeper network), rather than applying another 3x3 convolution filter, which is more expensive.
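To make the cost difference concrete, here is a minimal sketch that counts multiplications and parameters for a stride-1, "same"-padded convolution layer. The input size (28x28x192) and the number of output channels (32) are hypothetical values chosen for illustration, and the multiplication count ignores bias additions and activation costs.

```python
def conv_cost(h, w, c_in, c_out, k):
    """Multiplications for a stride-1, 'same'-padded k x k convolution.

    Each of the h * w * c_out output values needs k * k * c_in multiplications.
    """
    return h * w * c_out * k * k * c_in


def conv_params(c_in, c_out, k):
    """Learnable weights plus one bias per output channel."""
    return k * k * c_in * c_out + c_out


if __name__ == "__main__":
    # Hypothetical layer: 28x28 feature map, 192 input channels, 32 filters.
    h, w, c_in, c_out = 28, 28, 192, 32
    for k in (1, 3):
        print(
            f"{k}x{k}: {conv_cost(h, w, c_in, c_out, k):,} multiplications, "
            f"{conv_params(c_in, c_out, k):,} parameters"
        )
```

Since the cost scales with k * k, a 3x3 filter needs 9 times the multiplications of a 1x1 filter for the same input and output shapes, so stacking 1x1 layers adds depth and nonlinearity much more cheaply.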