Hopefully this answer helps.
Q1.
In the context of QC, a qubit can be in a superposition of states. So we want a quantum version of f(x), an oracle, that takes a superposition state as input and outputs a superposition. This is not exactly f(x) itself. We call it the oracle function O_f: it holds the results of f(x) for multiple inputs in a single superposition state, and that output state can be described mathematically in terms of f(x). (The difference is very subtle.) In the image from your second question, we apply H gates to create the superposition before the oracle function.
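To make "described with f(x)" concrete, here is the usual textbook convention for a one-bit oracle. It is an assumption on my part that the circuit in the image uses this same convention: the second qubit y is XORed with f(x), and the rule extends linearly to superpositions.

```latex
\begin{align*}
O_f\,|x\rangle|y\rangle &= |x\rangle\,|y \oplus f(x)\rangle \\
O_f\left(\frac{|0\rangle + |1\rangle}{\sqrt{2}} \otimes |y\rangle\right)
  &= \frac{|0\rangle\,|y \oplus f(0)\rangle + |1\rangle\,|y \oplus f(1)\rangle}{\sqrt{2}}
\end{align*}
```

Both f(0) and f(1) appear in the output state at once, which is the sense in which the oracle presents multiple results of f(x) in one superposition.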
Q2.
The first qubit already contains the information we want. If we apply the H gate again, the measurement comes out differently for different functions.
The first qubit is in superposition. Conceptually, it holds the results of all the parallel computations. We just need a smart way to convert them so that the measurement is the same for every function of the same type. In this case, that is an H-gate.
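A minimal sketch of that idea in plain Python, assuming the circuit is the standard two-qubit Deutsch setup: the qubits start in |0>|1>, H is applied to both, then the oracle, then H on the first qubit. The starting state, the XOR form of the oracle, and the helper names (`hadamard_first`, `hadamard_second`, `oracle`, `prob_first_qubit_zero`) are my own assumptions for illustration, not taken from the image.

```python
import math

S = 1 / math.sqrt(2)  # normalization factor of the H gate

def hadamard_first(state):
    """Apply H to the first qubit of a state [a00, a01, a10, a11]."""
    a00, a01, a10, a11 = state
    return [S * (a00 + a10), S * (a01 + a11), S * (a00 - a10), S * (a01 - a11)]

def hadamard_second(state):
    """Apply H to the second qubit of a state [a00, a01, a10, a11]."""
    a00, a01, a10, a11 = state
    return [S * (a00 + a01), S * (a00 - a01), S * (a10 + a11), S * (a10 - a11)]

def oracle(state, f):
    """Assumed oracle: |x>|y> -> |x>|y XOR f(x)>, moved amplitude by amplitude."""
    out = [0.0] * 4
    for x in (0, 1):
        for y in (0, 1):
            out[2 * x + (y ^ f(x))] += state[2 * x + y]
    return out

def prob_first_qubit_zero(f):
    """Probability of measuring 0 on the first qubit at the end of the circuit."""
    state = [0.0, 1.0, 0.0, 0.0]                     # |0>|1>
    state = hadamard_second(hadamard_first(state))   # superposition on both qubits
    state = oracle(state, f)                         # a single oracle call
    state = hadamard_first(state)                    # the final H on the first qubit
    return state[0] ** 2 + state[1] ** 2

functions = {
    "f(x) = 0": lambda x: 0,
    "f(x) = 1": lambda x: 1,
    "f(x) = x": lambda x: x,
    "f(x) = not x": lambda x: 1 - x,
}

for name, f in functions.items():
    print(name, round(prob_first_qubit_zero(f), 6))
```

Under these assumptions, the two constant functions give probability 1 of measuring 0 on the first qubit, and the two balanced functions give probability 0. So the final H makes the measurement the same for functions of the same type.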
Q3.
In short, it is a composite state of the system. In the classical world, it just means 2 bits. But in QC, it is much more complex for 2 qubits. Longer answer here:
medium.com/@jonathan_hui/qc-what-are-qubits-in-quantum-computing-cdb3cb566595#d204
Why doesn't it add up to 1? I dropped the normalization factor 1/root(2) to avoid overcrowding the equation. Thx for bringing this up (I modified the description).
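A small sketch of both points in plain Python: the composite state of two qubits is the tensor product of the single-qubit states, and the squared amplitudes only add up to 1 once each qubit carries its 1/root(2) factor. The example states (|0> + |1>) and (|0> - |1>) and the helper name `tensor` are assumptions chosen for illustration.

```python
import math

def tensor(a, b):
    """Composite state of two qubits: amplitudes ordered |00>, |01>, |10>, |11>."""
    return [x * y for x in a for y in b]

# Without normalization: (|0> + |1>) combined with (|0> - |1>)
unnormalized = tensor([1, 1], [1, -1])
total_unnormalized = sum(a * a for a in unnormalized)

# With the 1/root(2) factor on each qubit
s = 1 / math.sqrt(2)
normalized = tensor([s, s], [s, -s])
total_normalized = sum(a * a for a in normalized)

print(unnormalized, total_unnormalized)
print(normalized, total_normalized)
```

Two qubits need four amplitudes rather than two bits' worth of values, and dropping the factor only rescales every amplitude by the same constant.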
Q4.
After that, when we apply an H-gate, it measures |0>.