Self-driving car object tracking: Intuition and the math behind Kalman Filter

Apr 8, 2018

Three major components of autonomous driving are localization (where am I), perception (what is around me), and control (how to drive around). We rely on sensors to perceive what is around us. Those sensors offer many benefits, yet each suffers from significant drawbacks. Cameras offer high resolution but cannot easily detect speed and distance. LiDAR creates a nice 3-D map, but it is bulky and expensive. Also, both cameras and LiDAR are vulnerable to bad weather and other environmental factors. RADAR works better in bad conditions and detects speed well, but its resolution is not high enough for precise maneuvers. When it comes to self-driving car sensors, there is no one-size-fits-all solution. The Kalman filter is designed to fuse sensor readings into predictions that are more accurate than those from any individual sensor alone. However, the math in the Kalman filter can be unnecessarily overwhelming. We will cover the intuition first, followed by explanations of the math, to make it easier.

This is part of a five-part series on self-driving. Other articles include:

Self-driving perception: Sensor fusion with Kalman Filter.
Self-driving perception: Extended Kalman Filter and Unscented Kalman Filter.
Self-driving localization: Localization with Particle Filter.
Self-driving control: Control with Model Predictive Control & PID.
Self-driving Path finding.

Intuition

Suppose we take a train to work. The train comes every 15 minutes starting at 5 am. One day, we ask a fellow passenger when the next train arrives. He answers 7:17 am. Before settling on his answer, let’s dig a little deeper. If this is the Japanese Shinkansen bullet train, the train schedule is God and delays longer than 30 seconds are rare. The x-axis below represents the train arrival time (15 means 7:15 am) and the y-axis is the probability of each possible arrival time. The graph below indicates a strong certainty that the train will arrive at 7:15.

PDF for the train arrival time calculated from the schedule.

In the following graph, we add another curve modeling the passenger’s answer. In this case, the information is less certain and therefore the curve is flatter.

Add the probability for the passenger’s answer.

In many transport systems, the schedule is just a reference and delays are frequent. Now, if the passenger has just checked the real-time arrival time on the phone, his information is much more certain and the plot will look like the opposite of our previous plot.

When the passenger is more reliable than the train schedule.

Instead of blindly trusting the train schedule or the passenger, we want a better approach to making accurate predictions. Let’s introduce a more realistic situation: predicting the location of a moving car. At a certain time, we believe our car is 2 meters away from the stop sign. To handle uncertainty, we use a stochastic model to describe our car’s location. The red curve below indicates the probabilities of finding the car at different locations. This idea may take a minute to sink in since we are so used to deterministic models.

To recalibrate our location, we take a GPS measurement. But remember, the measurement is noisy so we use a stochastic model to describe it also. Below are the two different probability curves. The left is our belief and the right one is the GPS measurement. Which one should we trust?

Let’s multiply both curves together and renormalize the result so that the total probability equals 1.

The orange curve below is our new location prediction combining our belief and the measurement. It has a peak at 2.6m, where both red curves agree the most. The resulting curve is also sharper, i.e. we are more certain about the location now. The basic idea of the Kalman Filter is that simple, even though it will take a while to explain the details. In particular, multiplying probability curves point by point is computationally intensive. We need a more efficient approach to merge probability curves.
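The multiply-and-renormalize step can be sketched numerically on a grid of candidate positions. The numbers here (a belief centred at 2.0 m, a GPS reading at 3.0 m, and both standard deviations) are hypothetical values chosen for illustration, not the ones behind the article's plots, and `gaussian_pdf` is a helper defined only for this sketch.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Hypothetical numbers: belief centred at 2.0 m, GPS reading at 3.0 m.
grid = np.linspace(-2.0, 8.0, 10001)       # candidate car positions in metres
dx = grid[1] - grid[0]
belief = gaussian_pdf(grid, mean=2.0, std=0.8)
measurement = gaussian_pdf(grid, mean=3.0, std=0.6)

# Multiply point by point, then renormalize so the area under the curve is 1.
product = belief * measurement
posterior = product / (product.sum() * dx)

peak = grid[np.argmax(posterior)]
post_mean = (grid * posterior).sum() * dx
post_std = np.sqrt((((grid - post_mean) ** 2) * posterior).sum() * dx)
print(f"peak at {peak:.2f} m, std {post_std:.2f} m")
```

With these inputs the merged curve peaks between the two means, closer to the sharper GPS curve, and its standard deviation is smaller than either input. The cost is one multiplication per grid point, which is why a closed-form alternative is attractive.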

We can also view this approach as a weighted sum of the belief and the measurement. Let’s revisit our train example. If our belief is strong (the Japanese bullet train’s schedule) while the measurement is weak, the final prediction (the black curve below) will resemble the probability curve of our belief. On the contrary, if the measurement is accurate but we are not certain about our belief, the final prediction will resemble the measurement.
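For two 1-D Gaussian curves, this weighted-sum view has a standard closed form. As a sketch, write the belief as a Gaussian with mean μ_b and variance σ_b², and the measurement as a Gaussian with mean μ_m and variance σ_m² (these symbols are labels assumed here for illustration). Multiplying the two curves and renormalizing gives another Gaussian with:

```latex
\mu = \frac{\sigma_m^2\,\mu_b + \sigma_b^2\,\mu_m}{\sigma_b^2 + \sigma_m^2},
\qquad
\sigma^2 = \frac{\sigma_b^2\,\sigma_m^2}{\sigma_b^2 + \sigma_m^2}
```

Each mean is weighted by the other curve's variance, so the sharper curve dominates, and the combined variance is smaller than either input variance.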

How can we get a more accurate result from two less accurate pieces of information? In real life, the measurements may have errors, but their probability curves are accurate. For example, many measurement errors are Gaussian distributed with variances that can be determined by experiments. Therefore, we can derive our probability curves with good precision. By multiplying the probability curves, we locate where the predictions agree and thereby reinforce the final prediction.

By overlapping belief and measurements, we reinforce where they agree.

So let’s go through the process end-to-end. We start with an initial GPS reading. Since the measurement errors are Gaussian distributed, we can use the reading to build a probability curve (a probability density function, PDF) for the car location. This is our belief. Then we use a dynamic model to predict where the car may be next. The simplest one is location = previous location + velocity × 𝛿t. Next, we take a measurement and develop a PDF for it. We merge both PDFs to make a final prediction. When the Kalman filter is explained as a Bayes filter, the belief is also called the prior and the final prediction is called the posterior.

To track a moving car, we repeat a 2-step procedure:

Predict: Apply a dynamic model to our belief to predict what is next.
Update: Take a measurement to update our prediction.
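The two steps can be sketched for a 1-D car position as follows. This is a minimal illustration under assumed values: the speed, noise variances, and GPS readings are made up, and `predict` and `update` are hypothetical helper names. The update uses the closed-form product of two 1-D Gaussians.

```python
def predict(mean, var, velocity, dt, process_var):
    """Shift the belief with location = previous location + velocity * dt."""
    return mean + velocity * dt, var + process_var

def update(mean, var, meas, meas_var):
    """Merge belief and measurement (product of two 1-D Gaussians)."""
    gain = var / (var + meas_var)          # weight given to the measurement
    return mean + gain * (meas - mean), (1.0 - gain) * var

# Hypothetical run: car moving at 1 m/s, one GPS reading per second.
mean, var = 0.0, 1.0                       # initial belief from the first GPS reading
for meas in [1.1, 1.9, 3.2, 4.0]:
    mean, var = predict(mean, var, velocity=1.0, dt=1.0, process_var=0.1)
    mean, var = update(mean, var, meas, meas_var=0.5)
    print(f"position ~ {mean:.2f} m, variance {var:.3f}")
```

The prediction step widens the belief (variance grows), and each update narrows it again, so the variance settles well below that of a single GPS reading.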

Model

Before we introduce the Kalman Filter, we need to detail the dynamic model used to predict motion. The equations may look scary but are actually pretty simple. To simplify our illustration, we assume our car moves along the x-axis only.

State

We first define the state of our car. For simplicity, we will use the position p and the speed v only.

State-transition model

Without acceleration, we can calculate the equation of motion with some simple Physics.
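With constant speed over a time step 𝛿t, the standard equations of motion for the position p and the speed v are (the time index k is notation assumed here):

```latex
p_k = p_{k-1} + v_{k-1}\,\delta t,
\qquad
v_k = v_{k-1}
```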

Rewrite this in matrix form, which derives the current state from the previous state.

A, a matrix, becomes our state-transition model. In our example, A is:
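As a sketch for the state made of position p and speed v with no acceleration, assuming a time step 𝛿t and a time index k, the matrix form and the resulting A are:

```latex
x_k =
\begin{bmatrix} p_k \\ v_k \end{bmatrix}
=
\begin{bmatrix} 1 & \delta t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} p_{k-1} \\ v_{k-1} \end{bmatrix}
= A\,x_{k-1},
\qquad
A = \begin{bmatrix} 1 & \delta t \\ 0 & 1 \end{bmatrix}
```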

Input controls

We have many controls on the car. For example, we control the throttle for acceleration. Let’s modify our equations:

We pack all input controls into a vector u and the matrix form of motion becomes:

where, in our example,

(Since we have only one control for now, u has only one element a.)
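As a sketch, assume the acceleration a stays constant during one time step 𝛿t. Standard kinematics then gives the equations below; the time index k and the choice to index the control as u_k are assumptions made here.

```latex
\begin{aligned}
p_k &= p_{k-1} + v_{k-1}\,\delta t + \tfrac{1}{2}\,a\,\delta t^2 \\
v_k &= v_{k-1} + a\,\delta t \\[4pt]
x_k &= A\,x_{k-1} + B\,u_k,
\qquad
B = \begin{bmatrix} \tfrac{1}{2}\,\delta t^2 \\ \delta t \end{bmatrix},
\qquad
u_k = \begin{bmatrix} a \end{bmatrix}
\end{aligned}
```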

To make our model more accurate, we add another term called process noise.

We use process noise w to represent random factors that are difficult to model precisely, for example, the effect of wind and the road condition. We assume w is Gaussian distributed with zero mean and covariance matrix Q. (Variance describes a 1-D Gaussian distribution, while a covariance matrix describes an n-D Gaussian distribution.)
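Written out with the process noise added, and using a time index k that is assumed here, the state equation takes the standard form:

```latex
x_k = A\,x_{k-1} + B\,u_k + w_k,
\qquad
w_k \sim \mathcal{N}(0,\ Q)
```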

Observer model (measurement model)

We also model our measurements by applying a transformation matrix C on the state of the system.

Very often, we measure the state directly (for example, the car location), so C is often an identity matrix. In some cases, we do not measure the state directly, and we need a matrix C to describe the relationship between the state and the measurement. In other cases, C performs a coordinate transformation. For example, RADAR readings come in polar coordinates, and C relates them to a state kept in Cartesian coordinates.

Putting it together

Here we have a dynamic model that predicts a state and a measurement from the previous state.
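As an illustration, the whole model can be simulated in a few lines of NumPy. Everything numeric here is an assumption for the sketch: the time step, the noise covariances Q and R, a C that measures position only, and an additive Gaussian measurement noise with covariance R. The function name `step` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dt = 0.1                                   # time step in seconds (assumed)

A = np.array([[1.0, dt],
              [0.0, 1.0]])                 # state transition for [position, speed]
B = np.array([[0.5 * dt**2],
              [dt]])                       # effect of acceleration on the state
C = np.array([[1.0, 0.0]])                 # we measure position only (assumed)
Q = np.diag([1e-4, 1e-3])                  # process noise covariance (made up)
R = np.array([[0.25]])                     # measurement noise covariance (made up)

def step(x, u):
    """One step of x_k = A x_{k-1} + B u + w, then y_k = C x_k + measurement noise."""
    w = rng.multivariate_normal(np.zeros(2), Q)
    x_next = A @ x + B @ u + w
    noise = rng.multivariate_normal(np.zeros(1), R)
    y = C @ x_next + noise
    return x_next, y

x = np.array([0.0, 5.0])                   # start at 0 m, moving at 5 m/s
u = np.array([1.0])                        # constant 1 m/s^2 acceleration
for _ in range(10):
    x, y = step(x, u)
print("true state:", x, "measured position:", y)
```

After ten steps (one second), the noise-free model would put the car at 5.5 m with a speed of 6 m/s; the simulated state scatters slightly around those values, and the measurement scatters more because R is larger than Q.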

Kalman Filter

Real world

In the real world, we know the input control u and the measurement y. Using the dynamics from physics, or through experiments, it is not too hard to find A, B, and C.

Observer world

Now, we use this information to build a model in an observer world to resemble the real world. In the observer world, we calculate the estimated measurement ŷ with the following equations:
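A sketch of the observer-world model, assuming it mirrors the real-world model with the unknown noise terms left out (the time index k is notation assumed here):

```latex
\hat{x}_k = A\,\hat{x}_{k-1} + B\,u_k,
\qquad
\hat{y}_k = C\,\hat{x}_k
```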

Symbols with a hat, like ŷ, mean values estimated in the observer world.

Here is the visualization of the observer world.

ŷ is the car model’s estimate of the measurement. We know that ŷ will be off since our car model does not include process noise like the wind. By knowing the error (y-ŷ) between the measurement and the measurement estimate, we can refine the car model to make a better estimate of x. We just need to introduce a Kalman gain K to map the error in our measurement estimate to the error in our state estimate. Then our new estimate of x is simply the old estimate plus its error. In short, we use the error in our measurement estimate to adjust the state estimate.

Now our model involves 2 steps. The first step is the prediction:

The second step is the update of our estimated state with the error (y-ŷ):

Since we break the state estimation into 2 steps, there is an estimated state before the update and one after the update. To reduce confusion, we introduce the notation with a minus sign to indicate the estimated state before the update. Now our car model becomes:
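In the standard notation, which is assumed here, a superscript minus marks the estimate before the update, and the two steps read:

```latex
\begin{aligned}
\text{Predict:}\quad & \hat{x}_k^- = A\,\hat{x}_{k-1} + B\,u_k,
\qquad \hat{y}_k = C\,\hat{x}_k^- \\
\text{Update:}\quad & \hat{x}_k = \hat{x}_k^- + K\,(y_k - \hat{y}_k)
\end{aligned}
```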

Let’s take a quick peek at how K is calculated,

where R quantifies the measurement noise. Let’s do a sanity check. In our location example, C is an identity matrix. If there is no measurement noise, K becomes an identity matrix. Using our car model equation, the output is the measurement y as the estimated state. That is, when the measurement is 100% accurate, our prediction should equal the measurement.
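This sanity check can be run numerically. The sketch assumes the standard Kalman gain formula K = P Cᵀ (C P Cᵀ + R)⁻¹, where P is the covariance of the state estimate before the update; the covariance values and the helper name `kalman_gain` are made up for illustration.

```python
import numpy as np

def kalman_gain(P_prior, C, R):
    """K = P C^T (C P C^T + R)^-1, the standard Kalman gain."""
    S = C @ P_prior @ C.T + R              # covariance of the error y - y_hat
    return P_prior @ C.T @ np.linalg.inv(S)

P_prior = np.array([[0.5, 0.1],
                    [0.1, 0.3]])           # made-up covariance of the state estimate
C = np.eye(2)                              # the state is measured directly

K_exact = kalman_gain(P_prior, C, R=np.zeros((2, 2)))   # noise-free measurement
K_noisy = kalman_gain(P_prior, C, R=100.0 * np.eye(2))  # very noisy measurement

x_prior = np.array([2.0, 1.0])
y = np.array([2.6, 0.8])
x_new = x_prior + K_exact @ (y - C @ x_prior)            # equals y when K is the identity
print(K_exact, K_noisy, x_new, sep="\n")
```

With R set to zero the gain is the identity and the updated state equals the measurement. With a very large R the gain shrinks toward zero, so the measurement is almost ignored.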

Now we have a car model in the observer world that takes the measurement noise into account in the form of the Kalman gain.

Prediction

So far, our observer world uses a deterministic model. Let’s expand it to a stochastic one. We assume all estimated states are Gaussian distributed. So we model our last estimated state with a mean x and a covariance matrix P.

Then, we apply the car model in the observer world to make a new state estimate.

Back to linear algebra, if we apply a transformation matrix A to an input x with a covariance matrix Σ, the covariance matrix of the output (Ax) is simply:
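The standard result is that the output covariance equals A Σ Aᵀ. The sketch below checks it by sampling; the matrix values, the sample count, and the variable names are assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.5
A = np.array([[1.0, dt],
              [0.0, 1.0]])
Sigma = np.array([[0.4, 0.1],
                  [0.1, 0.2]])             # made-up covariance of the input x

samples = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)
transformed = samples @ A.T                # apply A to every sample
empirical = np.cov(transformed, rowvar=False)
predicted = A @ Sigma @ A.T                # covariance of Ax from the formula

print(np.round(empirical, 3))
print(np.round(predicted, 3))
```

The covariance measured from the transformed samples agrees with A Σ Aᵀ up to sampling error, so the covariance can be propagated with two matrix multiplications instead of by sampling.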

Putting the process noise back, the Gaussian model for the estimated state before the update is:

where,
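In the standard Kalman filter notation, which is assumed here (a superscript minus for "before the update", a time index k), the predicted Gaussian has the following mean and covariance:

```latex
\hat{x}_k^- = A\,\hat{x}_{k-1} + B\,u_k,
\qquad
P_k^- = A\,P_{k-1}\,A^{T} + Q
```

The first term of the covariance is the previous uncertainty pushed through the model, and Q adds the uncertainty introduced by the process noise.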

Update

Finally, we will make the final prediction using Kalman filter. At time=k, we make a measurement y. For easier visualization, we always assume C is an identity matrix when we plot the measurement PDF in this article.

Since the estimated state (before the update) and the measurement are Gaussian distributed, the final estimated state is also Gaussian distributed. We can apply linear algebra again to compute the Gaussian model of our final prediction.

First, we calculate the Kalman gain, which puts the measurement noise back into the equation and maps the error in the measurement estimate to the error in the state estimate.

Then the Gaussian model for the new estimated state is calculated based on the Gaussian model for the state estimate before the update, the Kalman gain K, the measurement and C. Here is our updated state estimation:

where
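In the standard Kalman filter form, with notation assumed here (a superscript minus for the estimate before the update and a time index k), the update step reads:

```latex
\begin{aligned}
K_k &= P_k^-\,C^{T}\,\bigl(C\,P_k^-\,C^{T} + R\bigr)^{-1} \\
\hat{x}_k &= \hat{x}_k^- + K_k\,\bigl(y_k - C\,\hat{x}_k^-\bigr) \\
P_k &= (I - K_k\,C)\,P_k^-
\end{aligned}
```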

Congratulations! You survived the tedious notation, and this is how we use the Kalman filter to make better state estimates. Compared with our previous explanation, we do not multiply curves together. The Kalman filter uses simple linear algebra and is much simpler.

Recap

Let’s do a recap.

The red curve on the left: the estimated state at time=k-1.
The red curve on the right: the estimated state before the update.
The orange curve: the measurement.
The black curve: the estimated state at time=k.

The diagram below shows the corresponding mean and the covariance matrix.

PDFs modeled with Gaussian distributions.

To track a moving car, we repeat the same 2-step procedure: predict, then update.
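The predict and update steps can be sketched as two small NumPy functions using the standard Kalman filter equations. The simulated car, the noise levels Q and R, the position-only C, and the function names `predict` and `update` are all assumptions made for this illustration.

```python
import numpy as np

def predict(x, P, A, B, u, Q):
    """Predict step: push the estimate and its covariance through the model."""
    x_prior = A @ x + B @ u
    P_prior = A @ P @ A.T + Q
    return x_prior, P_prior

def update(x_prior, P_prior, y, C, R):
    """Update step: correct the prediction with the measurement y."""
    S = C @ P_prior @ C.T + R
    K = P_prior @ C.T @ np.linalg.inv(S)   # Kalman gain
    x = x_prior + K @ (y - C @ x_prior)
    P = (np.eye(len(x)) - K @ C) @ P_prior
    return x, P

dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
C = np.array([[1.0, 0.0]])                 # position-only measurement (assumed)
Q = 0.01 * np.eye(2)                       # made-up noise levels
R = np.array([[1.0]])

rng = np.random.default_rng(42)
true_x = np.array([0.0, 2.0])              # simulated car: 0 m, 2 m/s
x, P = np.array([0.0, 0.0]), 10.0 * np.eye(2)   # vague initial estimate
u = np.array([0.0])                        # no throttle input

for _ in range(30):
    true_x = A @ true_x + B @ u            # move the simulated car
    y = C @ true_x + rng.normal(0.0, 1.0, size=1)   # noisy position reading
    x, P = predict(x, P, A, B, u, Q)
    x, P = update(x, P, y, C, R)

print("estimate:", x, "truth:", true_x)
```

Although only the position is measured, the filter also recovers the speed, because A couples the two through the covariance P.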

Sensor fusion

LiDAR fires rapid pulses of laser light (200K per second) to create a 3-D map of our environment. Its short wavelength lets us detect small objects with high resolution. However, the measurements can be noisy, in particular during rain, snow, and smog. Radar has a longer range and is more reliable, but it has lower resolution. Sensor fusion combines measurements from different sensors using the Kalman filter to improve accuracy. The measurement errors of many sensors are not correlated, i.e. the measurement error of one sensor is not caused by another sensor. Because of that, we can apply the Kalman filter update once per measurement, one sensor at a time, to refine the prediction.
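A minimal 1-D sketch of this one-at-a-time fusion follows, assuming both sensors measure the same distance directly and their errors are independent. The readings and variances are made-up numbers, and `update` is a hypothetical helper.

```python
def update(mean, var, meas, meas_var):
    """Kalman update for a directly measured 1-D state."""
    gain = var / (var + meas_var)
    return mean + gain * (meas - mean), (1.0 - gain) * var

# Made-up numbers: predicted distance to the car ahead, then two sensor readings.
mean, var = 20.0, 4.0                      # prediction before any update
lidar_meas, lidar_var = 21.0, 0.25         # sharper reading
radar_meas, radar_var = 20.4, 1.0          # coarser reading

mean, var = update(mean, var, lidar_meas, lidar_var)   # fuse LiDAR first
mean, var = update(mean, var, radar_meas, radar_var)   # then RADAR
print(f"fused distance {mean:.2f} m, variance {var:.3f}")
```

The fused variance ends up smaller than that of either sensor alone, and with independent errors the order in which the two updates are applied does not change the result.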

More thoughts

We use linear algebra to model our car, i.e. A, B and C are simply matrices. This may not always be true in the real world. In the next article, we will talk about the Extended Kalman Filter and the Unscented Kalman Filter, which overcome this problem.