What do we learn from single shot object detectors (SSD, YOLOv3), FPN & Focal loss (RetinaNet)?

Single Shot detectors

Faster R-CNN flow:

feature_maps = process(image)
ROIs = region_proposal(feature_maps)
for ROI in ROIs:
    patch = roi_align(feature_maps, ROI)
    results = detector2(patch)        # reduce the amount of work done here!

Single shot flow:

feature_maps = process(image)
results = detector3(feature_maps)     # no more separate step for ROIs
A single shot detector makes its predictions relative to a sliding window over a feature map. An 8 x 8 feature map, for example, gives 64 locations. At each location we place k anchors (here k = 4) and make one prediction per anchor, each relative to that anchor rather than to the whole image. The predictions are computed with a 3 x 3 convolution filter, so every location outputs k predictions of 25 parameters each (the boundary-box offsets plus the class scores).
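A minimal PyTorch sketch of such a prediction head (the class name, the 256 input channels and the 8 x 8 feature map are illustrative assumptions; k = 4 anchors and 25 parameters per prediction follow the description above):

import torch
import torch.nn as nn

class DensePredictionHead(nn.Module):
    """Make k anchor-relative predictions per feature-map location
    with a single 3x3 convolution (illustrative sketch)."""
    def __init__(self, in_channels=256, k=4, params_per_prediction=25):
        super().__init__()
        self.k, self.params = k, params_per_prediction
        # One 3x3 filter bank produces k * 25 output channels per location.
        self.conv = nn.Conv2d(in_channels, k * params_per_prediction,
                              kernel_size=3, padding=1)

    def forward(self, feature_map):
        out = self.conv(feature_map)        # (N, k * 25, H, W)
        n, _, h, w = out.shape
        # Each of the H * W locations now holds k predictions of 25 values.
        return out.view(n, self.k, self.params, h, w)

# Usage: an 8x8 feature map -> 64 locations, 4 anchors each, 25 parameters per prediction.
head = DensePredictionHead()
preds = head(torch.randn(1, 256, 8, 8))
print(preds.shape)   # torch.Size([1, 4, 25, 8, 8])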

SSD

SSD makes a single shot prediction for both classification and location. It uses multi-scale feature maps for detection, so fine-resolution maps pick up small objects while coarser maps handle large ones. This single-pass design lets SSD run object detection in real time.
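A minimal sketch of SSD's idea of attaching a prediction head to each feature-map scale (the channel widths, map sizes and class name below are illustrative assumptions, not SSD's exact configuration):

import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    """One prediction head per feature-map scale (illustrative sketch)."""
    def __init__(self, channels=(512, 1024, 512), k=4, params=25):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Conv2d(c, k * params, kernel_size=3, padding=1) for c in channels)

    def forward(self, feature_maps):
        # Finer maps are responsible for small objects, coarser maps for large ones.
        return [head(fm) for head, fm in zip(self.heads, feature_maps)]

# Usage with three hypothetical scales: 38x38, 19x19 and 10x10.
maps = [torch.randn(1, 512, 38, 38),
        torch.randn(1, 1024, 19, 19),
        torch.randn(1, 512, 10, 10)]
outputs = MultiScaleHeads()(maps)
print([o.shape[-2:] for o in outputs])   # one prediction grid per scale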

Feature Pyramid Networks (FPN)

FPN pairs feature extraction with a second pathway: a bottom-up pathway computes feature maps at several scales, and a top-down pathway reconstructs spatial resolution starting from the semantically strongest (coarsest) layers. Skip connections from the corresponding bottom-up maps are added at each level so the reconstructed maps regain precise localization.
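A minimal sketch of the top-down pathway with lateral connections (the 1 x 1 lateral convolutions, nearest-neighbour upsampling and 3 x 3 smoothing convolutions follow the FPN paper; the class name, channel counts and input shapes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class FPN(nn.Module):
    """Top-down pathway with lateral (skip) connections (illustrative sketch)."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 convolutions project every bottom-up map to a common channel width.
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)
        # 3x3 convolutions smooth each merged map.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
            for _ in in_channels)

    def forward(self, c_maps):
        # c_maps: bottom-up features ordered fine -> coarse (e.g. C2..C5).
        p = [l(c) for l, c in zip(self.lateral, c_maps)]
        # Walk back down from the coarsest map, upsampling and adding
        # the lateral connection at every level.
        for i in range(len(p) - 2, -1, -1):
            p[i] = p[i] + F.interpolate(p[i + 1], size=p[i].shape[-2:], mode="nearest")
        return [s(x) for s, x in zip(self.smooth, p)]

# Usage: four backbone stages at strides 4, 8, 16, 32 for a 256x256 input.
cs = [torch.randn(1, c, 256 // s, 256 // s)
      for c, s in zip((256, 512, 1024, 2048), (4, 8, 16, 32))]
ps = FPN()(cs)
print([tuple(x.shape) for x in ps])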

Hard example mining
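Most anchors are easy negatives, so SSD keeps all positive anchors but only the negatives with the highest classification loss, at a negative-to-positive ratio of roughly 3:1. A minimal sketch of that selection (the function name and tensor layout are assumptions):

import torch

def hard_negative_mining(cls_loss, is_positive, neg_pos_ratio=3):
    """Keep every positive anchor plus the hardest negatives (highest loss),
    up to neg_pos_ratio negatives per positive.

    cls_loss:    (num_anchors,) per-anchor classification loss
    is_positive: (num_anchors,) boolean mask of anchors matched to objects
    """
    num_pos = int(is_positive.sum())
    num_neg = neg_pos_ratio * max(num_pos, 1)

    # Rank only the negatives: positives are excluded from the ranking.
    neg_loss = cls_loss.clone()
    neg_loss[is_positive] = float("-inf")
    _, hard_idx = neg_loss.topk(min(num_neg, len(neg_loss) - num_pos))

    keep = is_positive.clone()
    keep[hard_idx] = True
    return keep   # boolean mask of anchors that contribute to the loss

# Usage: 12 anchors, 1 positive -> keep 1 positive + 3 hardest negatives.
loss = torch.rand(12)
pos = torch.zeros(12, dtype=torch.bool)
pos[0] = True
print(hard_negative_mining(loss, pos).sum())   # tensor(4)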

Non-maximal suppression in inference
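At inference, overlapping predictions of the same object are pruned: boxes are processed in order of confidence, and any box whose IoU with an already-kept box exceeds a threshold is discarded. A minimal greedy sketch (the 0.5 threshold and corner-format boxes are assumptions):

import torch

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximal suppression. boxes: (N, 4) as x1, y1, x2, y2."""
    order = scores.argsort(descending=True)
    keep = []
    while order.numel() > 0:
        i = order[0].item()
        keep.append(i)              # the most confident remaining box survives
        if order.numel() == 1:
            break
        rest = order[1:]
        # IoU of the kept box with every remaining box.
        x1 = torch.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = torch.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = torch.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = torch.minimum(boxes[i, 3], boxes[rest, 3])
        inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much; keep the rest in play.
        order = rest[iou <= iou_threshold]
    return keep

# Usage: the second box overlaps the first too much and is suppressed.
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [20., 20., 30., 30.]])
scores = torch.tensor([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]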

Focal loss (RetinaNet)

RetinaNet is a one-stage detector that combines an FPN backbone with classification and box-regression subnets, trained with the focal loss.
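Focal loss down-weights well-classified examples so the many easy negatives do not dominate training: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), with gamma = 2 and alpha = 0.25 as the RetinaNet defaults. A minimal binary sketch in PyTorch:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).
    logits and targets have the same shape; targets are 0 or 1."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma shrinks the loss of easy, well-classified examples.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Usage: a confidently wrong example dominates; an easy one contributes almost nothing.
logits = torch.tensor([4.0, -4.0, 0.1])    # confident correct, confident wrong, uncertain
targets = torch.tensor([1.0, 1.0, 0.0])
print(focal_loss(logits, targets))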

Further reading on SSD, YOLO & FPN

Resources
