Understanding Region-based Fully Convolutional Networks (R-FCN) for object detection


Intuition

Picture modified from “Woman with head wrapped in a scarf smiling” by Roksolana Zasiadko.
By knowing where the right eye is, we know where a face should be.

Motivations

# Faster R-CNN: every ROI is passed through a costly per-region detector.
feature_maps = process(image)
ROIs = region_proposal(feature_maps)
for ROI in ROIs:
    patch = roi_pooling(feature_maps, ROI)
    class_scores, box = detector(patch)
    class_probabilities = softmax(class_scores)

# R-FCN: the score maps are computed once for the whole image,
# so the per-ROI work is only pooling and averaging.
feature_maps = process(image)
ROIs = region_proposal(feature_maps)
score_maps = compute_score_map(feature_maps)
for ROI in ROIs:
    V = region_roi_pool(score_maps, ROI)
    class_scores, box = average(V)  # Much simpler!
    class_probabilities = softmax(class_scores)
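
To make the `average(V)` and `softmax` steps concrete, here is a minimal sketch in plain Python. It assumes `V` is a k x k grid in which each cell holds one score per class. The helper name `vote` and the toy values are made up for illustration, and the box output of `average(V)` is left out.

```python
import math

def vote(V):
    """Average a k x k grid of per-class score vectors into one score per class."""
    cells = [cell for row in V for cell in row]
    num_classes = len(cells[0])
    return [sum(cell[c] for cell in cells) / len(cells) for c in range(num_classes)]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2 x 2 grid with two classes (background, object); the values are invented.
V = [[[0.0, 2.0], [0.0, 4.0]],
     [[0.0, 2.0], [0.0, 4.0]]]

class_scores = vote(V)
class_probabilities = softmax(class_scores)
```

Because the per-ROI step has no learned weights in this sketch, its cost is a handful of additions, which is the point of the "Much simpler!" comment.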

R-FCN

Create a new feature map from the left to detect the top left corner of an object.
Generate 9 score maps
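
The count of 9 comes from a 3 x 3 grid: one score map per grid cell. As a rough sketch of the bookkeeping, the snippet below counts the maps for a k x k grid and, following the R-FCN paper's k²(C + 1) formulation, for C object classes plus background. The channel ordering in `score_map_index` is a hypothetical layout chosen for illustration, not something the article specifies.

```python
def maps_per_class(k):
    """Score maps needed for one class: one per cell of the k x k grid."""
    return k * k

def num_score_maps(k, num_classes):
    """Total score maps for num_classes object classes plus one background class."""
    return k * k * (num_classes + 1)

def score_map_index(i, j, c, k, num_classes):
    """Channel index of the map for grid cell (i, j) and class c.

    Assumed (hypothetical) layout: channels grouped by grid cell, then by class.
    """
    return (i * k + j) * (num_classes + 1) + c

nine = maps_per_class(3)           # the 3 x 3 case from the caption
voc_total = num_score_maps(3, 20)  # e.g. 20 object classes plus background
```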
Apply the ROI to the score maps to output a 3 x 3 array.
Overlay a portion of the ROI onto the corresponding score map to calculate V[i][j]
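
The overlay step can be sketched as follows for a single class. This is an illustrative plain-Python version, not the reference implementation: it assumes `score_maps[i][j]` is the 2D map for grid cell (i, j), that the ROI is given as end-exclusive pixel coordinates `(x0, y0, x1, y1)` lying inside the map, and that each bin is average-pooled. The function name and the toy maps are hypothetical.

```python
import math

def position_sensitive_roi_pool(score_maps, roi, k):
    """Return a k x k grid V where V[i][j] is the mean of score_maps[i][j]
    over the part of the ROI that falls in bin (i, j)."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / k
    bin_w = (x1 - x0) / k
    V = []
    for i in range(k):
        # Pixel rows covered by bin row i (at least one row).
        r0 = y0 + math.floor(i * bin_h)
        r1 = max(r0 + 1, y0 + math.ceil((i + 1) * bin_h))
        row = []
        for j in range(k):
            # Pixel columns covered by bin column j (at least one column).
            c0 = x0 + math.floor(j * bin_w)
            c1 = max(c0 + 1, x0 + math.ceil((j + 1) * bin_w))
            # Only the map that belongs to bin (i, j) is read here.
            values = [score_maps[i][j][r][c]
                      for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(values) / len(values))
        V.append(row)
    return V

def quadrant_map(i, j, k=2, size=4):
    """Toy map that fires (1.0) only inside quadrant (i, j) of a size x size image."""
    half = size // k
    return [[1.0 if (r // half, c // half) == (i, j) else 0.0
             for c in range(size)] for r in range(size)]

k = 2
score_maps = [[quadrant_map(i, j) for j in range(k)] for i in range(k)]

aligned = position_sensitive_roi_pool(score_maps, (0, 0, 4, 4), k)
shifted = position_sensitive_roi_pool(score_maps, (2, 2, 4, 4), k)
```

In this toy setup the aligned ROI scores 1.0 in every bin, while the shifted ROI only matches in its bottom-right bin, so its averaged score drops.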
ROI pool
Network flow for R-FCN

Boundary box regression

Results

R-FCN is 2.5 to 20x faster than its Faster R-CNN counterpart. (Source)

Credit & reference
