The section on the Region Proposal Network should give you a high-level description.


This is a CNN that produces a 256-component vector. The concept is not much different from a CNN classifier, except that the output is 256 values. If this still sounds strange, you may want to take a look at CNNs first. I'm not sure whether that is the problem you are having, since I don't know your background, but I hope this gives you some pointers.
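To make the idea a bit more concrete, here is a rough sketch in plain Python. It assumes the 256 values come from 256 separate 3x3 convolution filters applied to one window of a feature map, each followed by a ReLU. The function name `conv3x3_at`, the toy feature-map sizes, and the random weights are all made up for illustration; a real network learns its weights and uses a deep learning library rather than nested loops.

```python
import random


def conv3x3_at(feature_map, filters, biases, row, col):
    """Apply every 3x3 filter to the window centred at (row, col).

    feature_map: list of channels, each a 2-D list (height x width)
    filters:     list of filters, each shaped [channels][3][3]
    biases:      one bias per filter
    Returns one ReLU-activated value per filter.
    """
    outputs = []
    for weights, bias in zip(filters, biases):
        total = bias
        for c, channel in enumerate(feature_map):
            for dr in range(3):
                for dc in range(3):
                    total += weights[c][dr][dc] * channel[row - 1 + dr][col - 1 + dc]
        outputs.append(max(0.0, total))  # ReLU keeps only non-negative responses
    return outputs


random.seed(0)

# Toy sizes (assumptions); only OUT_DIM = 256 comes from the discussion above.
CHANNELS, HEIGHT, WIDTH, OUT_DIM = 4, 5, 5, 256

feature_map = [
    [[random.random() for _ in range(WIDTH)] for _ in range(HEIGHT)]
    for _ in range(CHANNELS)
]
filters = [
    [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
     for _ in range(CHANNELS)]
    for _ in range(OUT_DIM)
]
biases = [0.0] * OUT_DIM

# One window position in, one 256-component vector out.
vector = conv3x3_at(feature_map, filters, biases, row=2, col=2)
print(len(vector))
```

A classifier would end with one score per class; here the only difference is that the final layer has 256 outputs, and those values are passed on to later layers instead of being read as class scores.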

