Let me oversimplified this. You get 9 maps. For the first map and its location [2,2], it detects whether location[1, 1] of the input feature is the top-left of [2, 2]. For the second map at [2, 2], it detects whether location[1, 2] of the input feature is the top-center of [2, 2].