Mask R-CNN image cropping using bounding box

Moaz Mohammed Husain
3 min read · Oct 29, 2020

Mask R-CNN was released in 2017 as an extension of Faster R-CNN. It is mainly used for instance segmentation and object detection tasks.

In this post, I will show you the exact steps to crop an image using the bounding box that Mask R-CNN predicts. I will also touch on removing the background of the image. I assume that you are familiar with the basics of Python.

Installing dependencies

You will have to install Cython, the COCO Python API, and Mask R-CNN by Matterport.

!pip install Cython
!git clone https://github.com/waleedka/coco
!pip install -U setuptools
!pip install -U wheel
!make install -C coco/PythonAPI
!git clone https://github.com/matterport/Mask_RCNN
%cd ./Mask_RCNN
!wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

In my testing, Mask R-CNN by Matterport only worked with TensorFlow 1.13.1 and Keras 2.1.0, so you will have to install these versions as well.

!pip install tensorflow==1.13.1
!pip install keras==2.1.0

Importing libraries, configuring the model and loading the weights

You can refer to the official example in Matterport's repo to do this: the demo notebook at samples/demo.ipynb. Copy the first 4 cells from that Jupyter notebook.

Visualizing the segmentation

First, you will have to import your image file. I am using a pizza image for this example and loading it with OpenCV. OpenCV loads images in BGR channel order by default, so convert the image to RGB with the code below:

# img is the BGR image loaded with OpenCV; keep the converted copy for later steps
img_cvt = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
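To see what this conversion does, here is a minimal sketch that uses only NumPy. For a 3-channel image, going from BGR to RGB amounts to reversing the last (channel) axis. The tiny 1x2 array below is made-up data for illustration, not a real image.

```python
import numpy as np

# A made-up 1x2 "image" in BGR order: one pure-blue pixel and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the channel axis turns (B, G, R) into (R, G, B).
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255] -> the blue pixel, now in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0] -> the red pixel, now in RGB order
```

If you skip the conversion, the model and the visualization will see red and blue swapped.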

To view the results of the segmentation, use the code below:

# Run detection
results = model.detect([img_cvt], verbose=1)

# Visualize results
r = results[0]
visualize.display_instances(img_cvt, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])

This is what the output looks like:

segmentation output

In the above code, r['rois'] stores the bounding box co-ordinates. I will use this to crop the image. This is what the data in r['rois'] looks like:

bounding box co-ordinates

The co-ordinates of each box are in the following format: y1, x1, y2, x2. These are the top-left and bottom-right corners in pixels, with the row (y) value first. They are not x, y, width, height.

Cropping the image using bounding box

We have everything we need to extract the bounding box part of the image. To begin, I will grab the individual co-ordinates of the first detection using indexing. This is the code:

# r['rois'][0] is the first detection's box, stored as (y1, x1, y2, x2)
y1 = r['rois'][0][0]
x1 = r['rois'][0][1]
y2 = r['rois'][0][2]
x2 = r['rois'][0][3]

Since the image is a NumPy array, I will use slicing to grab the bounding box part of the image. Rows (y) are sliced first, then columns (x). This is the code:

crop_img = img_cvt[y1:y2, x1:x2]

This is what the output looks like:

cropped image
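The cropping step can be wrapped in a small helper. This is only a sketch: crop_to_box is a hypothetical name of my own, and the image and box below are stand-in data so the snippet runs without a model. It assumes the box is in the (y1, x1, y2, x2) order that Matterport's model.detect returns in r['rois'].

```python
import numpy as np

def crop_to_box(image, box):
    """Return the part of `image` inside `box`, given as (y1, x1, y2, x2)."""
    y1, x1, y2, x2 = (int(v) for v in box)
    # Rows (y) come first when slicing a NumPy image, then columns (x).
    return image[y1:y2, x1:x2]

# Stand-in data: a blank 100x200 RGB image and one made-up box.
image = np.zeros((100, 200, 3), dtype=np.uint8)
rois = np.array([[10, 20, 60, 120]])

crop = crop_to_box(image, rois[0])
print(crop.shape)  # (50, 100, 3)
```

With real detections, the call would be crop_to_box(img_cvt, r['rois'][0]).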

Now you can use this for many things. For example, you can first generate an image with its background removed and then crop it using the bounding box. You can also crop an entire dataset: iterate over the images with a for loop and extract the bounding box part of each one. You can find the code for this part in my GitHub repo. This is the link.
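As a rough illustration of the background removal idea, here is a minimal sketch. It is not the code from my repo. The helper names remove_background and crop_instances are hypothetical, and the arrays are stand-in data shaped like the detection results: rois with shape (N, 4) in (y1, x1, y2, x2) order, and boolean masks with shape (height, width, N).

```python
import numpy as np

def remove_background(image, mask):
    """Black out every pixel that lies outside the boolean `mask`."""
    # mask is (H, W); add a trailing axis so it broadcasts over the channels.
    return np.where(mask[..., None], image, 0).astype(image.dtype)

def crop_instances(image, rois, masks):
    """Return one background-free crop per detected instance."""
    crops = []
    for i, (y1, x1, y2, x2) in enumerate(rois):
        isolated = remove_background(image, masks[:, :, i])
        crops.append(isolated[y1:y2, x1:x2])
    return crops

# Stand-in data: a uniform grey 8x8 image with one detection.
image = np.full((8, 8, 3), 200, dtype=np.uint8)
rois = np.array([[2, 2, 6, 6]])
masks = np.zeros((8, 8, 1), dtype=bool)
masks[3:5, 3:5, 0] = True  # the "object" covers a 2x2 patch

crops = crop_instances(image, rois, masks)
print(crops[0].shape)       # (4, 4, 3)
print(int(crops[0].sum()))  # 2400 = 4 object pixels * 3 channels * 200
```

With real detections, the call would be crop_instances(img_cvt, r['rois'], r['masks']). To process a dataset, you would run detection on each image inside a for loop and call the same helper on every result.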
