This project aims to reproduce the Faster R-CNN object detection model. It builds on the following projects:
- rbgirshick/py-faster-rcnn, built on Pycaffe + NumPy
- longcw/faster_rcnn_pytorch, built on PyTorch + NumPy
- endernewton/tf-faster-rcnn, built on TensorFlow + NumPy
- ruotianluo/pytorch-faster-rcnn, built on PyTorch + TensorFlow + NumPy
However, this implementation has several unique features compared with those above:
- It is pure PyTorch code. We convert all the NumPy implementations to PyTorch.
- It supports training with batch size > 1. We revise all the layers, including the dataloader, RPN, RoI pooling, etc., to train with multiple images at each iteration.
- It supports multiple GPUs. We use a multi-GPU wrapper (nn.DataParallel here) so that one or more GPUs can be used flexibly, which the two features above make possible.
- It is memory efficient. We limit the image aspect ratio and group images with similar aspect ratios into the same batch. We can train ResNet-101 and VGG-16 with batch size = 4 (4 images) on a single Titan X (12 GB). When training with 8 GPUs, the maximum batch size per GPU is 3 images (Res101), for a total batch size of 24.
- It is faster. With the above merits, our training speed can achieve xxx / xxx (VGG / Res101) on a single TITAN X Pascal GPU and xxx / xxx (VGG / Res101) on 8 TITAN X Pascal GPUs.
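The aspect-ratio grouping mentioned in the memory-efficiency item can be sketched roughly as follows. This is a minimal illustration, not the project's actual dataloader: the function names `clamp_ratio` and `group_by_aspect_ratio` are hypothetical, and the clamp limit of 2.0 is an assumed value that the text does not state.

```python
import random


def clamp_ratio(width, height, max_ratio=2.0):
    """Return width / height clamped to [1 / max_ratio, max_ratio].

    max_ratio=2.0 is an assumed limit, chosen only for illustration.
    """
    ratio = width / height
    return min(max(ratio, 1.0 / max_ratio), max_ratio)


def group_by_aspect_ratio(sizes, batch_size, seed=0):
    """Return batches of image indices whose aspect ratios are similar.

    sizes is a list of (width, height) pairs. Images are sorted by their
    clamped ratio, cut into consecutive batches, and then the batches
    (not the images inside them) are shuffled.
    """
    ratios = [clamp_ratio(w, h) for w, h in sizes]
    order = sorted(range(len(sizes)), key=lambda i: ratios[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    random.Random(seed).shuffle(batches)
    return batches


# Example: six images, two per batch.
example_sizes = [(500, 375), (375, 500), (500, 333), (333, 500), (1000, 200), (480, 360)]
for batch in group_by_aspect_ratio(example_sizes, batch_size=2):
    print(batch, [example_sizes[i] for i in batch])
```

Because the images in a batch have similar shapes, they need little padding to reach a common size, which is one plausible reason the grouping saves memory.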
We benchmark our code thoroughly on three datasets: PASCAL VOC, MS COCO and ImageNet-200, using two different network architectures: VGG-16 and ResNet-101. Below are the results:
- PASCAL VOC (Train/Test: 07trainval/07test) (lr_decay/max_epoch: 5/7)

| model | lr | GPUs | Batch Size | Speed / epoch | Memory / GPU | mAP |
|---|---|---|---|---|---|---|
| VGG-16 | 1e-3 | 1 Titan X | 1 | 0.46 hr | 3265 | 70.2 |
| VGG-16 | 3e-3 | 1 Titan X | 4 | 0.36 hr | 9083 | 0 |
| VGG-16 | 5e-3 | 8 Titan X | 27 | 0 | 0 | 0 |
| Res-101 | 1e-3 | 1 Titan X | 1 | 0.54 hr | 3200 MB | 73.9 |
| Res-101 | 3e-3 | 1 Titan X | 4 | 0.48 hr | 9700 MB | 0 |
| Res-101 | 5e-3 | 8 Titan X | 27 | 0.16 hr | 8400 MB | 0 |

- COCO (Train/Test: coco_train/coco_test) (lr_decay/max_epoch: 5/7)

| model | lr | GPUs | Batch Size | Speed / epoch | Memory / GPU | mAP |
|---|---|---|---|---|---|---|
| VGG-16 | 1e-3 | 1 Titan X | 1 | 10.4 hr | 0 | 0 |
| VGG-16 | 3e-3 | 1 Titan X | 4 | 8.3 hr | 0 | 0 |
| VGG-16 | 5e-3 | 8 Titan X | 27 | 0 | 0 | 0 |
| Res-101 | 1e-3 | 1 Titan X | 1 | 13.7 hr | 3300 MB | 0 |
| Res-101 | 3e-3 | 1 Titan X | 4 | 11.6 hr | 9800 MB | 0 |
| Res-101 | 5e-3 | 8 Titan X | 27 | 0 | 8400 MB | 0 |
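
The `lr_decay/max_epoch: 5/7` setting in the benchmark headings can be read as a step-decay schedule. The sketch below rests on two assumptions: that `lr_decay` means the learning rate is reduced once every 5 epochs, and that the reduction factor is 0.1, which the text does not state. The function name `step_decay_lr` is hypothetical.

```python
def step_decay_lr(base_lr, epoch, decay_step=5, gamma=0.1):
    """Learning rate for a 1-indexed epoch under step decay.

    decay_step=5 mirrors the lr_decay value in the benchmark headings;
    gamma=0.1 is an assumed decay factor, not taken from the text.
    """
    return base_lr * gamma ** ((epoch - 1) // decay_step)


# Under these assumptions, a 7-epoch run keeps the base rate for
# epochs 1 to 5 and uses the reduced rate for epochs 6 and 7.
for epoch in range(1, 8):
    print(epoch, step_decay_lr(1e-3, epoch))
```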
PASCAL_VOC and COCO:
Please follow the instructions in py-faster-rcnn to set up the VOC and COCO datasets. The steps involve downloading the data and optionally creating softlinks in the data folder. Since Faster R-CNN does not rely on pre-computed proposals, it is safe to ignore the steps that set up proposals.
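The optional softlink step could be scripted along these lines. This is only a sketch: `link_dataset` is a hypothetical helper, and the paths and link name in the usage comment are placeholders, so check the py-faster-rcnn instructions for the names it actually expects.

```python
from pathlib import Path


def link_dataset(source, data_dir, link_name):
    """Create data_dir/link_name as a symlink pointing at source.

    An existing symlink with the same name is replaced; a real file or
    directory with that name is left alone and reported as an error.
    """
    source = Path(source)
    data_dir = Path(data_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {source}")
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / link_name
    if link.is_symlink():
        link.unlink()
    elif link.exists():
        raise FileExistsError(f"refusing to overwrite non-symlink: {link}")
    link.symlink_to(source.resolve(), target_is_directory=True)
    return link


# Usage (placeholder paths, adjust to your own download location):
# link_dataset("/path/to/VOCdevkit", "data", "VOCdevkit2007")
```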
ImageNet:
To train a resnet101, run:
CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset pascal_voc --net resnet101
Alternatively, to train a vgg16, run:
CUDA_VISIBLE_DEVICES=0 python trainval_net.py --dataset pascal_voc --net vgg16