A PyTorch Faster Faster R-CNN Implementation

Introduction

This project is a faster PyTorch implementation of Faster R-CNN, aimed at accelerating the training of Faster R-CNN object detection models. Several good implementations already exist:

rbgirshick/py-faster-rcnn, developed based on Pycaffe + Numpy
longcw/faster_rcnn_pytorch, developed based on Pytorch + Numpy
endernewton/tf-faster-rcnn, developed based on TensorFlow + Numpy
ruotianluo/pytorch-faster-rcnn, developed based on Pytorch + TensorFlow + Numpy

While implementing, we referred to the above implementations, especially longcw/faster_rcnn_pytorch. However, our implementation has several unique and new features compared with them:

It is pure PyTorch code. We convert all the NumPy implementations to PyTorch.
It supports training with batchsize > 1. We revise all the layers, including the dataloader, RPN, ROI pooling, etc., to train with multiple images at each iteration.
It supports multiple GPUs. We use a multi-GPU wrapper (nn.DataParallel here) so that one or more GPUs can be used flexibly, which the above two features make possible.
It is memory efficient. We limit the image aspect ratio and group images with similar aspect ratios into the same batch. We can train resnet101 and VGG16 with batchsize = 4 (4 images) on a single Titan X (12 GB). When training with 8 GPUs, the maximum batchsize for each GPU is 3 images (Res101), for a total batchsize of 24.
It is fast. With the above merits, training is fast. We report the training speed on an NVIDIA TITAN Xp in the tables below.
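The aspect-ratio grouping mentioned in the memory-efficiency item can be sketched in a few lines. This is a minimal illustration and not the project's dataloader: the function names `clamp_ratio` and `group_by_aspect_ratio`, the image sizes, and the ratio bounds 0.5 and 2.0 are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def clamp_ratio(width: int, height: int, low: float = 0.5, high: float = 2.0) -> float:
    """Return width / height clamped to [low, high] (the bounds are assumed values)."""
    return max(low, min(high, width / height))


def group_by_aspect_ratio(sizes: Sequence[Tuple[int, int]], batch_size: int) -> List[List[int]]:
    """Sort image indices by clamped aspect ratio, then cut that order into batches."""
    order = sorted(range(len(sizes)), key=lambda i: clamp_ratio(*sizes[i]))
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


if __name__ == "__main__":
    # (width, height) pairs for six hypothetical images.
    sizes = [(500, 375), (375, 500), (500, 333), (333, 500), (500, 500), (1000, 200)]
    for batch in group_by_aspect_ratio(sizes, batch_size=2):
        print(batch, [round(clamp_ratio(*sizes[i]), 2) for i in batch])
```

Because the images in a batch have similar shapes, padding or cropping them to a common size wastes little memory, which is presumably what allows the larger batch sizes reported here.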

Benchmarking

We benchmark our code thoroughly on three datasets: PASCAL VOC, MS COCO and ImageNet-200, using two network architectures: VGG-16 and ResNet-101. Below are the results:

1). PASCAL VOC 2007 (Train/Test: 07trainval/07test) (lr_decay/max_epoch: 5/7)

model	lr	GPUs	Batch Size	Time / epoch	Memory / GPU	mAP
VGG-16	1e-3	1 Titan X	1	0.46 hr	~3265 MB	70.2
VGG-16	3e-3	1 Titan X	4	0.36 hr	~9083 MB	N/A
VGG-16	5e-3	8 Titan X	24	0.24 hr	~11303 MB	N/A
Res-101	1e-3	1 Titan X	1	0.54 hr	~3200 MB	73.9
Res-101	3e-3	1 Titan X	4	0.48 hr	~9700 MB	N/A
Res-101	5e-3	8 Titan X	24	0.16 hr	~8400 MB	N/A
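To read the PASCAL VOC table as relative speedups, the per-epoch times can be divided by the batch-size-1 time of the same model. The snippet below is only arithmetic on the numbers in the table; the names `HOURS_PER_EPOCH` and `speedup` are introduced here for illustration.

```python
# Hours per epoch copied from the PASCAL VOC 2007 table: (model, batch size) -> hours.
HOURS_PER_EPOCH = {
    ("VGG-16", 1): 0.46, ("VGG-16", 4): 0.36, ("VGG-16", 24): 0.24,
    ("Res-101", 1): 0.54, ("Res-101", 4): 0.48, ("Res-101", 24): 0.16,
}


def speedup(model: str, batch_size: int) -> float:
    """Per-epoch speedup of a configuration relative to batch size 1 of the same model."""
    return HOURS_PER_EPOCH[(model, 1)] / HOURS_PER_EPOCH[(model, batch_size)]


if __name__ == "__main__":
    for (model, batch_size) in HOURS_PER_EPOCH:
        print(f"{model} bs={batch_size}: {speedup(model, batch_size):.2f}x")
```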

2). COCO (Train/Test: coco_train/coco_test) (lr_decay/max_epoch: 5/7)

model	lr	GPUs	Batch Size	Time / epoch	Memory / GPU	mAP
VGG-16	1e-3	1 Titan X	1	10.4 hr	N/A	N/A
VGG-16	3e-3	1 Titan X	4	8.3 hr	N/A	N/A
VGG-16	5e-3	8 Titan X	24	N/A	N/A	N/A
Res-101	1e-3	1 Titan X	1	13.7 hr	~3300 MB	N/A
Res-101	3e-3	1 Titan X	4	11.6 hr	~9800 MB	N/A
Res-101	5e-3	8 Titan X	24	N/A	~8400 MB	N/A

NOTE. N/A means the result is not available yet. The benchmark results for these entries will follow with our report.

Preparation

First of all, create a folder:

mkdir data

Data Preparation

PASCAL_VOC 07+12: Please follow the instructions in py-faster-rcnn to prepare the VOC datasets; any other equivalent instructions will also work. After downloading the data, create softlinks in the folder data/.
COCO: Please also follow the instructions in py-faster-rcnn to prepare the data.
Visual Genome: Please follow the instructions in bottom-up-attention to prepare the Visual Genome dataset. You need to download the images and object annotation files first, and then perform preprocessing to obtain the vocabulary and cleansed annotations based on the scripts provided in that repository.
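Creating the softlinks can also be scripted. The helper below is a sketch under stated assumptions: `link_dataset` is a hypothetical name, and the paths in the usage comment are placeholders rather than locations this project requires, so check the py-faster-rcnn instructions for the link names the code expects.

```python
import os
from pathlib import Path


def link_dataset(source: Path, link: Path) -> Path:
    """Create `link` as a symbolic link to `source` unless something is already there."""
    # Creates the parent folder if needed (same effect as `mkdir data` for data/...).
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.exists() and not link.is_symlink():
        os.symlink(source.resolve(), link, target_is_directory=True)
    return link


# Hypothetical usage; both paths are placeholders:
# link_dataset(Path("/path/to/VOCdevkit"), Path("data/VOCdevkit2007"))
```

Linking instead of copying keeps one copy of each dataset on disk while letting several checkouts share it.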

Pretrained Model

We used two pretrained models in our experiments, VGG and ResNet101. You can download these two models from:

Download them and put them into data/.

NOTE. We compared the pretrained models from PyTorch and Caffe, and surprisingly found that the Caffe pretrained models perform slightly better than the PyTorch ones. We suggest using the Caffe pretrained models from the above link to reproduce our results.

Compilation

Compile the dependencies using the following commands:

cd lib
sh make.sh

Train

To train a faster R-CNN model with vgg16 on pascal_voc, simply run:

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py --dataset pascal_voc --net vgg16 --cuda --bs $BATCH_SIZE

where 'bs' is the batch size (default 1). Alternatively, to train with resnet101 on pascal_voc, simply run:

CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net.py --dataset pascal_voc --net resnet101 --cuda --bs $BATCH_SIZE

Above, BATCH_SIZE can be set according to your GPU memory size. On a Titan Xp with 12 GB of memory, it can be up to 4.

If you have multiple (say 8) Titan Xp GPUs, then just use them all! Try:

python trainval_net.py --dataset pascal_voc --net vgg16 --cuda --mGPUs --bs 24

Change dataset to "coco" or "vg" if you want to train on COCO or Visual Genome.
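When launching several training runs, it can help to assemble the command programmatically. The sketch below uses only the flags shown in this section (--dataset, --net, --cuda, --bs, --mGPUs) and the CUDA_VISIBLE_DEVICES variable; the helper name `build_train_command` and its signature are hypothetical and not part of this repository.

```python
import shlex
from typing import Dict, List, Optional, Tuple


def build_train_command(dataset: str, net: str, batch_size: int = 1,
                        gpu_ids: Optional[List[int]] = None) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for trainval_net.py using only the flags documented above."""
    argv = ["python", "trainval_net.py", "--dataset", dataset, "--net", net,
            "--cuda", "--bs", str(batch_size)]
    env: Dict[str, str] = {}
    if gpu_ids:
        # Restrict the run to the listed GPUs.
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
        if len(gpu_ids) > 1:
            argv.append("--mGPUs")  # multi-GPU flag from the 8-GPU example
    return argv, env


if __name__ == "__main__":
    argv, env = build_train_command("pascal_voc", "vgg16", batch_size=24, gpu_ids=list(range(8)))
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    print(prefix, shlex.join(argv))
```

To actually start training, the returned pieces could be passed to `subprocess.run(argv, env={**os.environ, **env})` from the repository root.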

Test

If you want to evaluate the detection performance of a pre-trained vgg16 model on the pascal_voc test set, simply run:

python test_net.py --dataset pascal_voc --net vgg16 --checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT --cuda

Specify the model session, checkepoch and checkpoint, e.g., SESSION=1, EPOCH=6, CHECKPOINT=416.

Authorship

Jianwei Yang and Jiasen Lu contributed equally to this project.