This project contains the implementation of our CVPR 2019 paper (arXiv).
Stereo R-CNN focuses on accurate 3D object detection and estimation using image-only data in autonomous driving scenarios. It features simultaneous object detection and association for stereo images, 3D box estimation using the 2D information, and accurate dense alignment for 3D box refinement. This branch is a light-weight version based on monocular 2D detection; it uses the stereo images only in the dense alignment module, yet achieves almost comparable performance to the full version. For the full version of Stereo R-CNN, please check out the master branch.
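The core idea of the dense alignment step can be illustrated with a minimal 1D photometric sketch (illustrative only, not the paper's implementation): for a rectified stereo pair, search for the disparity that minimizes the photometric error between an object patch in the left image and its shifted counterpart in the right image.

```python
import numpy as np

def best_disparity(left_row, right_row, x0, x1, max_disp):
    """Pick the integer disparity minimizing the photometric SSD between
    a left-image patch [x0, x1) and its shifted right-image counterpart.
    Toy sketch of dense photometric alignment; the paper aggregates this
    over all valid object pixels and solves for the 3D box depth."""
    patch = left_row[x0:x1]
    best_d, best_cost = 0, np.inf
    for d in range(max_disp + 1):
        if x0 - d < 0:  # shifted window would leave the image
            break
        cand = right_row[x0 - d:x1 - d]
        cost = np.sum((patch - cand) ** 2)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

With known camera baseline and focal length, the recovered disparity converts directly to depth, which is what the refinement uses.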
Authors: Peiliang Li, Xiaozhi Chen and Shaojie Shen from the HKUST Aerial Robotics Group, and DJI.
If you find the project useful for your research, please cite:
@inproceedings{licvpr2019,
title = {Stereo R-CNN based 3D Object Detection for Autonomous Driving},
author = {Li, Peiliang and Chen, Xiaozhi and Shen, Shaojie},
booktitle = {CVPR},
year = {2019}
}
This implementation is tested under PyTorch 0.3.0. To avoid affecting your existing PyTorch installation, we recommend using conda to maintain multiple PyTorch versions. Skip the steps below if you have already run them while following the master branch.
0.0. Install PyTorch:
conda create -n env_stereo python=2.7
conda activate env_stereo
conda install pytorch=0.3.0 cuda80 -c pytorch
conda install torchvision -c pytorch
0.1. Other dependencies:
git clone git@github.com:HKUST-Aerial-Robotics/Stereo-RCNN.git
cd stereo_rcnn
git checkout mono
pip install -r requirements.txt
0.2. Build:
cd lib
sh make.sh
cd ..
1.0. Create the folder for placing the model:
mkdir models_mono
1.1. Download our trained weights (One Drive/Google Drive), put the file into models_mono/, then run
python demo.py
If everything goes well, you will see the detection results on the left image, right image, and bird's-eye view, respectively.
2.0. Download the left images, right images, calibration files, labels, and point clouds (optional, for visualization) from the KITTI Object Benchmark. Make sure the folder structure looks like:
yourfolder/object
    training
        image_2
        image_3
        label_2
        calib
        velodyne
2.1. Create symlinks:
cd data/kitti
ln -s yourfolder/object object
cd ../..
Download the Res-101 pretrained weights (One Drive/Google Drive) and put them into data/pretrained_model.
Set the appropriate CUDA_VISIBLE_DEVICES in train.sh, then run
./train.sh
The trained model and the training log are saved in models_mono/ by default.
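CUDA_VISIBLE_DEVICES restricts which GPUs the process can see. Besides setting it in the shell script, a Python script can pin itself to a GPU before any CUDA library is loaded (a generic example, not repo code):

```python
import os

# Must be set before torch (or any CUDA library) is imported;
# afterwards the process has already enumerated all GPUs.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # expose only GPU 0
```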
You can evaluate the 3D detection performance using either our provided model or your own trained model. Set the appropriate CUDA_VISIBLE_DEVICES in test.sh, then run
./test.sh
The results are saved in models_mono/result/data by default. You can evaluate the results using the tool from here.
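The result files follow the standard KITTI detection format: one object per line with type, truncation, occlusion, observation angle, 2D box, 3D dimensions, 3D location, rotation, and a confidence score. A minimal parser (hypothetical helper, not part of the repo):

```python
def parse_kitti_result_line(line):
    """Parse one line of a KITTI-format detection result file.
    Fields: type, truncated, occluded, alpha,
    bbox (4: left top right bottom, pixels),
    dimensions (3: height width length, meters),
    location (3: x y z in the camera frame, meters),
    rotation_y (radians), score."""
    f = line.split()
    return {
        "type": f[0],
        "truncated": float(f[1]),
        "occluded": int(float(f[2])),
        "alpha": float(f[3]),
        "bbox": [float(v) for v in f[4:8]],
        "dimensions": [float(v) for v in f[8:11]],
        "location": [float(v) for v in f[11:14]],
        "rotation_y": float(f[14]),
        "score": float(f[15]),
    }
```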
Some sample results:
This repo is built on the Faster R-CNN implementations from faster-rcnn.pytorch and fpn.pytorch, and we also use the ImageNet pretrained weights (originally provided here) for initialization.
The source code is released under the GPLv3 license.
We are still working on improving the code reliability. For any technical issues, please contact Peiliang Li <pliapATconnect.ust.hk>.
For commercial inquiries, please contact Shaojie Shen <eeshaojieATust.hk>.