# CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition

Official code for the paper "CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition".

This repository includes the pre-trained models and the evaluation and training code for the pre-training, zero-shot, and fine-tuning experiments of CG3D. It is built on the Point-BERT codebase. Please see the end of this document for a full list of code references.
## Requirements

The known working environment configuration is:

- Python 3.9
- PyTorch 1.12
- CUDA 11.6
## Installation

- Install the conda virtual environment from the provided `.yml` file:

  ```shell
  conda env create -f environment.yml
  ```
(OR)
- Install the dependencies manually:

  ```shell
  conda create -n cg3d
  conda activate cg3d
  pip install -r requirements.txt
  conda install -c anaconda scikit-image scikit-learn scipy
  pip install git+https://github.com/openai/CLIP.git
  pip install --upgrade https://github.com/unlimblue/KNN_CUDA/releases/download/0.2/KNN_CUDA-0.2-py3-none-any.whl
  cd ./extensions/chamfer_dist
  python setup.py develop
  ```
- Build the modified timm package from scratch. Extract the modified timm package here and place it in `./models/SLIP/`, then:

  ```shell
  cd ./models/SLIP/pytorch-image-models
  pip install -e .
  ```
- Install the PointNet ops:

  ```shell
  cd third_party/Pointnet2_PyTorch
  pip install -e .
  pip install pointnet2_ops_lib/.
  ```
- Install PyGeM:

  ```shell
  cd third_party/PyGeM
  python setup.py install
  ```
## Data preparation

- Download the point cloud datasets for pre-training and fine-tuning:
  - ShapeNetCore v2
  - ModelNet
  - ScanObjectNN

  Save and unzip the above datasets.
- Render views of the textured ShapeNet CAD models using this repository, so that the data is organized as:

  ```
  ├── data (this may be wherever you choose)
  │   ├── modelnet40_normal_resampled
  │   │   ├── modelnet10/40_shape_names.txt
  │   │   ├── modelnet10/40_train/test.txt
  │   │   ├── airplane
  │   │   ├── ...
  │   │   ├── laptop
  │   ├── ShapeNet55
  │   │   ├── train.txt
  │   │   ├── test.txt
  │   │   ├── shapenet_pc
  │   │   │   ├── 03211117-62ac1e4559205e24f9702e673573a443.npy
  │   │   │   ├── ...
  │   ├── shapenet_render
  │   │   ├── train_img.txt
  │   │   ├── val_img.txt
  │   │   ├── shape_names.txt
  │   │   ├── taxonomy.json
  │   │   ├── camera
  │   │   ├── img
  │   │   │   ├── 02691156
  │   │   │   ├── ...
  │   ├── ScanObjectNN
  │   │   ├── main_split
  │   │   ├── ...
  ```
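The `shapenet_pc` files above appear to follow a `<taxonomy_id>-<model_id>.npy` naming convention (inferred from the tree, not from the code). As a small shell sketch, the two ids can be split off a filename like this:

```shell
# Split a shapenet_pc filename into its taxonomy id and model id.
# (Naming convention is an assumption based on the directory tree above.)
f="03211117-62ac1e4559205e24f9702e673573a443.npy"
stem="${f%.npy}"        # strip the .npy extension
taxonomy="${stem%%-*}"  # part before the first dash -> taxonomy id
model="${stem#*-}"      # part after the first dash  -> model id
echo "$taxonomy $model" # prints "03211117 62ac1e4559205e24f9702e673573a443"
```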
## Model weights

Download the SLIP model weights from here.

No. of points | Model file | Task | Configuration file
---|---|---|---
1024 | download | Pre-training | link
8192 | download | Pre-training | link

No. of points | Model file | Task | Configuration file
---|---|---|---
1024 | download | Pre-training | link
8192 | download | Pre-training | link
## Training

- Change the data paths in `cfgs/dataset_configs/` to the relevant locations on your machine.
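The exact keys vary per file; as a hypothetical illustration (key names are assumed from the Point-BERT-style codebase this repository builds on, not verified against these exact files), a dataset config entry typically points at the directories in the tree above:

```yaml
# Hypothetical sketch of a dataset config entry; the key names are
# assumptions — match them to the actual files in cfgs/dataset_configs/.
NAME: ShapeNet
DATA_PATH: /path/to/data/ShapeNet55
PC_PATH: /path/to/data/ShapeNet55/shapenet_pc
N_POINTS: 8192
```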
- Pre-train PointTransformer on ShapeNet under the CG3D framework:

  ```shell
  python main.py --exp_name {NAME FOR EXPT} --config cfgs/ShapeNet55_models/PointTransformerVPT.yaml --pretrain --out_dir {OUTPUT DIR PATH} --text --image --clip --VL SLIP --visual_prompting --npoints 1024 --slip_model {PATH TO SLIP MODEL}
  ```
- Pre-train PointMLP on ShapeNet under the CG3D framework:

  ```shell
  python main.py --exp_name {NAME FOR EXPT} --config cfgs/ShapeNet55_models/PointMLP_VPT.yaml --pretrain --out_dir {OUTPUT DIR PATH} --text --image --clip --VL SLIP --visual_prompting --npoints 1024 --slip_model {PATH TO SLIP MODEL}
  ```
## Evaluation

Evaluate a trained model with:

```shell
python eval.py --config {CONFIG} --exp_name {NAME FOR EXPT} --ckpts {CKPT PATH} --slip_model {PATH TO SLIP MODEL}
```
## To-Do

- Model weights from pre-training
- Model weights from fine-tuning
- Zero-shot inference