
Commit

update
EricGuo5513 committed Dec 16, 2023
1 parent 108003b commit a1eca77
Showing 13 changed files with 274 additions and 313 deletions.
151 changes: 147 additions & 4 deletions README.md
@@ -12,7 +12,7 @@

<details>

### Conda Environment
### 1. Conda Environment
```
conda env create -f environment.yml
conda activate momask
@@ -21,7 +21,7 @@
pip install git+https://github.com/openai/CLIP.git
```
Our code is tested on Python 3.7.13 and PyTorch 1.7.1.


### Models and Dependencies
### 2. Models and Dependencies

#### Download Pre-trained Models
```
@@ -38,9 +38,152 @@
bash prepare/download_glove.sh
```
#### (Optional) Download Manually
Visit [[Google Drive]](https://drive.google.com/drive/folders/1b3GnAbERH8jAoO5mdWgZhyxHB73n23sK?usp=drive_link) to download the models and evaluators manually.

### Get Data
### 3. Get Data

You have two options here:
* **Skip getting data**, if you just want to generate motions using your *own* descriptions.
* **Get full data**, if you want to *re-train* and *evaluate* the model.

**(a). Full data (text + motion)**

**HumanML3D** - Follow the instructions in [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then copy the resulting dataset into our repository:
```
cp -r ../HumanML3D/HumanML3D ./dataset/HumanML3D
```
**KIT** - Download from [HumanML3D](https://github.com/EricGuo5513/HumanML3D.git), then place the result in `./dataset/KIT-ML`.
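
As a quick sanity check that the HumanML3D copy landed in the right place, a small script like the one below can be used. This is only a sketch: the expected file and folder names follow the HumanML3D preprocessing output and may differ if that pipeline changes.
```
import os

# Expected layout of the processed HumanML3D data (assumption based on the
# HumanML3D preprocessing pipeline; adjust if your copy differs).
root = "./dataset/HumanML3D"
expected = ["new_joint_vecs", "new_joints", "texts",
            "Mean.npy", "Std.npy", "train.txt", "val.txt", "test.txt"]

for name in expected:
    path = os.path.join(root, name)
    status = "OK     " if os.path.exists(path) else "MISSING"
    print(f"{status} {path}")
```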


</details>

## :rocket: Demo
<details>

### (a) Generate from a single prompt
```
python gen_t2m.py --gpu_id 1 --ext exp1 --text_prompt "A person is sitting on a chair"
```
### (b) Generate from a prompt file
An example prompt file is given in `./assets/text_prompt.txt`. Each line should follow the format `<text description>#<motion length>`. The motion length is the number of poses; it must be an integer and will be rounded to a multiple of 4. In our work, motions run at 20 fps, so `#132`, for example, requests 132 poses, i.e. 6.6 seconds of motion.
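
For example, a prompt file with two prompts could look like this (both lines are taken from `./assets/text_prompt.txt`):
```
a man bends down and picks something up with his left hand.#84
a person jumps up and then lands.#52
```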

If you write `<text description>#NA`, our model will decide the motion length on its own. Note that once **one** prompt uses NA, all the other prompts will be treated as **NA** automatically.

```
python gen_t2m.py --gpu_id 1 --ext exp2 --text_path ./assets/text_prompt.txt
```


A few more parameters you may be interested in:
* `--repeat_times`: number of replications for generation.
* `--motion_length`: specify the number of poses to generate; only applicable in (a).

The output files are stored under the folder `./generation/<ext>/`. They are:
* `numpy files`: generated motions with shape (nframe, 22, 3), under the subfolder `./joints`.
* `video files`: stick-figure animations in mp4 format, under the subfolder `./animation`.
* `bvh files`: BVH files of the generated motions, under the subfolder `./animation`.

We also apply naive foot IK (inverse kinematics) to the generated motions; see the files with the suffix `_ik`. It sometimes works well, but it can also fail.
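
The joint files under `./joints` are plain NumPy arrays and can be inspected directly. Below is a minimal sketch, assuming a run with `--ext exp1`; the exact file name is hypothetical and will differ for your run.
```
import numpy as np

# Hypothetical path: substitute one of the .npy files produced by your run.
joints = np.load("./generation/exp1/joints/sample0_repeat0.npy")

print(joints.shape)         # (nframe, 22, 3): frames x 22 joints x xyz
root_xyz = joints[:, 0, :]  # pelvis/root trajectory (joint 0 in the HumanML3D skeleton)
print(root_xyz[:5])
```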

</details>

## :dancers: Visualization
<details>

All the animations are manually rendered in Blender. We use characters from [Mixamo](https://www.mixamo.com/#/). You need to download the characters in T-pose with a skeleton.

### Retargeting
For retargeting, we found that Rokoko usually produces large errors on the feet. In contrast, [keemap.rig.transfer](https://github.com/nkeeline/Keemap-Blender-Rig-ReTargeting-Addon/releases) retargets more precisely. You can watch the [tutorial](https://www.youtube.com/watch?v=EG-VCMkVpxg).

Follow these steps:
* Download keemap.rig.transfer from GitHub and install it in Blender.
* Import both the motion file (.bvh) and the character file (.fbx) into Blender.
* `Shift + Select` both the source and target skeletons. (They do not need to be in Rest Position.)
* Switch to `Pose Mode`, then unfold the `KeeMapRig` tool at the top-right corner of the view window.
* Load and read the bone mapping file `./assets/mapping.json` (or `mapping6.json` if it doesn't work). We made this file manually; it works for most Mixamo characters. You can also make your own.
* Adjust the `Number of Samples`, `Source Rig`, and `Destination Rig Name`.
* Click `Transfer Animation from Source Destination` and wait a few seconds.

We have not tried other retargeting tools. Feel free to comment if you find others more useful.

### Scene

We use this [scene](https://drive.google.com/file/d/1lg62nugD7RTAIz0Q_YP2iZsxpUzzOkT1/view?usp=sharing) for animation.


</details>

## :clapper: Temporal Inpainting
<details>

To be continued.
</details>

## :space_invader: Train Your Own Models
<details>


**Note**: You have to train RVQ **BEFORE** training masked/residual transformers. The latter two can be trained simultaneously.

### Train RVQ
```
python train_vq.py --name rvq_name --gpu_id 1 --dataset_name t2m --batch_size 512 --num_quantizers 6 --max_epoch 500 --quantize_drop_prob 0.2
```

### Train Masked Transformer
```
python train_t2m_transformer.py --name mtrans_name --gpu_id 2 --dataset_name t2m --batch_size 64 --vq_name rvq_name
```

### Train Residual Transformer
```
python train_res_transformer.py --name rtrans_name --gpu_id 2 --dataset_name t2m --batch_size 64 --vq_name rvq_name --cond_drop_prob 0.2 --share_weight
```

* `--dataset_name`: motion dataset, `t2m` for HumanML3D and `kit` for KIT-ML.
* `--name`: name of your model. This creates the model space `./checkpoints/<dataset_name>/<name>`.
* `--gpu_id`: GPU id.
* `--batch_size`: we use `512` for RVQ training. For the masked/residual transformers, we use `64` on HumanML3D and `16` on KIT-ML.
* `--num_quantizers`: number of quantization layers, `6` is used in our case.
* `--quantize_drop_prob`: quantization dropout ratio, `0.2` is used.
* `--vq_name`: when training the masked/residual transformer, specify the name of the RVQ model used for tokenization.
* `--cond_drop_prob`: condition drop ratio, for classifier-free guidance. `0.2` is used.
* `--share_weight`: whether to share the projection/embedding weights in residual transformer.

All the pre-trained models and intermediate results will be saved under `./checkpoints/<dataset_name>/<name>`.
</details>

## :book: Evaluation
<details>

### Evaluate RVQ Reconstruction:
HumanML3D:
```
python eval_t2m_vq.py --gpu_id 0 --name rvq_nq6_dc512_nc512_noshare_qdp0.2 --dataset_name t2m --ext rvq_nq6
```
KIT-ML:
```
python eval_t2m_vq.py --gpu_id 0 --name rvq_nq6_dc512_nc512_noshare_qdp0.2_k --dataset_name kit --ext rvq_nq6
```

### Evaluate Text2motion Generation:
HumanML3D:
```
python eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw --dataset_name t2m --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns --gpu_id 1 --cond_scale 4 --time_steps 10 --ext evaluation
```
KIT-ML:
```
python eval_t2m_trans_res.py --res_name tres_nlayer8_ld384_ff1024_rvq6ns_cdp0.2_sw_k --dataset_name kit --name t2m_nlayer8_nhead6_ld384_ff1024_cdp0.1_rvq6ns_k --gpu_id 0 --cond_scale 2 --time_steps 10 --ext evaluation
```

* `--res_name`: model name of `residual transformer`.
* `--name`: model name of `masked transformer`.
* `--cond_scale`: scale of classifier-free guidance.
* `--time_steps`: number of iterations for inference.
* `--ext`: filename for saving evaluation results.

The final evaluation results will be saved in `./checkpoints/<dataset_name>/<name>/eval/<ext>.log`

</details>

### To be continued.
## Acknowledgements
1 change: 1 addition & 0 deletions assets/mapping.json


1 change: 1 addition & 0 deletions assets/mapping6.json


12 changes: 12 additions & 0 deletions assets/text_prompt.txt
@@ -0,0 +1,12 @@
the person holds his left foot with his left hand, puts his right foot up and left hand up too.#132
a man bends down and picks something up with his left hand.#84
A man stands for few seconds and picks up his arms and shakes them.#176
A person walks with a limp, their left leg get injured.#192
a person jumps up and then lands.#52
a person performs a standing back kick.#52
A person pokes their right hand along the ground, like they might be planting seeds.#60
the person steps forward and uses the left leg to kick something forward.#92
the man walked forward, spun right on one foot and walked back to his original position.#92
the person was pushed but did not fall.#124
this person stumbles left and right while moving forward.#132
a person reaching down and picking something up.#148
Empty file added dataset/__init__.py
Empty file.
2 changes: 1 addition & 1 deletion environment.yml
@@ -46,7 +46,6 @@ dependencies:
- fontconfig=2.13.1=h6c09931_0
- freetype=2.11.0=h70c0345_0
- frozenlist=1.3.3=py37h5eee18b_0
- gdown=4.5.1=pyhd8ed1ab_0
- giflib=5.2.1=h7b6447c_0
- glib=2.69.1=h4ff587b_1
- gst-plugins-base=1.14.0=h8213a91_2
@@ -188,6 +187,7 @@ dependencies:
- cachetools==5.3.1
- einops==0.6.1
- ftfy==6.1.1
- gdown==4.7.1
- google-auth==2.22.0
- google-auth-oauthlib==0.4.6
- grpcio==1.57.0
2 changes: 1 addition & 1 deletion eval_t2m_vq.py
@@ -3,7 +3,7 @@
from os.path import join as pjoin

import torch
from models.vq.model import RVQVAE, HVQVAE
from models.vq.model import RVQVAE
from options.vq_option import arg_parse
from motion_loaders.dataset_motion_loader import get_dataset_motion_loader
import utils.eval_t2m as eval_t2m
