⭐️ Star to follow our team's projects!

🚀🚀🚀 Official implementation of DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.

- Authors: Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang
- Institutes: The Chinese University of Hong Kong; Shanghai AI Laboratory
- Resources: [Paper]
- Models: [DualFocus-LLaVA-1.5-7B]; [DualFocus-LLaVA-1.5-13B]; [DualFocus-ShareGPT4V-13B]
## News

- [2024/02/22] The paper, evaluation code and checkpoints are released!

## TODO

- [x] Release evaluation code and checkpoints
- [x] Release paper
- [ ] Release training code
- [ ] Speed up inference
## Results

| Name | LLM | SEED-IMG | MMBench | GQA* | TextVQA |
|---|---|---|---|---|---|
| LLaVA-1.5-7B | Vicuna-7B | 66.2 | 64.3 | 67.2 | 58.2 |
| DualFocus-LLaVA-1.5-7B | Vicuna-7B | 68.9 (+2.7) | 66.8 (+2.5) | 69.4 (+2.2) | 62.3 (+4.1) |
| LLaVA-1.5-13B | Vicuna-13B | 68.2 | 67.7 | 69.3 | 61.3 |
| DualFocus-LLaVA-1.5-13B | Vicuna-13B | 71.0 (+2.8) | 71.4 (+3.7) | 74.5 (+5.2) | 65.7 (+4.4) |
| ShareGPT4V-13B | Vicuna-13B | 70.8 | 68.5 | 71.1 | 62.2 |
| DualFocus-ShareGPT4V-13B | Vicuna-13B | 72.9 (+2.1) | 71.0 (+2.5) | 75.7 (+4.6) | 66.7 (+4.5) |
GQA*: we convert the GQA dataset into a multiple-choice question format via GPT-3.5. Please refer to here for details.
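For illustration, here is a minimal sketch of what such a conversion could look like with the OpenAI Python client; the prompt wording, option count, and model name are our own assumptions, not the exact pipeline behind GQA*.

```python
# Hypothetical sketch: turning an open-ended GQA QA pair into a multiple-choice
# question with GPT-3.5. The prompt and option count are assumptions, not the
# exact conversion pipeline used for the GQA* benchmark above.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def to_multiple_choice(question: str, answer: str) -> str:
    prompt = (
        "Rewrite the following visual question as a multiple-choice question "
        "with four options (A-D). Exactly one option must be the given answer; "
        "the other three should be plausible distractors.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(to_multiple_choice("What color is the bus?", "yellow"))
```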
## Install

```shell
git clone https://github.com/InternLM/InternLM-XComposer --depth=1
cd InternLM-XComposer/projects/DualFocus
conda create -n DualFocus python=3.9 -y
conda activate DualFocus
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
```
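A quick way to verify the environment afterwards (our suggestion, not part of the official scripts):

```python
# Quick environment sanity check (our suggestion, not an official script).
import torch

print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())

try:
    import flash_attn  # noqa: F401
    print("flash-attn OK")
except ImportError:
    print("flash-attn missing; rerun: pip install flash-attn --no-build-isolation")
```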
## Dataset

Please follow the instructions in Data.md to prepare the datasets.

## Evaluation

We provide evaluation scripts for 4 benchmarks, taking DualFocus-LLaVA-1.5-7B as the example throughout. Slurm users should configure the PARTITION, QUOTA_TYPE and GPUS parameters in the scripts themselves.
### MMBench

- Download mmbench_dev_20230712.tsv and put it under ./playground/data/eval/mmbench.
- Multi-GPU inference:
```shell
# for single node inference
bash scripts/eval/eval_mmbench.sh yhcao/DualFocus-LLaVA-1.5-7B
# for slurm inference
bash scripts/eval/slurm_eval_mmbench.sh yhcao/DualFocus-LLaVA-1.5-7B
```
- Submit ./playground/data/eval/mmbench/answers_upload/{res}.xlsx to the evaluation server.
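Before uploading, you may want to sanity-check the generated sheet. A small sketch, assuming the file is named res.xlsx (the actual name depends on your run) and requiring openpyxl for pandas to read xlsx:

```python
# Optional sanity check before uploading (our suggestion; the file name below
# is a placeholder for the actual {res}.xlsx produced by your run).
import pandas as pd

df = pd.read_excel("playground/data/eval/mmbench/answers_upload/res.xlsx")
print(df.shape)   # number of rows answered
print(df.head())  # eyeball a few predictions
assert not df.isnull().all(axis=None), "empty sheet; inference likely failed"
```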
### SEED-Bench

- Follow the official instructions to download the images and put them under ./playground/data/eval/seed_bench/SEED-Bench-image.
- Multi-GPU inference and evaluation:
```shell
# for single node inference
bash scripts/eval/eval_seed.sh yhcao/DualFocus-LLaVA-1.5-7B
# for slurm inference
bash scripts/eval/slurm_eval_seed.sh yhcao/DualFocus-LLaVA-1.5-7B
```
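If you want to break the results down by category yourself, here is a hypothetical sketch of the aggregation; the JSONL path and field names are assumptions, not the format the scripts emit:

```python
# Hypothetical sketch: per-category accuracy from a predictions file. The
# file path and field names are assumptions, not the script's output format.
import json
from collections import defaultdict

hits, totals = defaultdict(int), defaultdict(int)
with open("predictions.jsonl") as f:        # hypothetical output path
    for line in f:
        rec = json.loads(line)
        cat = rec["question_type"]          # assumed field name
        totals[cat] += 1
        hits[cat] += rec["prediction"] == rec["answer"]

for cat in sorted(totals):
    print(f"{cat}: {hits[cat] / totals[cat]:.3f}")
```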
### TextVQA

- Download TextVQA_0.5.1_val.json and the images, then extract them to ./playground/data/eval/textvqa.
- Multi-GPU inference and evaluation:
```shell
# for single node inference
bash scripts/eval/eval_textvqa.sh yhcao/DualFocus-LLaVA-1.5-7B
# for slurm inference
bash scripts/eval/slurm_eval_textvqa.sh yhcao/DualFocus-LLaVA-1.5-7B
```
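TextVQA is scored with the soft VQA accuracy metric, under which a prediction earns min(1, matches/3) against the 10 human answers. A simplified sketch follows; the official scorer additionally normalizes answers (punctuation, articles, number words) and averages over 9-answer subsets:

```python
# Simplified soft VQA accuracy as used for TextVQA: an answer scores
# min(1, matches/3) against the human answers. Omits the official
# answer normalization and subset averaging for brevity.
def vqa_accuracy(prediction: str, human_answers: list) -> float:
    matches = sum(a.strip().lower() == prediction.strip().lower()
                  for a in human_answers)
    return min(1.0, matches / 3.0)

print(vqa_accuracy("yellow", ["yellow"] * 4 + ["gold"] * 6))  # -> 1.0
```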
### GQA

- Download the data and evaluation scripts following the official instructions and put them under ./playground/data/eval/gqa/data. Download the json and put it under ./playground/data/eval/gqa. You may need to modify eval.py as described here, due to missing assets in the GQA v1.2 release.
- Multi-GPU inference and evaluation:
```shell
# for single node inference
bash scripts/eval/eval_gqa.sh yhcao/DualFocus-LLaVA-1.5-7B
# for slurm inference
bash scripts/eval/slurm_eval_gqa.sh yhcao/DualFocus-LLaVA-1.5-7B
```
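Since GQA* is multiple-choice (see the table note above), scoring reduces to matching the predicted option letter against the ground truth. A hypothetical sketch of the letter extraction; the reply formats handled here are assumptions about typical model outputs, not the parser used by eval.py:

```python
# Hypothetical sketch: extracting the chosen option letter from a model reply
# in the multiple-choice GQA* setting. The handled reply formats are
# assumptions, not the evaluation script's exact parser.
import re
from typing import Optional

def extract_choice(reply: str) -> Optional[str]:
    m = re.search(r"\b([A-D])\b", reply.strip())  # e.g. "A", "(B)", "Answer: C"
    return m.group(1) if m else None

assert extract_choice("The answer is (B).") == "B"
assert extract_choice("C") == "C"
```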
## Acknowledgement

- LLaVA: the codebase we built upon. Thanks for their wonderful work!
- Vicuna: the amazing open-source large language model!
## Citation

If you find our work helpful for your research, please consider giving it a star ⭐ and a citation 📝.
```bibtex
@article{cao2024dualfocus,
  title={DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models},
  author={Yuhang Cao and Pan Zhang and Xiaoyi Dong and Dahua Lin and Jiaqi Wang},
  journal={arXiv preprint arXiv:2402.14767},
  year={2024}
}
```
## License

Usage and License Notices: the data and checkpoints are intended and licensed for research use only. They are also restricted to uses that follow the license agreements of LLaMA, Vicuna, and GPT-4. The dataset is licensed CC BY-NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.