It took 30 s to generate ~100 tokens on an A6000 GPU, which I found to be around 5x slower than LLaVA at the same size and same quantization. Why is this the case?
I am trying to investigate it!
Are you sure your model is in GPU VRAM?
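One quick way to rule this out is to check where the model's weights actually live before timing generation. This is a minimal sketch, assuming `model` is any loaded `torch.nn.Module` (the helper name `all_params_on_gpu` is my own, not part of MoAI):

```python
import torch
import torch.nn as nn

def all_params_on_gpu(model: nn.Module) -> bool:
    """Return True only if every parameter tensor lives on a CUDA device.

    If any parameter is still on the CPU, generation will silently pay for
    host<->device transfers (or run entirely on CPU), which easily explains
    a 5x slowdown.
    """
    return all(p.is_cuda for p in model.parameters())

# Hypothetical usage: a freshly constructed layer starts on the CPU.
layer = nn.Linear(16, 16)
print(all_params_on_gpu(layer))  # False until you call layer.cuda() / .to("cuda")
```

You can also eyeball `next(model.parameters()).device` in a debugger, but the all-parameters check catches partially offloaded models too.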
I think it may stem from FlashAttention.
The official LLaVA repository models and other Hugging Face models normally run with FlashAttention applied.
However, I checked and MoAI is not applying it properly.
Therefore, I will try to equip it!
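For Hugging Face `transformers` models, FlashAttention is usually requested at load time via the `attn_implementation` keyword of `from_pretrained`. Here is a small sketch of the loading kwargs involved; the helper function and the `use_flash_attention` flag are my own illustration, not MoAI's actual loading code, and `"flash_attention_2"` additionally requires the `flash-attn` package and fp16/bf16 weights on a CUDA GPU:

```python
import torch

def moai_load_kwargs(use_flash_attention: bool = True) -> dict:
    """Build keyword arguments for transformers' from_pretrained.

    FlashAttention-2 only works with half-precision weights on CUDA,
    so we pin the dtype and device here as well.
    """
    kwargs = {
        "torch_dtype": torch.float16,
        "device_map": "cuda",
    }
    if use_flash_attention:
        # Requires `pip install flash-attn`; falls back to the slower
        # default ("sdpa" or "eager") attention if omitted.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs
```

These kwargs would then be spread into the usual call, e.g. `AutoModelForCausalLM.from_pretrained(checkpoint_id, **moai_load_kwargs())`.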