dont save moe lb-loss tensors if args.moe_loss_weight=0 #119

Merged

Conversation

@michael-go (Contributor) commented on Jul 11, 2024

Megablocks accumulates lb-loss tensors here and expects the user to call clear_load_balancing_loss() to release the memory. In our case we compute the lb-loss outside of Megablocks, and we had a GPU memory leak before we noticed this behaviour.
We could call clear_load_balancing_loss() after every Megablocks forward(), but it is even better to avoid accumulating these tensors at all when Megablocks' lb-loss calculation is not needed, which can already be signaled by passing 0 for Arguments.moe_loss_weight.
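
A minimal sketch of the idea in Python, not the exact diff from this PR: guard the call that stashes lb-loss tensors so nothing accumulates when the loss weight is zero. save_load_balancing_loss, clear_load_balancing_loss, and args.moe_loss_weight are names from the megablocks moe module; the surrounding forward() body and the router outputs (tokens_per_expert, scores) are elided placeholders.

```python
# Simplified sketch of the guard (not the exact merged diff); the real
# logic lives inside megablocks' MoE forward().
def forward(self, x, scores, expert_weights, top_experts):
    ...
    # Only stash lb-loss inputs if the lb-loss will actually be computed.
    # The stash is a module-level list that only clear_load_balancing_loss()
    # empties, so skipping the append avoids the leak entirely.
    if self.training and self.args.moe_loss_weight > 0:
        save_load_balancing_loss((tokens_per_expert, scores))
    ...
```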

Commit message: it takes GPU memory, and can also cause a leak if clear_load_balancing_loss() is not called.
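
For users on older versions who compute the lb-loss outside Megablocks, the workaround mentioned above is to drain the accumulator after each step. A rough sketch, assuming the helper is importable from megablocks.layers.moe; model, batch, optimizer, and my_external_lb_loss are placeholders:

```python
from megablocks.layers.moe import clear_load_balancing_loss

# One training step where the lb-loss is computed outside Megablocks.
output = model(batch)               # MoE forward() stashes lb-loss tensors
loss = my_external_lb_loss(output)  # placeholder for the external lb-loss
loss.backward()
optimizer.step()
# Release the tensors Megablocks accumulated during forward(); otherwise
# they are retained across steps and GPU memory grows without bound.
clear_load_balancing_loss()
```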
@mvpatel2000 (Contributor) left a comment:
LGTM!

@mihir-db merged commit d2774b2 into databricks:main on Jul 11, 2024