-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BugFix] fix incorrect name when fetch data in sparse optim #3808
Conversation
To trigger regression tests:
|
@@ -74,7 +74,7 @@ def step(self): | |||
# will send grad to each corresponding trainer | |||
if self._world_size > 1: | |||
# get idx split from kvstore | |||
idx_split = kvstore.get_partid(name, idics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you should remove name = emb._tensor.name
at line 49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm bit confused. emb._tensor.name
and emb.data_name
are different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, they're different. as group_id
is attached to the name user specifies. so _tensor.name
returns the original name which does not append with group_id
. _tensor._name
is the globally unique name which is attached with group_id
. It's confusing indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you remove line 49?
can you add a test for this modification?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add test may be not possible, as the code logic can only be touched while world_size
> 1 and distributed is enabled. Is it possible to add such a test in test_dist_optim.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and line 49 cannot be removed, it's used in line 114/118/120/121
Description
fix for sparse optim
Checklist
Please feel free to remove inapplicable items for your PR.
or have been fixed to be compatible with this change
Changes