[WIP] Merge idist into master #1045

Merged: 54 commits, May 31, 2020
Commits
177fb6f
Improved parallel utils (#1023)
vfdev-5 May 11, 2020
91d8875
[WIP] create from context for XLA
vfdev-5 May 11, 2020
3cfccd4
autopep8 fix
May 11, 2020
f71043f
Tests for _sync_model for XLA
vfdev-5 May 11, 2020
093ddb1
autopep8 fix
May 11, 2020
7ad7fcf
More tests and updates
vfdev-5 May 11, 2020
d57b3c9
autopep8 fix
May 11, 2020
7fcadca
[WIP] create from context for Native Torch Dist
vfdev-5 May 12, 2020
5a6e052
autopep8 fix
May 12, 2020
1c362fe
Added tests for idist.* created from context for native dist settings
vfdev-5 May 12, 2020
12512cf
[WIP] Fix tests
vfdev-5 May 13, 2020
228fd89
Fixed metric related tests
vfdev-5 May 13, 2020
b09ea05
autopep8 fix
May 13, 2020
a23da8e
Merge branch 'master' of https://github.com/pytorch/ignite into idist
vfdev-5 May 13, 2020
da72b15
[WIP] idist - Docs & code updates (#1034)
vfdev-5 May 15, 2020
0352bc6
Merge branch 'master' into origin-idist
vfdev-5 May 15, 2020
16256cf
Merge branch 'master' of https://github.com/pytorch/ignite into origi…
vfdev-5 May 16, 2020
914bba9
Tpu metrics (#1042)
vfdev-5 May 16, 2020
feb79b4
Merge branch 'master' into idist
vfdev-5 May 16, 2020
25d38d1
Increased err tol for mse and rmse tests on single TPU
vfdev-5 May 16, 2020
8886948
Fixes #991 (#1047)
vfdev-5 May 16, 2020
add8a4d
Merge branch 'master' into idist
vfdev-5 May 16, 2020
bdae449
add TPU checkpointing to CPU. (#1005)
erip May 16, 2020
d1cc29d
Updated tests on checkpoint and TPU
vfdev-5 May 16, 2020
977ac8c
Merge branch 'master' into idist
vfdev-5 May 17, 2020
15072ae
Added barrier op in idist (#1050)
vfdev-5 May 17, 2020
ac86d46
Merge branch 'master' into idist
vfdev-5 May 18, 2020
037e7f7
Fixed bug with torch.cuda.set_device
vfdev-5 May 19, 2020
2a01cc3
Fixed cuda device index, added warning if cuda device index != local …
vfdev-5 May 19, 2020
1f54ab5
autopep8 fix
May 19, 2020
199224a
Merge branch 'master' into idist
vfdev-5 May 22, 2020
888a654
Issue 1011 (#1053)
vfdev-5 May 22, 2020
ae1bdf5
Improved device() method (#1062)
vfdev-5 May 23, 2020
0fa8c61
Merge branch 'master' into idist
sdesrozis May 23, 2020
537dbd0
Idist kwargs dict (#1064)
vfdev-5 May 23, 2020
727f038
removed badly merged _need_to_sync
vfdev-5 May 23, 2020
530c422
Improved device and setup_common_training_handlers (#1066)
vfdev-5 May 24, 2020
74ddacb
Idist improve2 (#1075)
vfdev-5 May 28, 2020
6735dc0
Merge branch 'master' into idist
vfdev-5 May 28, 2020
b1b5d56
Merge branch 'master' into idist
vfdev-5 May 28, 2020
1e5d7d3
Added support for str input for all gather (#1081)
vfdev-5 May 29, 2020
89e1358
Fix #1055 (#1068)
sdesrozis May 29, 2020
1c34eda
Merge branch 'master' into idist
vfdev-5 May 29, 2020
d277a25
Fix failing tests on multi-gpus
vfdev-5 May 29, 2020
d9a80c6
Fix failing XLA tests
vfdev-5 May 30, 2020
f617787
Merge branch 'master' into idist
vfdev-5 May 30, 2020
a8f03e8
Merge branch 'master' into idist
vfdev-5 May 31, 2020
b41cf6d
Fixes failing tests on multi-GPUs
vfdev-5 May 31, 2020
222cb60
autopep8 fix
May 31, 2020
b3b9aff
Remove useless barriers (#1085)
sdesrozis May 31, 2020
44f4c63
Fixes failing TPU with fork mp
vfdev-5 May 31, 2020
8989e5e
Merge branch 'master' into idist
vfdev-5 May 31, 2020
f4ee4f9
Applied review suggestions
vfdev-5 May 31, 2020
669ef8a
autopep8 fix
May 31, 2020
Merge branch 'master' into idist
vfdev-5 committed May 31, 2020
commit a8f03e850f41e0de64689773bb8fe465a71004f8
19 changes: 19 additions & 0 deletions tests/ignite/handlers/test_checkpoint.py
@@ -907,10 +907,28 @@ def _test(ext):
    _test(".pt")


def _test_checkpoint_with_ddp(device):
    model = DummyModel().to(device)
    ddp_model = nn.parallel.DistributedDataParallel(model)
    to_save = {"model": ddp_model}

    save_handler = MagicMock(spec=BaseSaveHandler)
    checkpointer = Checkpoint(to_save, save_handler=save_handler)

    trainer = Engine(lambda e, b: None)
    trainer.state = State(epoch=0, iteration=0)

    checkpointer(trainer)
    assert save_handler.call_count == 1
    metadata = {"basename": "model", "score_name": None, "priority": 0}
    save_handler.assert_called_with(model.state_dict(), "model_0.pt", metadata)


@pytest.mark.distributed
def test_distrib_cpu(distributed_context_single_node_gloo, get_rank_zero_dirname):
    dirname = get_rank_zero_dirname("cpu")
    _test_save_model_optimizer_lr_scheduler_with_state_dict("cpu", os.path.join(dirname, "1"))
    _test_checkpoint_with_ddp("cpu")


@pytest.mark.distributed
@@ -919,6 +937,7 @@ def test_distrib_gpu(distributed_context_single_node_nccl, get_rank_zero_dirname
    device = idist.device()
    dirname = get_rank_zero_dirname(device)
    _test_save_model_optimizer_lr_scheduler_with_state_dict(device, os.path.join(dirname, "1"))
    _test_checkpoint_with_ddp(device=device)


def _test_tpu_saves_to_cpu(device, dirname):
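
Note on what `_test_checkpoint_with_ddp` checks: `to_save` holds `ddp_model`, but the assertion expects the save handler to receive `model.state_dict()`, that is, the state of the wrapped model rather than of the wrapper. The sketch below illustrates that unwrapping idea in plain Python. `DummyModel`, `DummyDDP` and `unwrap_state_dict` are hypothetical stand-ins written for this illustration (no torch required); this is not ignite's actual implementation, only an assumption about the behaviour the test pins down.

```python
class DummyModel:
    """Hypothetical stand-in for a module with a single parameter."""

    def __init__(self):
        self._params = {"net.weight": [1.0, 2.0]}

    def state_dict(self):
        # Return a copy so callers cannot mutate the model's parameters.
        return dict(self._params)


class DummyDDP:
    """Hypothetical stand-in for a data-parallel wrapper.

    It keeps the wrapped model in `.module` and prefixes every
    state_dict key with "module.", as DistributedDataParallel does.
    """

    def __init__(self, module):
        self.module = module

    def state_dict(self):
        inner = self.module.state_dict()
        return {"module." + key: value for key, value in inner.items()}


def unwrap_state_dict(obj):
    """Return the state dict of the underlying model if obj is a wrapper."""
    if isinstance(obj, DummyDDP):
        obj = obj.module
    return obj.state_dict()


model = DummyModel()
ddp_model = DummyDDP(model)

# What a checkpoint handler would pass to its save handler in this sketch.
saved = unwrap_state_dict(ddp_model)
```

Saving the unwrapped state keeps checkpoint keys free of the "module." prefix, so in this sketch the same checkpoint can be loaded into either the plain model or a freshly wrapped one.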