-
-
Notifications
You must be signed in to change notification settings - Fork 610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Idist improve2 #1075
Merged
Merged
Idist improve2 #1075
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
vfdev-5
added a commit
that referenced
this pull request
May 31, 2020
* Improved parallel utils (#1023) * add utils for distributed * autopep8 fix * [WIP] Added comp models and tests * [WIP] Added create_from_backend and create_from_context for _DistModel * [WIP] Added spawn to _DistModel * [WIP] Refactored comp models and added spawn for xla * autopep8 fix * Improved tests * autopep8 fix * Fixes flake8 * autopep8 fix * - Removed is_distributed - Renamed _DistModel -> _NativeDistModel * autopep8 fix * Added docs and tests for xla spawn * Fixes conftest bug * autopep8 fix * Updates * autopep8 fix * Fixes available_backends bug * autopep8 fix * Fixed tests Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr> Co-authored-by: AutoPEP8 <> * [WIP] create from context for XLA * autopep8 fix * Tests for _sync_model for XLA * autopep8 fix * More tests and updates * autopep8 fix * [WIP] create from context for Native Torch Dist * autopep8 fix * Added tests for idist.* created from context for native dist settings * [WIP] Fix tests * Fixed metric related tests * autopep8 fix * [WIP] idist - Docs & code updates (#1034) * [WIP] Updates docs * Adapted metrics with idist, fixed tests - added local rank estimation with hostname heuristics * autopep8 fix * Adapted metrics code and tests to use idist * autopep8 fix * Updated docs, docstrings * Updated xla tests and fix a bug with tensor dtype * autopep8 fix * Fixed all gather using all reduce op * autopep8 fix * Improved tests of create_supervised_trainer on TPU Co-authored-by: AutoPEP8 <> * Tpu metrics (#1042) * [WIP] make accumulation tests on TPU(s) * Fixed tests with accumulation metric - by decreasing tolerence * Fixed all_gather bug * autopep8 fix * Added metric tests for xla * autopep8 fix * Fixed bug in test of precision - updated other regression tests * Fixed failing tests on TPU - increased err tolerence Co-authored-by: AutoPEP8 <> * Increased err tol for mse and rmse tests on single TPU * Fixes #991 (#1047) - average output in RunningAverage * add TPU checkpointing to CPU. (#1005) * add TPU checkpointing to CPU. * autopep8 fix * update docstring to include TPU notice. * add skip for non-TPU tests. * autopep8 fix * refactor to use idist API. * autopep8 fix * add complex save with TPU. * autopep8 fix * fix tests. * fix typo in docstring. Co-authored-by: vfdev <vfdev.5@gmail.com> Co-authored-by: AutoPEP8 <> Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com> Co-authored-by: vfdev <vfdev.5@gmail.com> * Updated tests on checkpoint and TPU * Added barrier op in idist (#1050) * Added barrier op in idist * Fixed test and updated one_rank_only to use idist * Moved one_rank_only to idist, adapted tests * autopep8 fix * Removed redundant imports * Another test fix of setup_logger Co-authored-by: AutoPEP8 <> * Fixed bug with torch.cuda.set_device * Fixed cuda device index, added warning if cuda device index != local rank * autopep8 fix * Issue 1011 (#1053) * Fixed #1011 * autopep8 fix * Fixed failing test * Updated tests/ignite/contrib/engines/test_common.py * Updated tests to improve coverage * Fixed test Co-authored-by: AutoPEP8 <> * Improved device() method (#1062) * Improved device() method - fixes run_gpu_tests.sh to run tests that should skip if there is env `WORLD_SIZE` * Improved code coverage * Improve code coverage for serial model * Idist kwargs dict (#1064) * Added kwargs_dict to spawn, improved _set_model etc * Fixed bug in tests/run_gpu_tests.sh * Fixed black * Updated typing * removed badly merged _need_to_sync * Improved device and setup_common_training_handlers (#1066) * - idist.device() returns "torch.device('cuda')" if non-dist conf and cuda device is available * Improved setup_common_training_handlers - no need to handle train_sampler if idist.model_name() is not serial, but train_sampler is not setup as distributed due to 1 proc. - Warn only if train_sampler has set_epoch * Idist improve2 (#1075) * Improve tests on XLA * Fixes xla test when spawn without 'fork' * Added test of dtype for XLA * Added support for str input for all gather (#1081) * Added support for str input for all gather * More tests for better coverage * Fix #1055 (#1068) * issue_1055 * autopep8 fix * decorate and refactor getattr * remove decoration - need further discussions * Added missing decorator for plx * Added note about dist-friendly interface * Updated Checkpoint to dist config + TPU * autopep8 fix * [WIP] Checkpoint in dist config * autopep8 fix * [WIP] Checkpoint in dist config * autopep8 fix * [WIP] Checkpoint on XLA * autopep8 fix * Fix checkpoint tests on XLA * Put back Loggers as dist-unfriendly + tests for contrib savers * Updated tests for XLA - removed neptune xla tests * autopep8 fix * minor fix for coverage * [WIP] New XLA tests for trains logger * Fixed distrib tests for trains Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr> Co-authored-by: AutoPEP8 <> Co-authored-by: vfdev-5 <vfdev.5@gmail.com> * Fix failing tests on multi-gpus * Fix failing XLA tests * Fixes failing tests on multi-GPUs * autopep8 fix * Remove useless barriers (#1085) * remove useless barriers * Fix failing tests * Added missing barrier in test for XLA Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr> Co-authored-by: vfdev-5 <vfdev.5@gmail.com> * Fixes failing TPU with fork mp * Applied review suggestions * autopep8 fix Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr> Co-authored-by: AutoPEP8 <> Co-authored-by: Elijah Rippeth <elijah.rippeth@gmail.com> Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description:
Check list: