Fix test auto tpu #1126

vfdev-5 · 2020-06-13T11:44:29Z

Description:

Fixed failing TPU tests
Improved the docstring of the cifar10 example's run method

Check list:

New tests are added (if a new feature is added)
New doc strings: description and/or example code are in RST format
Documentation is updated (if required)

* Added Windows/MacOSX CI for py3.7 only (#1113)
* [WIP] Added Windows CI for py2.7 only
* Excluded examples from windows ci
* Update unittests.yml
* Update unittests.yml
* Fixed shell bash as suggested
* Fixed failing tests on Win32 - added MNIST test for Win32 in Github actions - added tests on macosx in Github actions
* Fixed isort
* Fixes tests with IterableDataset
* Skipped slow deterministic tests on win32
* skip failing timer tests on macos
* fix macos platform name
* fix _test_setup_logging
* skip frequency tests on win platform
* skip time tests on macos
* fix flake8
* fix isort
* Skip distrib tests for Win32
* skip time test for macos
* Updated github actions yaml
* skip modules for macos
* Fixed bad skip of deterministic tests, reduced time for slow tests
* Do not run dist tests on macosx
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* [FR] Parallel helper tools (#1014) (#1116)
* [FR] Parallel helper tools (#1014)
* [WIP] auto and parallel dist modules
* [WIP] auto optim
* Added xla optimizer wrapper - other code updates
* Updated auto and cifar10 example
* - Fixed resume from - other cosmetics
* Fixed bug with _XLADistributedOptimizer - updated default LR
* autopep8 fix
* Updated README and minor fixes
* autopep8 fix
* - Removed mnist distributed example - Reverted unintended modifications
* Tests of auto methods
* autopep8 fix
* Tests, docs and code updates
* autopep8 fix
* Up code, test, cifar10 example and docs
* Added option to stop the training - updated ci
* Updated readme and fixed ci configs
* - Updated code, README and remove old cifar10
Co-authored-by: vfdev-5 <vfdev.5@gmail.com>
Co-authored-by: AutoPEP8 <>
* Fixes failing tests
* Minor updates
* Other minor updates
* Example readme update and minor fixes
* Added test on load_objects ddp to improve coverage
* Added more tests for parallel launcher
* Replaced pbars by logger
* Updated link to cifar10 example
* Fixes codecov upload
* Updated coverage report type for gpu/tpu
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Fixes #1120 (#1122)
* Fixes #1120 - Aligned idist args and method names to torch.distributed.launch
* replace missing num_procs_per_node
* black format
* fix bug
* replace num_nodes by nnodes
Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr>
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* reverse order of remove/save in Checkpoint handling (#1117)
* reverse order of remove/save so there is never an n+1 checkpoint situation.
* evict if new item is better than candidate for eviction.
* swap order of updating saved and saving to ensure consistency of state
* remove redundant method.
Co-authored-by: vfdev <vfdev.5@gmail.com>
* Fix test auto tpu (#1126)
* Fixed failing tpu tests
* Updated docstring of cifar10 example
* Auto pin_memory (#1129)
* Auto pin_memory
* autopep8 fix
Co-authored-by: AutoPEP8 <>
* fix auto pin_memory : idist.device().type should be used (#1131)
* fix auto pin_memory : idist.device().type should be used
* fix cuda in device
* fix test
* use idist.device().type to test
* add missing ()
Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr>
* Update pascal voc12 example (#1125)
* [WIP][Pascal-VOC12] Update/refactor example
* [WIP][Pascal-VOC12] Update/refactor example 2
* [WIP] Updated mlflow files
* Removed unused files
* Fixed flake and black
* Removed unused import and fixed version for mlflow
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* fix cifar10 model : num_classes missing (#1134)
Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr>
* Accuracy MultiLabel Handling and Error Message (#1132)
* Updated check for multilabel and error message
* Updated docstring and error message
* Updated error message formatting
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Updated ImageNet example (#1138)
* [WIP] Updated ImageNet example - minor fixes for Pascal VOC12
* Fixed flake8
* Updated pytorch-version-tests.yml to run cron every day at 00:00 UTC (#1141)
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
* Added check_compute_fn argument to EpochMetric and related metrics (#1140)
* Added check_compute_fn argument to EpochMetric and related functions.
* Updated docstrings
* Added check_compute_fn to _BaseRegressionEpoch
* Adding typing hints for check_compute_fn
* Update roc_auc.py
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
Co-authored-by: vfdev <vfdev.5@gmail.com>
* Docs cosmetics (#1142)
* Updated docs, replaced single quote by double quote if is code - fixed missing link to Engine - cosmetics
* More doc updates
* More updates
* Fix batch size calculation error (#1137)
* Fix batch size calculation error
* Add tests for fixed batch size calculation
* Fix tests
* Test for num_workers
* Fix nproc comparison
* Improve docs
* Fixed docstring
Co-authored-by: vfdev <vfdev.5@gmail.com>
* Docs updates (#1139)
* [WIP] Added teaser gif
* [WIP] Updated README
* [WIP] Updated README
* [WIP] Updated docs
* Reverted unintended pyproject.toml edits
* Updated README and examples parts
* More updates of README
* Added badge to check pytorch/python compatible versions
* Updated README
* Added ref to blog "Using Optuna to Optimize PyTorch Ignite Hyperparameters"
* Update README.md
* Fixed bad internal link in examples
* Updated README
* Fixes docs (#1147)
* Fixed bad link on teaser
* Added manual_seed into docs
* Issue #1115 : pbar persists due to specific rule in tqdm (notebook) when n < total (#1145)
* Issue #1115 pbar persists in notebook due to specific rules when n < total
* close pbar doesn't rise danger bar
* fix when pbar.total is None
Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr>
* Updated codebase such that torch>=1.3 (#1150)
Co-authored-by: vfdev <vfdev.5@gmail.com>
* add wandb (#1152) wandb integration already exists, just adding it to the requirements file
* Fixed typo and missing part of "Where to go next" (#1151)
* Fixes #1153 (#1154) - temporary downgrade of scipy to 1.4.1 instead of 1.5.0
* Use global_step as priority, if it exists (#1155)
* Use global_step as priority, if it exists
* Fix flake8 error
* Style fix
Co-authored-by: vfdev <vfdev.5@gmail.com>
* Fix TrainsSaver handling of Checkpoint's n_saved (#1135)
* Utilize Trains framework callbacks to better support checkpoint saving and respect Checkpoint.n_saved
* Update trains callbacks to new format
* autopep8 fix
* Fix trains mnist example (store checkpoints in local folder)
* Use trains 0.15.1rc0 until PR is approved
* Use CallbackType for Trains callback type resolution. Add unit test for Trains callbacks
* Update trains version
* Updated test_trains_saver_callbacks
Co-authored-by: jkhenning <>
Co-authored-by: vfdev <vfdev.5@gmail.com>
* Stateful handlers (#1156)
* Stateful handlers
* Added state_dict/load_state_dict tests for Checkpoint
* integration test
* Updated docstring and added include_self to ModelCheckpoint
* An integreation test for checkpointing with stateful handlers
* Black and flake8
Co-authored-by: vfdev-5 <vfdev.5@gmail.com>
* Bump version to 0.4rc.0.post1
* bump version to v0.4.0 🎉
Co-authored-by: vfdev <vfdev.5@gmail.com>
Co-authored-by: Sylvain Desroziers <sylvain.desroziers@gmail.com>
Co-authored-by: Desroziers <sylvain.desroziers@ifpen.fr>
Co-authored-by: Marijan Smetko <marijansmetko123@gmail.com>
Co-authored-by: Anmol Joshi <anmolsjoshi@gmail.com>
Co-authored-by: Lavanya Shukla <lavanya.shukla12@gmail.com>
Co-authored-by: Akihiro Matsukawa <amatsukawa@users.noreply.github.com>
Co-authored-by: Jake Henning <59198928+jkhenning@users.noreply.github.com>
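
The commit log above includes #1117, which reverses the order of remove/save "so there is never an n+1 checkpoint situation" and evicts only "if new item is better than candidate for eviction". The following is a minimal sketch of that ordering, not Ignite's actual `Checkpoint` code: the class `BoundedCheckpointStore` and the `save_fn` / `remove_fn` callbacks are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Tuple


class BoundedCheckpointStore:
    """Hypothetical helper (not Ignite's API): keeps at most n_saved checkpoints."""

    def __init__(
        self,
        n_saved: int,
        save_fn: Callable[[str], None],
        remove_fn: Callable[[str], None],
    ) -> None:
        self.n_saved = n_saved
        self._save = save_fn      # assumed callback that writes a checkpoint
        self._remove = remove_fn  # assumed callback that deletes a checkpoint
        self.saved: List[Tuple[float, str]] = []  # kept sorted, worst score first

    def __call__(self, score: float, name: str) -> bool:
        """Try to store a checkpoint; return True if it was kept."""
        if len(self.saved) >= self.n_saved:
            worst_score, worst_name = self.saved[0]
            # Evict only if the new item beats the eviction candidate.
            if score <= worst_score:
                return False
            # Remove BEFORE saving, so storage never holds n_saved + 1 files.
            self._remove(worst_name)
            self.saved.pop(0)
        self._save(name)
        self.saved.append((score, name))
        self.saved.sort(key=lambda item: item[0])
        return True


# Usage: a set stands in for the checkpoint directory.
disk = set()
store = BoundedCheckpointStore(n_saved=2, save_fn=disk.add, remove_fn=disk.discard)
for score, name in [(0.1, "a"), (0.5, "b"), (0.3, "c"), (0.05, "d")]:
    store(score, name)
```

With this ordering, the number of stored checkpoints stays at or below `n_saved` at every step, at the cost of briefly holding one fewer checkpoint between the remove and the save.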

vfdev-5 added 3 commits June 13, 2020 00:48

Fixed failing tpu tests

316cf6b

Merge branch 'master' into fix-test-auto-tpu

b4d1d96

Updated docstring of cifar10 example

0486d81

vfdev-5 merged commit 6b88eb8 into pytorch:master Jun 13, 2020

vfdev-5 deleted the fix-test-auto-tpu branch June 13, 2020 13:49
