Incorrect log position #1098

Closed
Yevgnen opened this issue Jun 3, 2020 · 9 comments · Fixed by #1099

Comments

@Yevgnen
Contributor

Yevgnen commented Jun 3, 2020

🐛 Bug description

When {bar} is removed from the bar_format argument of ProgressBar, immediate logger output is not printed on a new line.

[Screenshot: logger output printed mid-line, overlapping the progress bar]

Add

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

to the beginning of mnist_with_tqdm_logger.py and change this line to

    pbar = ProgressBar(
        persist=True, bar_format="{desc}[{n_fmt}/{total_fmt}] {percentage:3.0f}%|{postfix} [{elapsed}<{remaining}]",
    )

This does not seem to happen with a very small example:

import logging
import time

import tqdm

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.info("before")

for i in tqdm.tqdm(
    range(3),
    leave=True,
    bar_format="{desc}[{n_fmt}/{total_fmt}] {percentage:3.0f}%|{postfix} [{elapsed}<{remaining}]",
):
    time.sleep(0.5)

logger.info("after")

[Screenshot: output of the small tqdm example, with log lines printed correctly around the bar]

Note that the leave argument in tqdm corresponds to the persist in ProgressBar.
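As background, a common way to keep log messages from clobbering an active tqdm bar is to route them through tqdm.write. The sketch below (my own minimal example, not code from this repo) shows a custom logging.Handler doing that:

```python
import logging
import time

import tqdm


class TqdmLoggingHandler(logging.Handler):
    """Logging handler that routes records through tqdm.write,
    so messages print above an active bar instead of clobbering it."""

    def emit(self, record):
        try:
            tqdm.tqdm.write(self.format(record))
        except Exception:
            self.handleError(record)


logging.basicConfig(level=logging.INFO, handlers=[TqdmLoggingHandler()])
logger = logging.getLogger(__name__)

for i in tqdm.tqdm(range(3)):
    time.sleep(0.1)
    logger.info("finished step %d", i)
```

With this handler installed, each logger.info call lands on its own line above the bar, regardless of the bar_format used.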

Environment

  • PyTorch Version (e.g., 1.4): 1.5.0
  • Ignite Version (e.g., 0.3.0): 2d30d1d
  • OS (e.g., Linux): macOS 10.15.5
  • How you installed Ignite (conda, pip, source): source
  • Python version: 3.7.5
  • Any other relevant information: None
@vfdev-5
Collaborator

vfdev-5 commented Jun 3, 2020

@Yevgnen thanks for reporting! I could reproduce the issue. It is related to the fact that, internally in the engine, logger.info for epoch completion is called before all attached handlers (incl. ProgressBar) are called. What can be done is to move self.logger.info("Epoch[%s] Complete. ...") after self._fire_event:

hours, mins, secs = _to_hours_mins_secs(time_taken)
self.logger.info("Epoch[%s] Complete. Time taken: %02d:%02d:%02d", self.state.epoch, hours, mins, secs)
if self.should_terminate:
    self._fire_event(Events.TERMINATE)
    break
self._fire_event(Events.EPOCH_COMPLETED)

This will give the following output:

root@user:/home/project/ml/tmp/ignite# python issue_1098.py --epochs 2
2020-06-03 09:43:36,410 Trainer INFO: Engine run starting with max_epochs=2.
Epoch [1/2]: [938/938] 100%|, loss=0.471 [00:10<00:00]
2020-06-03 09:43:47,505 Evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 09:43:56,307 Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:09
2020-06-03 09:43:56,307 Evaluator INFO: Engine run complete. Time taken 00:00:09
Training Results - Epoch: 1  Avg accuracy: 0.94 Avg loss: 0.20
2020-06-03 09:43:56,307 Evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 09:43:57,678 Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:01
2020-06-03 09:43:57,678 Evaluator INFO: Engine run complete. Time taken 00:00:01
Validation Results - Epoch: 1  Avg accuracy: 0.95 Avg loss: 0.19
2020-06-03 09:43:57,678 Trainer INFO: Epoch[1] Complete. Time taken: 00:00:11
Epoch [2/2]: [938/938] 100%|, loss=0.339 [00:10<00:00]
2020-06-03 09:44:08,652 Evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 09:44:17,472 Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:09
2020-06-03 09:44:17,472 Evaluator INFO: Engine run complete. Time taken 00:00:09
Training Results - Epoch: 2  Avg accuracy: 0.96 Avg loss: 0.14
2020-06-03 09:44:17,472 Evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 09:44:18,823 Evaluator INFO: Epoch[1] Complete. Time taken: 00:00:01
2020-06-03 09:44:18,823 Evaluator INFO: Engine run complete. Time taken 00:00:01
Validation Results - Epoch: 2  Avg accuracy: 0.96 Avg loss: 0.12
2020-06-03 09:44:18,823 Trainer INFO: Epoch[2] Complete. Time taken: 00:00:11
2020-06-03 09:44:18,823 Trainer INFO: Engine run complete. Time taken 00:00:42

If you would like to contribute, please do not hesitate to send a PR with this modification; it would help us make the library better :)

Yevgnen added a commit to Yevgnen/ignite that referenced this issue Jun 3, 2020
@Yevgnen
Contributor Author

Yevgnen commented Jun 3, 2020

@vfdev-5 You are always fast :-)

@Yevgnen
Contributor Author

Yevgnen commented Jun 3, 2020

Do logs like these look weird?

2020-06-03 20:14:44,220 trainer INFO: Engine run starting with max_epochs=5.
2020-06-03 20:14:44,221 evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 20:14:44,221 evaluator INFO: Epoch[1] Complete. Time taken: 00:00:00
2020-06-03 20:14:44,221 evaluator INFO: Engine run complete. Time taken 00:00:00
2020-06-03 20:14:44,221 trainer INFO: Epoch[1] Complete. Time taken: 00:00:00

The trainer's epoch-complete log is printed after all the evaluator logs... while the original output looks like

2020-06-03 20:13:45,629 trainer INFO: Engine run starting with max_epochs=5.
2020-06-03 20:13:45,630 trainer INFO: Epoch[1] Complete. Time taken: 00:00:00
2020-06-03 20:13:45,630 evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 20:13:45,630 evaluator INFO: Epoch[1] Complete. Time taken: 00:00:00
2020-06-03 20:13:45,630 evaluator INFO: Engine run complete. Time taken 00:00:00

P.S. I got this output from pytest tests/ignite/test_utils.py::test_setup_logger.

@vfdev-5
Collaborator

vfdev-5 commented Jun 3, 2020

@Yevgnen I think the new logs are OK:

  • trainer is started => trainer INFO: Engine run starting with max_epochs=5.
  • epoch completed => run evaluation =>
2020-06-03 20:14:44,221 evaluator INFO: Engine run starting with max_epochs=1.
2020-06-03 20:14:44,221 evaluator INFO: Epoch[1] Complete. Time taken: 00:00:00
2020-06-03 20:14:44,221 evaluator INFO: Engine run complete. Time taken 00:00:00
  • all handlers are executed => trainer INFO: Epoch[1] Complete. Time taken: 00:00:00

@Yevgnen
Contributor Author

Yevgnen commented Jun 3, 2020

Yes, I see that. But the elapsed time output by the trainer does not include the evaluations; I thought that might confuse users. I'm OK with it.

@vfdev-5
Collaborator

vfdev-5 commented Jun 3, 2020

But the elapsed time output by the trainer does not include the evaluations; I thought that might confuse users.

True. Let's see what others say... cc @erip @sdesrozis

@sdesrozis
Contributor

sdesrozis commented Jun 3, 2020

Digression: I was facing a related problem, I think. When I activate the root logger of logging using setup_logger() (similar to what basicConfig does), ProgressBar always persists despite persist=False... And I suppose that is linked to the current discussion!

True. Let's see what others say... cc @erip @sdesrozis

I think the time summary for an epoch should include the evaluations. In that case, do we split the summary report?

A small workaround would be to allow time modifications:

@trainer.on(Events.EPOCH_COMPLETED)
def _():
    evaluator.run([0, 1, 2])
    time_taken = evaluator.state.times["EPOCH_COMPLETED"]
    trainer.state.times["EPOCH_COMPLETED"] += time_taken
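The accumulation idea can be sketched without ignite (stdlib only, with toy names standing in for trainer.state.times / evaluator.state.times): after running the evaluator, fold its elapsed time into the trainer's per-epoch time so the reported epoch duration includes evaluation.

```python
import time

# Toy timing dicts standing in for trainer.state.times / evaluator.state.times.
trainer_times = {"EPOCH_COMPLETED": 0.0}
evaluator_times = {"EPOCH_COMPLETED": 0.0}


def run_evaluator():
    start = time.perf_counter()
    time.sleep(0.05)  # stand-in for the evaluation pass
    evaluator_times["EPOCH_COMPLETED"] = time.perf_counter() - start


def on_epoch_completed():
    run_evaluator()
    # Fold the evaluation time into the trainer's reported epoch time.
    trainer_times["EPOCH_COMPLETED"] += evaluator_times["EPOCH_COMPLETED"]


trainer_times["EPOCH_COMPLETED"] = 0.10  # pretend training itself took 0.10 s
on_epoch_completed()
print(f"epoch time incl. evaluation: {trainer_times['EPOCH_COMPLETED']:.2f} s")
```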

@vfdev-5
Collaborator

vfdev-5 commented Jun 4, 2020

OK, let's address the problem of updating the epoch's time taken to include the time spent in handlers in a separate issue.

vfdev-5 added a commit that referenced this issue Jun 4, 2020
* Fixed incorrect log position. (#1098)

* Fixed missing logging call.

* Fixed tests.

Co-authored-by: vfdev <vfdev.5@gmail.com>
@failable
Contributor

failable commented Jun 5, 2020

I have a similar issue: when handlers for Events.ITERATION_COMPLETED (which fire between Events.EPOCH_STARTED and Events.EPOCH_COMPLETED) output logs (e.g., adjusting the lr based on the iteration and printing an info message like lr adjusted to ...), the logs come after the progress bar if {bar} is hidden.
