Have seq2seq just use gather #27025

Merged: 9 commits from muellerzr-seq2seq into main on Nov 14, 2023
Conversation

@muellerzr (Contributor) commented Oct 23, 2023

What does this PR do?

When using Seq2Seq, we don't want gather_for_metrics to apply its magic; we just want plain .gather(). Otherwise samples get dropped: accelerate removes what it thinks are "duplicates" based on the batch size, which leads to a bug when generation returns more samples than the dataset contains.

This PR adds a gather_function attribute to the Trainer. It defaults to gather_for_metrics, but a particular Trainer subclass (such as Seq2SeqTrainer) can override it when needed.
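For illustration, here is a minimal sketch of that mechanism. It is my simplification, not the actual transformers source; TinyTrainer and TinySeq2SeqTrainer are made-up names.

import torch
from accelerate import Accelerator

class TinyTrainer:
    def __init__(self):
        self.accelerator = Accelerator()
        # Default hook: gather_for_metrics, which trims samples it believes
        # were duplicated to pad out the last distributed batch.
        self.gather_function = self.accelerator.gather_for_metrics

class TinySeq2SeqTrainer(TinyTrainer):
    def __init__(self):
        super().__init__()
        # Generation can legitimately return dataset_len * num_return_sequences
        # rows, so the dedup heuristic would drop real outputs; use plain gather.
        self.gather_function = self.accelerator.gather

trainer = TinySeq2SeqTrainer()
print(trainer.gather_function(torch.ones(4, 2)).shape)  # torch.Size([4, 2]) on one process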

Fixes #25231

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@ArthurZucker @younesbelkada

@HuggingFaceDocBuilderDev commented Oct 23, 2023

The documentation is not available anymore as the PR was closed or merged.

@muellerzr (Contributor Author) commented Oct 23, 2023

@ArthurZucker any thoughts on what could fix this? I saw negligible time differences between main and my branch when running locally on CPU (~72s).

Looks like it's all passing now!

@younesbelkada (Contributor) left a comment

Thanks! Looks clean on my end.

@ArthurZucker (Collaborator) left a comment

Thanks

@amyeroberts (Collaborator) left a comment

Thanks for fixing this!

A few comments/questions for my own understanding of the PR before I can approve:

  • Could you clarify the issue in the PR description, i.e. what does gather_for_metrics do differently from gather (what is the "magic")?
  • Am I right in understanding this should only be applied to cases when evaluating generations from seq2seq models and the generation config specifies num_return_sequences > 1?
  • What happens and what should happen if I call evaluate with a generation config with num_return_sequences > 1 and then call a second time with num_return_sequences==1?

Comment on lines 178 to 183
for num_return_sequences in range(1, 4):
    gen_config.num_return_sequences = num_return_sequences
    metrics = trainer.evaluate(eval_dataset=prepared_dataset, generation_config=gen_config)
    assert (
        metrics["eval_samples"] == dataset_len * num_return_sequences
    ), f"Got {metrics['eval_samples']}, expected: {dataset_len * num_return_sequences}"
A Collaborator left a comment:

This works because the state of the trainer is set such that self.gather_function = self.accelerator.gather_for_metrics initially, and then switches to self.accelerator.gather when num_return_sequences > 1. However, I don't think this would work if you did for num_return_sequences in range(3, 0, -1), as the trainer would never have self.gather_function = self.accelerator.gather_for_metrics when num_return_sequences == 1.

@muellerzr (Contributor Author) replied:

Modified the test to use range(3, 0, -1). It still passed beforehand, but I simplified the logic to just use gather().
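For reference, a sketch of the descending variant (reusing gen_config, trainer, prepared_dataset, and dataset_len from the snippet above, so not standalone):

for num_return_sequences in range(3, 0, -1):
    gen_config.num_return_sequences = num_return_sequences
    metrics = trainer.evaluate(eval_dataset=prepared_dataset, generation_config=gen_config)
    # With plain gather there is no state-dependent dedup, so the count
    # scales with num_return_sequences in either iteration order.
    assert metrics["eval_samples"] == dataset_len * num_return_sequences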

@muellerzr (Contributor Author) commented:

@amyeroberts:

Am I right in understanding this should only be applied to cases when evaluating generations from seq2seq models and the generation config specifies num_return_sequences > 1?

Correct, otherwise we will drop samples. Technically, I think we can avoid this entirely by just using gather, and the test seems to show that this will indeed work fine. As a result, I'll simplify this to just use .gather().

What happens and what should happen if I call evaluate with a generation config with num_return_sequences > 1 and then call a second time with num_return_sequences==1?

Per your recommendation for the test, I tried this, and it worked as it should (because gather_for_metrics doesn't really do anything extra for a batch size of 1).
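As a hedged aside (t5-small is just an example checkpoint here, not part of this PR's tests), this is why the sample count changes: generate returns batch_size * num_return_sequences sequences.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

inputs = tok(["translate English to German: hello"], return_tensors="pt")
# Beam search returning 3 sequences for a single input: 1 * 3 output rows.
out = model.generate(**inputs, num_beams=3, num_return_sequences=3, max_new_tokens=10)
print(out.shape[0])  # 3, even though the "dataset" here has only 1 example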

@amyeroberts (Collaborator) left a comment

@muellerzr Sorry, I still don't fully understand and need some clarification.

Correct, otherwise we will drop samples.

Why does using gather_for_metrics drop samples?

Technically, I think we can avoid this entirely by just using gather, and the test seems to show that this will indeed work fine.

Is this only true for Seq2SeqTrainer? If not, why not just use gather everywhere?

@muellerzr (Contributor Author) commented Nov 3, 2023

@amyeroberts:

Why does using gather_for_metrics drop samples?

It's some logic in Accelerate: gather_for_metrics will drop samples if it thinks they've been duplicated (for example, samples repeated to fill out the last batch so DDP stays efficient). However, this is an edge case where just using gather is better.
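A toy sketch of that behavior (a deliberate oversimplification of accelerate's logic, not its actual implementation):

import torch

def toy_gather_for_metrics(gathered, dataset_len):
    # Simplified heuristic: trim rows that were appended to pad the last batch.
    return gathered[:dataset_len]

# 10 real samples padded to 12 so two processes see equally sized batches:
padded = torch.arange(12)
print(toy_gather_for_metrics(padded, 10).shape)     # torch.Size([10]) -- correct

# With num_return_sequences = 3, generation legitimately yields 30 rows,
# and trimming back to dataset_len would silently drop 20 real outputs:
generated = torch.arange(30)
print(toy_gather_for_metrics(generated, 10).shape)  # torch.Size([10]) -- wrong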

Is this only true for Seq2SeqTrainer? If not, why not just use gather everywhere?

Yes, just Seq2Seq. Otherwise, gather_for_metrics should always be used.

muellerzr and others added 2 commits November 3, 2023 11:56
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@amyeroberts (Collaborator) left a comment

Thanks for iterating!

As a general comment, this seems like something that should really be resolved on the accelerate side; however, the fix seems tidy enough here.

@muellerzr muellerzr merged commit 067c4a3 into main Nov 14, 2023
21 checks passed
@muellerzr muellerzr deleted the muellerzr-seq2seq branch November 14, 2023 19:54
Saibo-creator pushed a commit to epfl-dlab/transformers-GCD-PR that referenced this pull request Nov 15, 2023
* Have seq2seq just use gather

* Change

* Reset after

* Make slow

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Clean

* Simplify and just use gather

* Update tests/trainer/test_trainer_seq2seq.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* gather always for seq2seq

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
wgifford pushed a commit to namctin/transformers that referenced this pull request Nov 17, 2023 (same commits as above).

EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023 (same commits as above).
@psychocosine
Hello @muellerzr, I think .gather will cause the last batch to contain duplicated samples, and these duplicates will eventually be included in the compute_metrics computation, which is unexpected.
[screenshot omitted]
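A hedged sketch of this concern with toy numbers (not taken from the screenshot):

import torch

# 5 real labels; one duplicate appended so the last distributed batch is full.
real_labels = torch.tensor([1, 0, 1, 1, 0])
gathered = torch.cat([real_labels, real_labels[-1:]])  # plain gather keeps all 6 rows

# A metric computed over the gathered rows double-counts the duplicate:
print(gathered.float().mean().item())     # 0.5, skewed by the extra 0
print(real_labels.float().mean().item())  # 0.6, the true value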

Successfully merging this pull request may close these issues.

Seq2SeqTrainer.evaluate and predict don't yield the right number of predictions when num_return_sequences > 1