Enable torchdynamo with torch_tensorrt(fx path) #17765
Conversation
The documentation is not available anymore as the PR was closed or merged.
Hi @stas00, just a friendly ping. I updated the installation part, so it will be easy to repro if needed.
I was away, catching up now.
I'm trying to figure out how to install TensorRT, so I can't yet appreciate your PR. Meanwhile there is already work to do - please see comments.
Additionally the installation info you shared needs to be part of the docs, since otherwise the users won't know what to do. Of course, ideally it should be on your side and HF docs link to it. As that way you can easily keep it up-to-date. Thank you.
And of course the new API options need to be documented as well in the user-facing documentation otherwise nobody will use this feature. @anijain2305, could you please complete that long overdue documentation? We won't be able to finish this PR until the doc task from #17308 (comment) is completed. Thank you! You can of course do it as part of this PR and document the overdue options and the new ones in one go.
So I followed your instructions, except I used the .deb package installer. (Oh, and please link to https://docs.nvidia.com/deeplearning/tensorrt/archives/index.html so that the user will know how to install TensorRT.) Why do I get:
Is it the module from... oh I see, it failed to build:
installed
So it's not a python package that it wants but a system-wide one. In any case this obviously requires explicit instructions. I will wait for your instructions before proceeding.
Thanks for your time and efforts @stas00 !
Hope it solves your problem.
That did the trick. The tests have run successfully. So let's update the OP with the above 2 fixes.
Ah, one more user-facing documentation nit: if you want users to use your magic code, you will want to provide some enticement. A small benchmark table showing what these features do usually goes a long way toward getting a user excited to try them. So this is something else to consider. It's not a show stopper, but as you can see, if the docs aren't added right away they never get added, so it's best to do it in one go. It's still just a recommendation and I'm fine merging it as is; it's just not going to be used much without enticing docs.
I will try to add the doc there. But it is better to have @anijain2305 include the AOT part. :-)
Yeah, I was hoping that you'd only need to add the incremental part relevant to this PR.
re: CI - yes, and it's complicated. Basically, the live CI that you see reporting in this PR runs only CPU tests, since CircleCI doesn't have GPUs. Then we have another set of CI workflows that runs on our machines via GitHub Actions, and that's where we test all the complex/slow cases. And yes, I completely forgot that as part of this PR we need to set up our CI to install all these packages as well, so that these tests will be run. So once we have polished this, let's not forget that part. We will have to run all those instructions on our pt-nightly docker image - but actually there is a problem with this idea: how will the docker builder be able to download TensorRT packages if they require an NVIDIA user account?
re: CI
Do you think we need a @require_torch_tensorrt_fx decorator? It would help us check in the tests whether torch_tensorrt.fx is installed.
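For reference, such a skip decorator could look roughly like this - a minimal sketch, with the helper named after the `is_torch_tensorrt_fx_available()` check used elsewhere in this PR; the `find_spec` probe is my assumption, not the actual implementation:

```python
import importlib.util
import unittest


def is_torch_tensorrt_fx_available():
    # Treat the FX path as available only when torch_tensorrt itself resolves.
    return importlib.util.find_spec("torch_tensorrt") is not None


def require_torch_tensorrt_fx(test_case):
    """Skip the decorated test unless torch_tensorrt's FX path is installed."""
    return unittest.skipUnless(
        is_torch_tensorrt_fx_available(), "test requires torch_tensorrt.fx"
    )(test_case)
```

Applied on top of a test method, it would skip cleanly instead of erroring out on the missing import.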
That's great to know - thank you very much - I will pass this info on
Ah, right! so no need for nvidia user account! super - let's use that in the instructions then.
Absolutely, yes!
@stas00, just wondering if the CircleCI is flaky? Some test errors are not related. For example:
It appears that the CI is very broken at the moment; I asked and will know more tomorrow morning. Thank you for the heads up, @frank-wei - it doesn't look like any of the failures are related to your work, especially since the live CI won't run any of your tests.
ok, so for the quality one - please rebase this PR on main. Thank you. The other issue I don't have an answer for yet. update: I rebased - let's see with the update. |
OK, so after rebasing, most of the CI failures are now coming from this PR - to fix:
Let me know if you need help with sorting it out.
One of the tests is failing for me:
Let me know what details you need - this is on A100. Oh, it actually crashed before that:
This is not great - shouldn't the test have failed there, and not at a misleading later point in the comparison?
src/transformers/training_args.py (Outdated)
```python
self.ctx_manager_torchdynamo = contextlib.nullcontext()
if self.torchdynamo:
    if not is_torchdynamo_available():
        raise RuntimeError("Torchdynamo is not installed.")

    import torchdynamo
    from torchdynamo.optimizations import backends
    from torchdynamo.optimizations.training import aot_autograd_speedup_strategy

    if self.torchdynamo == "eager":
        self.ctx_manager_torchdynamo = torchdynamo.optimize("eager")
    elif self.torchdynamo == "nvfuser":
        self.ctx_manager_torchdynamo = torchdynamo.optimize(aot_autograd_speedup_strategy)
    elif self.torchdynamo == "fx2trt-fp16":
        if not is_torch_tensorrt_fx_available():
            raise RuntimeError("Torch-TensorRT FX path is not installed.")
        self.ctx_manager_torchdynamo = torchdynamo.optimize(backends.fx2trt_compiler_fp16)
    elif self.torchdynamo == "fx2trt":
        if not is_torch_tensorrt_fx_available():
            raise RuntimeError("Torch-TensorRT FX path is not installed.")
        self.ctx_manager_torchdynamo = torchdynamo.optimize(backends.fx2trt_compiler)
    else:
        raise RuntimeError(f"Torchdynamo backend {self.torchdynamo} is not supported.")
```
Suggested replacement for the block above:

```python
if self.torchdynamo:
    if not is_torchdynamo_available():
        raise RuntimeError("Torchdynamo is not installed.")

    import torchdynamo
    from torchdynamo.optimizations import backends
    from torchdynamo.optimizations.training import aot_autograd_speedup_strategy

    def get_ctx():
        # Normal
        if self.torchdynamo == "eager":
            return torchdynamo.optimize("eager")
        elif self.torchdynamo == "nvfuser":
            return torchdynamo.optimize(aot_autograd_speedup_strategy)
        # TensorRT
        if self.torchdynamo in ["fx2trt-fp16", "fx2trt"]:
            if not is_torch_tensorrt_fx_available():
                raise RuntimeError("Torch-TensorRT FX path is not installed.")
            if self.torchdynamo == "fx2trt-fp16":
                return torchdynamo.optimize(backends.fx2trt_compiler_fp16)
            elif self.torchdynamo == "fx2trt":
                return torchdynamo.optimize(backends.fx2trt_compiler)
        raise RuntimeError(f"Torchdynamo backend {self.torchdynamo} is not supported.")

    self.ctx_manager_torchdynamo = get_ctx()
else:
    self.ctx_manager_torchdynamo = contextlib.nullcontext()
```
I've tried to make the very long if/else a little easier to read and to remove repetition. Let me know if this resonates.
Untested.
And if the number of options continues to grow - since it's quite symmetric, we can just make a dict lookup table and look up both the context and the requirements in it. But this is probably good enough for now.
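To illustrate the idea, here is a hypothetical sketch of such a lookup table. The stand-in checks and factories below are placeholders for the real `is_*_available()` helpers and `torchdynamo.optimize(...)` calls, so the dispatch pattern itself runs without torchdynamo installed:

```python
# Stand-ins for the real availability checks and optimize() calls.
def _always_available():
    return True


def _make_ctx(name):
    def factory():
        return f"ctx:{name}"  # real code would return torchdynamo.optimize(...)
    return factory


# backend name -> (requirement check, context factory)
TORCHDYNAMO_BACKENDS = {
    "eager": (_always_available, _make_ctx("eager")),
    "nvfuser": (_always_available, _make_ctx("nvfuser")),
    "fx2trt-fp16": (_always_available, _make_ctx("fx2trt-fp16")),
    "fx2trt": (_always_available, _make_ctx("fx2trt")),
}


def get_ctx(backend):
    if backend not in TORCHDYNAMO_BACKENDS:
        raise RuntimeError(f"Torchdynamo backend {backend} is not supported.")
    is_available, make_ctx = TORCHDYNAMO_BACKENDS[backend]
    if not is_available():
        raise RuntimeError(f"Requirements for {backend} are not installed.")
    return make_ctx()


print(get_ctx("eager"))  # -> ctx:eager
```

Adding a new backend then becomes a one-line table entry instead of another elif branch.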
Thanks for the refactoring efforts. I took it into the change.
The failure may be due to an outdated torchdynamo. Could you install the newest torchdynamo? Here is the command to install it:
It looks good from my testing:
I suspected that was the case, but what I was trying to say is that the test should have failed on the torchdynamo error and not on the mismatch in values, i.e. something is trapping the real error, and the user could be none the wiser that their torchdynamo is broken - e.g. when there are a lot of logs. It needs to assert on the actual error. Does that make sense?
Hm, that is something outside my expertise, as it relates to torchdynamo. If it is torch_tensorrt related, I'd love to help. As for the CI test error, it seems that test is flaky? I did not find any useful information. Could you help guide/triage that? Thanks.
tests/trainer/test_trainer.py (Outdated)

```diff
@@ -1846,7 +1846,6 @@ def test_torchdynamo_full_eval(self):

 @require_torch_non_multi_gpu
 @require_torchdynamo
-@require_torch_tensorrt_fx
```
I removed this requirement, since it's not being used. Please correct me if I'm wrong.
Thanks for catching that! This test does not use tensorrt.
Hi @frank-wei, I had to rebuild the whole environment against pt-nightly and now everything works.
I think it'd be good to save the instructions in the OP somewhere so that it's easier for the user and us to be able to rebuild the environment.
Would you like to maintain a section or a file on your side that contains the instructions in the OP and we could point to it?
Other than that, I will just ask Sylvain to have a quick review and we can merge this.
Thank you for your patience.
Thanks @stas00, do you think I can add 3 pointers for the installation of torchdynamo, functorch and torch_tensorrt in docs/source/en/perf_train_gpu_one.mdx?
I think that works, @frank-wei
Cool. Update finished.
Thanks for your PR! I left some comments on where the context manager "getter" should live, as it shouldn't be an attribute of TrainingArguments, a dataclass we serialize at each save.
src/transformers/training_args.py (Outdated)
```python
if self.torchdynamo:
    if not is_torchdynamo_available():
        raise RuntimeError("Torchdynamo is not installed.")

    import torchdynamo
    from torchdynamo.optimizations import backends
    from torchdynamo.optimizations.training import aot_autograd_speedup_strategy

    def get_ctx():
        # Normal
        if self.torchdynamo == "eager":
            return torchdynamo.optimize("eager")
        elif self.torchdynamo == "nvfuser":
            return torchdynamo.optimize(aot_autograd_speedup_strategy)
        # TensorRT
        if self.torchdynamo in ["fx2trt-fp16", "fx2trt"]:
            if not is_torch_tensorrt_fx_available():
                raise RuntimeError("Torch-TensorRT FX path is not installed.")
            if self.torchdynamo == "fx2trt-fp16":
                return torchdynamo.optimize(backends.fx2trt_compiler_fp16)
            elif self.torchdynamo == "fx2trt":
                return torchdynamo.optimize(backends.fx2trt_compiler)
        else:
            raise RuntimeError(f"Torchdynamo backend {self.torchdynamo} is not supported.")

    self.ctx_manager_torchdynamo = get_ctx()
else:
    self.ctx_manager_torchdynamo = contextlib.nullcontext()
```
Not a big fan of having this as an attribute of TrainingArguments: I think it will break serialization (see here). This all could fit in a function that takes the value of self.dynamo (since it's the only field of TrainingArguments it uses) and lives in integrations.py. The code in the trainer file should then be adapted slightly.
@sgugger, that sounds good to me. I moved the function to integrations.py but hit a circular import issue:

```
File "/home/runner/work/transformers/transformers/src/transformers/training_args.py", line 26, in <module>
  from .integrations import get_torchdynamo_ctx
File "/home/runner/work/transformers/transformers/src/transformers/integrations.py", line 47, in <module>
  from .trainer_callback import ProgressCallback, TrainerCallback  # noqa: E402
File "/home/runner/work/transformers/transformers/src/transformers/trainer_callback.py", line 27, in <module>
  from .training_args import TrainingArguments
ImportError: cannot import name 'TrainingArguments' from partially initialized module 'transformers.training_args' (most likely due to a circular import) (/home/runner/work/transformers/transformers/src/transformers/training_args.py)
```

Is it good to leave the function get_torchdynamo_ctx as a member of TrainingArguments? Or leave it in import_utils.py to stay together with is_torchdynamo_available()?
It shouldn't be imported at all in the training_args.py module, only in trainer.py. As I said, you shouldn't add new attributes to TrainingArguments that are not serializable.
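For anyone hitting the same traceback: one generic way to sidestep such a cycle is to defer the import into the function that needs it. Here is a toy reproduction of that pattern; the module names `argmod`/`integmod` are made-up stand-ins for training_args.py/integrations.py, and this is only an illustration, not the PR's actual fix (which simply keeps the import in trainer.py):

```python
import sys
import types

# Two in-memory modules that reference each other. integmod imports argmod at
# the top level; argmod only imports integmod lazily, inside a function - so
# no cycle is hit at import time.
argmod_src = """
def get_ctx():
    from integmod import make_ctx   # deferred import breaks the cycle
    return make_ctx()
"""

integmod_src = """
import argmod                       # safe: argmod's top level imports nothing back

def make_ctx():
    return "ctx"
"""

for name in ("argmod", "integmod"):
    sys.modules[name] = types.ModuleType(name)
exec(argmod_src, sys.modules["argmod"].__dict__)
exec(integmod_src, sys.modules["integmod"].__dict__)

print(sys.modules["argmod"].get_ctx())  # -> ctx
```

The same deferral trick (importing inside the function or method that uses the symbol) is a common escape hatch when two modules legitimately need each other.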
@sgugger, the original implementation calculated the context on every call - that's why I suggested moving the logic to the argparse stage, since this logic only needs to be done once per program run.
What would be a good place to perform this resolution then? In the trainer's init, probably, right?
That works, yes.
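As a sketch of what resolving the context once in the trainer's init could look like - the class and the `resolve_dynamo_ctx` helper are hypothetical stand-ins, not the PR's actual code:

```python
import contextlib


# Stand-in for the real backend resolution, which would call
# torchdynamo.optimize(...) for the configured backend.
def resolve_dynamo_ctx(backend):
    return contextlib.nullcontext(f"ctx:{backend}")


class MiniTrainer:
    """Toy Trainer: the dynamo context manager is computed once, at init time."""

    def __init__(self, torchdynamo=None):
        if torchdynamo:
            self.ctx_manager_torchdynamo = resolve_dynamo_ctx(torchdynamo)
        else:
            self.ctx_manager_torchdynamo = contextlib.nullcontext()

    def training_step(self):
        # Every step reuses the precomputed context manager; no per-call lookup.
        with self.ctx_manager_torchdynamo:
            return "step done"


trainer = MiniTrainer(torchdynamo="eager")
print(trainer.training_step())  # -> step done
```

This keeps TrainingArguments free of non-serializable attributes while still paying the resolution cost only once per run.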
@frank-wei, please let me know if you need help here - moving to trainer's init that is.
Way better this way, thanks for iterating! Left a last nit.
src/transformers/trainer.py (Outdated)
```python
if self.args.torchdynamo:
    if not is_torchdynamo_available():
        raise RuntimeError("Torchdynamo is not installed.")
    import torchdynamo
    from torchdynamo.optimizations import backends
    from torchdynamo.optimizations.training import aot_autograd_speedup_strategy

    def get_ctx():
        # Normal
        if self.args.torchdynamo == "eager":
            return torchdynamo.optimize("eager")
        elif self.args.torchdynamo == "nvfuser":
            return torchdynamo.optimize(aot_autograd_speedup_strategy)
        # TensorRT
        if self.args.torchdynamo in ["fx2trt-fp16", "fx2trt"]:
            if not is_torch_tensorrt_fx_available():
                raise RuntimeError("Torch-TensorRT FX path is not installed.")
            if self.args.torchdynamo == "fx2trt-fp16":
                return torchdynamo.optimize(backends.fx2trt_compiler_fp16)
            elif self.args.torchdynamo == "fx2trt":
                return torchdynamo.optimize(backends.fx2trt_compiler)
        else:
            raise RuntimeError(f"Torchdynamo backend {self.args.torchdynamo} is not supported.")
```
All `self.args` -> `args` - we've defined a shortcut :-)
Cool. Fixed that.
Thanks!
@stas00 Are you good with this last iteration (as long as all tests pass)?
Let me run the tests.
All tests pass. Good to merge once the CI is green. I created a new task #18127 to handle the CI requirements.
* enable fx2trt
* Update perf_train_gpu_one.mdx
* Update perf_train_gpu_one.mdx
* add lib check
* update
* format
* update
* fix import check
* fix isort
* improve doc
* refactor ctx manager
* fix isort
* black format
* isort fix
* fix format
* update args
* update black
* cleanups
* Update perf_train_gpu_one.mdx
* code refactor
* code refactor to init
* remove redundancy
* isort
* replace self.args with args

Co-authored-by: Stas Bekman <stas@stason.org>
What does this PR do?
Adding support for TorchDynamo with torch_tensorrt (fx2trt module). Detailed context is available at #17724.
This diff adds an extra inference backend, based on #17308.
To reproduce and set up the environment
cc HF @stas00
cc Meta @yinghai @Chillee
cc NV @ncomly-nvidia @narendasan