[PyTorch] AOTI: add minimal arrayref interface #112800

swolchok · 2023-11-02T22:13:34Z

Stack from ghstack (oldest at bottom):

This implements an optional alternate interface to the AOTI-generated DSO, intended to increase efficiency for models running on CPU and requiring minimal overhead. See comment in config.py for more explanation.

This took a while to get right (e.g., I initially required 1-D `MiniArrayRef<T>` for the inputs, but found that multi-dimensional `ArrayRefTensor<T>` ended up simplifying the implementation and allowed test_aot_inductor.py to run) and is somewhat intricate, so I am anticipating that review will require some back-and-forth.

Differential Revision: [D50699890](https://our.internmc.facebook.com/intern/diff/D50699890/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D50699890/)!

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

pytorch-bot · 2023-11-02T22:13:38Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112800

✅ No Failures

As of commit 796da43 with merge base de4b2e5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

chenyang78 · 2023-11-04T08:32:31Z

torch/_inductor/codegen/aoti_runtime/interface.cpp

@@ -293,5 +291,137 @@ struct ThreadLocalCachedOutputTensor<ArrayRefTensor<T>> {
  RAIIAtenTensorHandle tensor_;
 };

+template <typename T>


Wondering if we should move all the code below to another place, e.g. torch/csrc/inductor/aoti_torch/tensor_converter.cpp. interface.cpp is supposed to contain the functionality used by users of the generated DSO. Although we don't export the structs and functions, it still looks a bit strange to keep them here.

swolchok · 2023-11-06

I need a place that is inside the DSO. AIUI we don't actually support compiling more than one .cpp file into the DSO right now. I suppose we can add another .cpp file that is copy/pasted into the generated file like interface.cpp if you like.

chenyang78 · 2023-11-04T08:42:17Z

torch/_inductor/codegen/wrapper.py

+            dtype = may_get_constant_buffer_dtype(input)
+            assert (
+                dtype is not None
+            ), f"Failed to get the dtype of sympy.Expr: {graph_input}"


perhaps s/graph_input/input/ ?

chenyang78 · 2023-11-04T08:50:27Z

torch/_inductor/codegen/wrapper.py

+
+                self.suffix.splice(
+                    """
+                    extern "C" AOTIRuntimeError AOTInductorModelRunMinimalArrayrefInterface(


Wondering if it would be possible to put the interface API into interface.h/interface.cpp? Because these are user-facing APIs, it seems better to me to make them explicit in the actual interface file instead of relying on codegen. We could codegen a macro to disable these MinimalArrayrefInterface APIs based on config.use_minimal_arrayref_interface.

swolchok · 2023-11-06

This implementation needs to come after the definition of AOTInductorModel{Inputs,Outputs}, which is (and must be) in the generated code.

jansel · 2023-11-07

Can't we just #include a file with the needed boilerplate? We do that in a few other places.

swolchok · 2023-11-07

I suppose so, but it's like 10 lines that would be entirely out of context, and IMO that would be a net decrease in readability and maintainability.

chenyang78 · 2023-11-04T09:02:14Z

torch/csrc/inductor/aoti_torch/c/shim.h

@@ -276,6 +276,29 @@ AOTI_TORCH_EXPORT AOTITorchError aoti_torch_proxy_executor_call_function(

 #ifdef __cplusplus
 } // extern "C"
+
+template <typename T>
+int32_t aoti_torch_dtype();


It seems to be a little strange to put templates in a C interface. Could we use C-style function like:

inline int32_t aoti_torch_dtype_float() { return aoti_torch_dtype_float32; } ...

swolchok · 2023-11-06

We already have those functions. I need a template.

(The usage is in the internal diff.)
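For readers following this thread, here is a minimal sketch of why a template can be needed even when per-dtype functions already exist: code that is generic over an element type `T` cannot select a function by name, but it can call a function template that is specialized per type. Everything named `sketch_*` or `Sketch*` below is hypothetical, and the numeric dtype codes are placeholders rather than the real shim values; `sketch_dtype<T>` only stands in for the role of `aoti_torch_dtype<T>`.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the existing per-dtype functions.
// The returned codes are placeholders chosen for this sketch only.
inline int32_t sketch_dtype_float32() { return 1; }
inline int32_t sketch_dtype_int64() { return 2; }

// Primary template: declared but not defined, so using an unsupported
// element type fails to link instead of silently returning a wrong code.
template <typename T>
int32_t sketch_dtype();

// One explicit specialization per supported element type.
template <>
inline int32_t sketch_dtype<float>() { return sketch_dtype_float32(); }

template <>
inline int32_t sketch_dtype<int64_t>() { return sketch_dtype_int64(); }

// Generic code parameterized on T: this is where a template is required,
// because the function to call depends on the type parameter.
template <typename T>
struct SketchArrayRef {
  const T* data;
  int64_t numel;

  int32_t dtype() const { return sketch_dtype<T>(); }
};
```

With this shape, `SketchArrayRef<float>{...}.dtype()` resolves to the float32 code at compile time without any runtime branching.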

chenyang78 · 2023-11-07T07:46:35Z

torch/_inductor/codegen/wrapper.py

+                        convert_handles_to_inputs(input_handles, inputs);
+                        auto outputs = run_impl_minimal_arrayref_interface<AOTInductorModelInputs, AOTInductorModelOutputs>(
+                            inputs, stream, proxy_executor);
+                        // NOTE: outputs is full of ArrayRef to thread_local storage. If in the future we need this


This seems to be different from the semantics of outputs in the regular path, where we transfer ownership of output tensors to the user code.

swolchok · 2023-11-07

Yes, it is different. I do not think that is a problem; changing the way the inputs and outputs are handled is the point of changing the interface.
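The difference in output semantics discussed here can be illustrated with a small sketch. It assumes, as the code comment in the diff says, that outputs are array references into thread_local storage; the types and functions below (`SketchView`, `sketch_run`, `sketch_copy`) are hypothetical and are not part of the PR. The point is that a non-owning view stays valid only until the next call on the same thread, so a caller that wants to keep results must copy them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical non-owning view, standing in for an ArrayRef-style output.
struct SketchView {
  const float* data;
  std::size_t size;
};

// Returns a view into thread_local storage. No ownership is transferred:
// the same buffer is overwritten by the next call on this thread.
inline SketchView sketch_run(float scale) {
  thread_local std::vector<float> storage(4);
  for (std::size_t i = 0; i < storage.size(); ++i) {
    storage[i] = scale * static_cast<float>(i);
  }
  return SketchView{storage.data(), storage.size()};
}

// A caller that needs the results to outlive the next call copies them out.
inline std::vector<float> sketch_copy(SketchView view) {
  return std::vector<float>(view.data, view.data + view.size);
}
```

In the regular path described by the reviewer, the caller would instead receive handles it owns and must release; the view-based path avoids that allocation and bookkeeping at the cost of a shorter lifetime.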

chenyang78 · 2023-11-07T08:07:26Z

torch/_inductor/codegen/wrapper.py

+                    """
+                    extern "C" AOTIRuntimeError AOTInductorModelRunMinimalArrayrefInterface(
+                        AOTInductorModelHandle model_handle,
+                        const AOTInductorModelInputs& inputs,


AOTInductorModelInputs is defined to be a std::tuple. Is it fine to pass a C++ construct across the C interface boundary? In particular, because the compiler for compiling the interface code may not be the same as the one used for compiling the user code that invokes the interface, would this impose any potential C++ ABI issue like different calling conventions?

swolchok · 2023-11-07

> the compiler for compiling the interface code may not be the same as the one used for compiling the user code that invokes the interface, would this impose any potential C++ ABI issue like different calling conventions?

The compiler probably doesn't need to match, but the C++ standard library version certainly does. I wouldn't expect gcc vs clang problems unless somebody had gcc configured to use libstdc++ and clang configured to use libc++, but using different major versions of MSVC would be an issue.

The tuple enables an efficient, straightforward implementation of convert_outputs_to_handles and convert_handles_to_inputs. We could address this theoretical ABI compatibility concern by copying into a struct at the interface boundary, but I'm not excited about spending additional nanoseconds doing so unless you can provide an example of a realistic setup where we need to worry about this.

swolchok · 2023-11-07

> different major versions of MSVC would be an issue

Apparently this hasn't been an issue for 8 years: https://learn.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-170
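As a rough illustration of the alternative mentioned in this thread (copying into a struct at the interface boundary), the sketch below mirrors a tuple of array views with a plain struct of pointers and fixed-width integers. All names are hypothetical, the two-input signature is invented for the example, and this is not what the PR does; the PR passes the tuple directly.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical non-owning array view, standing in for ArrayRefTensor<T>.
template <typename T>
struct SketchRef {
  const T* data;
  int64_t numel;
};

// C++-side input type. The layout of std::tuple is decided by the standard
// library implementation, which is the source of the ABI concern.
using SketchInputs = std::tuple<SketchRef<float>, SketchRef<int64_t>>;

// Plain mirror type whose layout does not depend on the standard library:
// only raw pointers and fixed-width integers.
struct SketchInputsC {
  const float* data0;
  int64_t numel0;
  const int64_t* data1;
  int64_t numel1;
};

// The extra copy that would run on every call at the boundary.
inline SketchInputs sketch_from_c(const SketchInputsC& c) {
  return SketchInputs{SketchRef<float>{c.data0, c.numel0},
                      SketchRef<int64_t>{c.data1, c.numel1}};
}
```

The conversion only copies a few pointers and sizes, which is the "additional nanoseconds" cost weighed above against a compatibility problem that had not been observed in practice.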

torch/_inductor/config.py

swolchok · 2023-11-07T21:09:08Z

The functorch failure doesn't repro internally, either on the base of this diff or on this diff, so I'm assuming it's unrelated noise coming from whatever rebasing happened during export.

facebook-github-bot · 2023-12-13T12:02:48Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2023-12-13T12:06:06Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


) We currently have no shape checking on CPU IIUC. Now we at least do numel checking for the minimal arrayref interface.
Differential Revision: [D51165703](https://our.internmc.facebook.com/intern/diff/D51165703/)
Pull Request resolved: #113577
Approved by: https://github.com/chenyang78, https://github.com/jansel
ghstack dependencies: #112800
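A numel check of the kind this follow-up commit describes might look like the sketch below. It is an assumption-laden illustration, not the code from #113577: the function names are hypothetical, and the expected sizes are passed in explicitly for the example.

```cpp
#include <cstdint>
#include <initializer_list>
#include <stdexcept>

// Number of elements implied by a list of dimension sizes.
inline int64_t sketch_expected_numel(std::initializer_list<int64_t> sizes) {
  int64_t numel = 1;
  for (int64_t s : sizes) {
    numel *= s;
  }
  return numel;
}

// Rejects an input whose element count does not match the expected sizes.
inline void sketch_check_numel(int64_t actual_numel,
                               std::initializer_list<int64_t> sizes) {
  if (actual_numel != sketch_expected_numel(sizes)) {
    throw std::runtime_error("input numel does not match expected sizes");
  }
}
```

Note that this is weaker than a full shape check: inputs of shape {2, 3, 4} and {4, 3, 2} both have 24 elements and would both pass, which fits the commit's wording that numel checking is the least that is now done.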

Knocks off a few nanoseconds from CPU inference due to not having to set this field; paths that would've needed it are expensive anyway.
Differential Revision: [D51182794](https://our.internmc.facebook.com/intern/diff/D51182794/)
Pull Request resolved: #113578
Approved by: https://github.com/khabinov, https://github.com/Neilblaze
ghstack dependencies: #112800, #113577

github-actions bot added module: inductor ciflow/inductor labels Nov 2, 2023

swolchok requested review from jansel, desertfire and chenyang78 and removed request for jansel November 2, 2023 22:15

swolchok changed the title ~~[PyTorch] AOT: add minimal arrayref interface~~ [PyTorch] AOTI: add minimal arrayref interface Nov 2, 2023

swolchok added the "topic: not user facing" (topic category) label Nov 3, 2023

chenyang78 reviewed Nov 4, 2023

swolchok requested a review from chenyang78 November 6, 2023 23:04

chenyang78 reviewed Nov 7, 2023

swolchok requested a review from chenyang78 November 7, 2023 18:44

swolchok added 6 commits November 30, 2023 11:45

pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Dec 13, 2023

pytorchmergebot added the merging label Dec 13, 2023

pytorchmergebot added the Merged label Dec 13, 2023

pytorchmergebot closed this in f9cf6ae Dec 13, 2023

pytorchmergebot removed the merging label Dec 13, 2023

facebook-github-bot deleted the gh/swolchok/601/head branch December 16, 2023 15:26

swolchok mentioned this pull request Apr 12, 2024

[AOTI] Multiple arrayref_interface tests cause segfault on test runner exit #123691

Open
