[PyTorch] AOTI: add minimal arrayref interface #112800

Closed
wants to merge 19 commits into from

@swolchok swolchok commented Nov 2, 2023

Stack from ghstack (oldest at bottom):

This implements an optional alternate interface to the AOTI
generated DSO, intended to increase efficiency for models running on
CPU and requiring minimal overhead. See comment in config.py for more
explanation.

This took a while to get right (e.g., I initially required 1-D
MiniArrayRef<T> for the inputs, but found that multi-dimensional
ArrayRefTensor<T> ended up simplifying the implementation and allowed
test_aot_inductor.py to run) and is somewhat intricate, so I am
anticipating that review will require some back-and-forth.

Differential Revision: D50699890

NOTE FOR REVIEWERS: This PR has internal Meta-specific changes or comments, please review them on Phabricator!

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

pytorch-bot bot commented Nov 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112800

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 796da43 with merge base de4b2e5 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

swolchok added a commit that referenced this pull request Nov 2, 2023
ghstack-source-id: 206327561
Pull Request resolved: #112800
@swolchok swolchok requested review from jansel, desertfire and chenyang78 and removed request for jansel November 2, 2023 22:15
@swolchok swolchok changed the title [PyTorch] AOT: add minimal arrayref interface [PyTorch] AOTI: add minimal arrayref interface Nov 2, 2023
swolchok added a commit that referenced this pull request Nov 3, 2023
Pull Request resolved: #112800

ghstack-source-id: 206414952
@exported-using-ghexport

@swolchok swolchok added the topic: not user facing topic category label Nov 3, 2023
@@ -293,5 +291,137 @@ struct ThreadLocalCachedOutputTensor<ArrayRefTensor<T>> {
RAIIAtenTensorHandle tensor_;
};

template <typename T>
Contributor

Wondering if we should move all the code below to another place, e.g. torch/csrc/inductor/aoti_torch/tensor_converter.cpp. interface.cpp is supposed to contain the functionality used by users of the generated DSO. Although we don't export the structs and functions, it still looks a bit strange to keep them here.

Contributor Author

I need a place that is inside the DSO. AIUI we don't actually support compiling more than one .cpp file into the DSO right now. I suppose we can add another .cpp file that is copy/pasted into the generated file like interface.cpp if you like.

dtype = may_get_constant_buffer_dtype(input)
assert (
    dtype is not None
), f"Failed to get the dtype of sympy.Expr: {graph_input}"
Contributor

perhaps s/graph_input/input/ ?


self.suffix.splice(
"""
extern "C" AOTIRuntimeError AOTInductorModelRunMinimalArrayrefInterface(
Contributor

Wondering if it would be possible to put the interface API into interface.h/interface.cpp? Because these are user-facing APIs, it seems better to me to make them explicit in the actual interface file instead of relying on codegen. We could codegen a macro to disable these MinimalArrayrefInterface functions based on config.use_minimal_arrayref_interface.

Contributor Author

this implementation needs to come after the definition of AOTInductorModel{Inputs,Outputs}, which is (and must be) in the generated code.

Contributor

Can't we just #include a file with the needed boilerplate? We do that in a few other places.

Contributor Author

@swolchok swolchok Nov 7, 2023

I suppose so, but it's like 10 lines that would be entirely out of context, and IMO that would be a net decrease in readability and maintainability.

@@ -276,6 +276,29 @@ AOTI_TORCH_EXPORT AOTITorchError aoti_torch_proxy_executor_call_function(

#ifdef __cplusplus
} // extern "C"

template <typename T>
int32_t aoti_torch_dtype();
Contributor

It seems a little strange to put templates in a C interface. Could we use C-style functions like:

inline int32_t aoti_torch_dtype_float() {
   return aoti_torch_dtype_float32;
}
...

Contributor Author

We already have those functions. I need a template.

Contributor Author

(the usage is in the internal diff)
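
For readers following the exchange above, here is a minimal sketch of why a template can be useful even when C-style getters already exist: code that is generic over an element type T can look up the matching dtype code without a hand-written switch. Every name prefixed with `demo_` or `Demo` is hypothetical, and the numeric codes are arbitrary placeholders. The real usage lives in the internal diff and is not shown in this PR, so this is only an illustration of the pattern.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for existing C-style dtype getters.
// The returned codes are placeholders, not real dtype values.
inline int32_t demo_dtype_float32() { return 100; }
inline int32_t demo_dtype_int64() { return 200; }

// The primary template is declared but never defined, so an
// unsupported element type is rejected at link time.
template <typename T>
int32_t demo_dtype();

// Each specialization forwards to the matching C-style getter.
template <>
inline int32_t demo_dtype<float>() { return demo_dtype_float32(); }

template <>
inline int32_t demo_dtype<int64_t>() { return demo_dtype_int64(); }

// Generic code can now ask for the dtype of its own element type.
template <typename T>
struct DemoArrayRefTensor {
  const T* data;
  std::size_t numel;
  int32_t dtype() const { return demo_dtype<T>(); }
};
```

With this shape, `DemoArrayRefTensor<float>{ptr, n}.dtype()` resolves to the float getter at compile time, which a set of separately named C functions cannot do on its own.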

swolchok added a commit that referenced this pull request Nov 6, 2023
Pull Request resolved: #112800

ghstack-source-id: 206636549
@exported-using-ghexport

convert_handles_to_inputs(input_handles, inputs);
auto outputs = run_impl_minimal_arrayref_interface<AOTInductorModelInputs, AOTInductorModelOutputs>(
inputs, stream, proxy_executor);
// NOTE: outputs is full of ArrayRef to thread_local storage. If in the future we need this
Contributor

This seems to be different from the semantics of outputs in the regular path, where we transfer ownership of output tensors to the user code.

Contributor Author

Yes, it is different. I do not think that that is a problem; changing the way the inputs and outputs are handled is the point of changing the interface.
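
To make the ownership difference discussed above concrete, here is a small sketch of what "outputs are ArrayRefs into thread_local storage" implies for a caller. It assumes a simplified non-owning view type and a toy `demo_run` function; all `demo_`/`Demo` names are hypothetical and do not come from the PR.

```cpp
#include <cstddef>
#include <vector>

// Non-owning view over a contiguous buffer (hypothetical stand-in).
template <typename T>
struct DemoArrayRef {
  const T* data;
  std::size_t size;
};

// Writes results into per-thread storage and returns a view of it.
// The caller does not own the memory; the next call on the same
// thread overwrites it.
inline DemoArrayRef<float> demo_run(float scale) {
  thread_local std::vector<float> storage(4);
  for (std::size_t i = 0; i < storage.size(); ++i) {
    storage[i] = scale * static_cast<float>(i);
  }
  return DemoArrayRef<float>{storage.data(), storage.size()};
}

// A caller that needs the results to outlive the next call copies them.
inline std::vector<float> demo_copy(DemoArrayRef<float> ref) {
  return std::vector<float>(ref.data, ref.data + ref.size);
}
```

In this sketch, a view returned by one `demo_run` call is invalidated in content (though not in address) by the next call on the same thread, which is the trade-off being accepted in exchange for skipping an allocation and an ownership transfer.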

"""
extern "C" AOTIRuntimeError AOTInductorModelRunMinimalArrayrefInterface(
AOTInductorModelHandle model_handle,
const AOTInductorModelInputs& inputs,
Contributor

AOTInductorModelInputs is defined to be a std::tuple. Is it fine to pass a C++ construct across the C interface boundary? In particular, because the compiler for compiling the interface code may not be the same as the one used for compiling the user code that invokes the interface, would this impose any potential C++ ABI issue like different calling conventions?

Contributor Author

> the compiler for compiling the interface code may not be the same as the one used for compiling the user code that invokes the interface, would this impose any potential C++ ABI issue like different calling conventions?

the compiler probably doesn't need to match, but the C++ standard library version certainly does. I wouldn't expect gcc vs clang problems unless somebody had gcc configured to use libstdc++ and clang configured to use libc++, but using different major versions of MSVC would be an issue.

Contributor Author

@swolchok swolchok Nov 7, 2023

The tuple enables an efficient, straightforward implementation of convert_outputs_to_handles and convert_handles_to_inputs. We could address this theoretical ABI compatibility concern by copying into a struct at the interface boundary, but I'm not excited about spending additional nanoseconds doing so unless you can provide an example of a realistic setup where we need to worry about this.

Contributor Author

> different major versions of MSVC would be an issue

apparently hasn't been an issue for 8 years: https://learn.microsoft.com/en-us/cpp/porting/binary-compat-2015-2017?view=msvc-170
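
As a rough illustration of the point about tuples in this thread, the sketch below shows how a `std::tuple` of typed array views lets one generic helper fill every input from a flat array of type-erased handles, and how such a tuple can be passed by reference to an `extern "C"` entry point. The types and functions here (`DemoHandle`, `DemoInputs`, `demo_convert_handles_to_inputs`, `demo_run_minimal`) are hypothetical simplifications, not the PR's actual definitions, and the code is written to compile as C++14.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Typed non-owning view (hypothetical stand-in).
template <typename T>
struct DemoArrayRefTensor {
  using value_type = T;
  const T* data;
  std::size_t numel;
};

// Type-erased handle, standing in for an opaque tensor handle.
struct DemoHandle {
  const void* data;
  std::size_t numel;
};

using DemoInputs =
    std::tuple<DemoArrayRefTensor<float>, DemoArrayRefTensor<int>>;

// Rebuilds one typed view from one handle.
template <typename Elem>
Elem demo_from_handle(const DemoHandle& h) {
  return Elem{static_cast<const typename Elem::value_type*>(h.data), h.numel};
}

// Expands over the tuple indices so each element is converted with
// its own static type; no per-input code has to be written by hand.
template <typename Tuple, std::size_t... Is>
void demo_convert_impl(
    const DemoHandle* handles, Tuple& inputs, std::index_sequence<Is...>) {
  int expand[] = {
      0,
      ((std::get<Is>(inputs) = demo_from_handle<
            typename std::tuple_element<Is, Tuple>::type>(handles[Is])),
       0)...};
  (void)expand;
}

template <typename Tuple>
void demo_convert_handles_to_inputs(const DemoHandle* handles, Tuple& inputs) {
  demo_convert_impl(
      handles, inputs,
      std::make_index_sequence<std::tuple_size<Tuple>::value>{});
}

// C linkage only affects the symbol name; the parameter is still a
// C++ type, so caller and callee must agree on the std::tuple layout.
extern "C" inline int demo_run_minimal(
    const DemoInputs& inputs, std::size_t* total_numel) {
  *total_numel = std::get<0>(inputs).numel + std::get<1>(inputs).numel;
  return 0;
}
```

The alternative mentioned above, copying into a plain struct at the boundary, would remove the dependence on the standard library's tuple layout at the cost of an extra copy per call.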

swolchok added a commit that referenced this pull request Nov 7, 2023
Pull Request resolved: #112800

ghstack-source-id: 206745906
@exported-using-ghexport

Contributor Author

swolchok commented Nov 7, 2023

The functorch failure doesn't reproduce internally, either on the base of this diff or on the diff itself, so I'm assuming it's unrelated noise coming from whatever rebasing happened during export.

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 13, 2023
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot pushed a commit that referenced this pull request Dec 13, 2023
)

We currently have no shape checking on CPU IIUC. Now we at least do numel checking for the minimal arrayref interface.

Differential Revision: [D51165703](https://our.internmc.facebook.com/intern/diff/D51165703/)

Pull Request resolved: #113577
Approved by: https://github.com/chenyang78, https://github.com/jansel
ghstack dependencies: #112800
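
The numel check mentioned in the commit message above can be pictured roughly as follows. This is a sketch under assumptions, not the code from #113577: the helper names are hypothetical, and the expected sizes are passed in directly rather than taken from a compiled model.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <stdexcept>
#include <string>

// Number of elements implied by a list of dimension sizes.
inline std::size_t demo_expected_numel(std::initializer_list<int64_t> sizes) {
  std::size_t n = 1;
  for (int64_t s : sizes) {
    n *= static_cast<std::size_t>(s);
  }
  return n;
}

// Rejects an input whose element count does not match the expected
// shape. This is weaker than a full shape check.
inline void demo_check_numel(
    std::size_t actual, std::initializer_list<int64_t> sizes) {
  const std::size_t expected = demo_expected_numel(sizes);
  if (actual != expected) {
    throw std::runtime_error(
        "numel mismatch: expected " + std::to_string(expected) + ", got " +
        std::to_string(actual));
  }
}
```

Note that a check of this kind cannot tell a 2x3 input from a 3x2 one, since both have six elements; it only catches inputs of the wrong total size.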
pytorchmergebot pushed a commit that referenced this pull request Dec 13, 2023
Knocks off a few nanoseconds from CPU inference due to not having to set this field; paths that would've needed it are expensive anyway.

Differential Revision: [D51182794](https://our.internmc.facebook.com/intern/diff/D51182794/)

Pull Request resolved: #113578
Approved by: https://github.com/khabinov, https://github.com/Neilblaze
ghstack dependencies: #112800, #113577
@facebook-github-bot facebook-github-bot deleted the gh/swolchok/601/head branch December 16, 2023 15:26
guilhermeleobas pushed a commit to guilhermeleobas/pytorch that referenced this pull request Dec 18, 2023
Pull Request resolved: pytorch#112800
Approved by: https://github.com/chenyang78
guilhermeleobas pushed a commit to guilhermeleobas/pytorch that referenced this pull request Dec 18, 2023
…rch#113577)

We currently have no shape checking on CPU IIUC. Now we at least do numel checking for the minimal arrayref interface.

Differential Revision: [D51165703](https://our.internmc.facebook.com/intern/diff/D51165703/)

Pull Request resolved: pytorch#113577
Approved by: https://github.com/chenyang78, https://github.com/jansel
ghstack dependencies: pytorch#112800
guilhermeleobas pushed a commit to guilhermeleobas/pytorch that referenced this pull request Dec 18, 2023
Knocks off a few nanoseconds from CPU inference due to not having to set this field; paths that would've needed it are expensive anyway.

Differential Revision: [D51182794](https://our.internmc.facebook.com/intern/diff/D51182794/)

Pull Request resolved: pytorch#113578
Approved by: https://github.com/khabinov, https://github.com/Neilblaze
ghstack dependencies: pytorch#112800, pytorch#113577
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
Pull Request resolved: pytorch#112800
Approved by: https://github.com/chenyang78
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
…rch#113577)

We currently have no shape checking on CPU IIUC. Now we at least do numel checking for the minimal arrayref interface.

Differential Revision: [D51165703](https://our.internmc.facebook.com/intern/diff/D51165703/)

Pull Request resolved: pytorch#113577
Approved by: https://github.com/chenyang78, https://github.com/jansel
ghstack dependencies: pytorch#112800
dmenig pushed a commit to dmenig/pytorch that referenced this pull request Dec 21, 2023
Knocks off a few nanoseconds from CPU inference due to not having to set this field; paths that would've needed it are expensive anyway.

Differential Revision: [D51182794](https://our.internmc.facebook.com/intern/diff/D51182794/)

Pull Request resolved: pytorch#113578
Approved by: https://github.com/khabinov, https://github.com/Neilblaze
ghstack dependencies: pytorch#112800, pytorch#113577
5 participants