Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(Experimental) Make nvFuser executor treat ltorch.copy_ differently from prims.copy_ #1177

Conversation

shino16
Copy link
Contributor

@shino16 shino16 commented Sep 20, 2024

Fixes #1173. I implemented the idea mentioned in #1173 (comment). This PR makes the nvFuser executor absorb ltorch.copy_ separately from prims.copy_ (which appears within in-place ops).

This PR is experimental. It works (if the tests pass), but it is not a good solution in that some transform may walk in and replace ltorch.copy_ with something equivalent. A complete solution would be to track all fd.add_output generated so far and insert fd.ops.set where needed.

@t-vi
Copy link
Collaborator

t-vi commented Sep 20, 2024

Would it not be cleaner to have a separate prim for the inplace-resolving copy?

@shino16
Copy link
Contributor Author

shino16 commented Sep 20, 2024

Yes, it must be possible to prepare a separate primitive or give prims.copy_ a flag indicating whether it comes from an inplace op. When I was working on #912, Ivan told me that all our prims functions should be considered as part of public API). So I am not sure how much implementation details we can expose under prims.

@t-vi
Copy link
Collaborator

t-vi commented Sep 20, 2024

Yeah, I totally agree that we should not mess with prims too much, but to my mind, having a specialized, separate prim for the inplace is something very straightforward and likely better than playing tricks with the levels of decomposition.

@shino16
Copy link
Contributor Author

shino16 commented Sep 20, 2024

That sounds nice. I will close this PR for now and try the way you suggest. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PR #1110 nearly doubles the compilation & execution time of a copy-heavy program
2 participants