
Torch compile support for distributed operations #1146

Open
AugustDev opened this issue Sep 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@AugustDev

🚀 Feature

The documentation says that torch.compile is not currently supported with distributed training. Since torch.compile can speed up training by as much as 2x, using the Lightning Trainer without compilation is no longer cost efficient, and it would be great to support it.

It's also a bit unclear to me what happens if I compile the model before passing it to the LightningModule: will the compiled model actually be used under DDP or not?
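
For concreteness, I mean something along these lines (just a sketch; `LitModel` here is a made-up LightningModule, and whether the compiled backbone stays compiled once the Trainer wraps it in DDP is exactly my question):

```python
import torch
import torch.nn as nn
import lightning as L


class LitModel(L.LightningModule):
    """Hypothetical LightningModule that wraps an already-compiled backbone."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.backbone(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Compile before handing the model to Lightning -- is this graph still used
# once the Trainer wraps everything in DDP?
backbone = torch.compile(nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1)))
trainer = L.Trainer(strategy="ddp", devices=2, max_steps=1)
# trainer.fit(LitModel(backbone), train_dataloaders=train_loader)  # dataloader omitted here
```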

AugustDev added the enhancement (New feature or request) label on Sep 13, 2024
@t-vi
Collaborator

t-vi commented Sep 13, 2024

@AugustDev Thank you, did you want to file this here or with https://github.com/Lightning-AI/pytorch-lightning/issues?

@awaelchli
Member

The approach we took in Fabric should be transferable to the Trainer as well:
Lightning-AI/pytorch-lightning#19280
Lightning-AI/pytorch-lightning#19382
Essentially, it just ensures that torch.compile is applied over the FSDP/DDP-wrapped model.
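
For reference, a minimal sketch of that ordering in plain PyTorch (not the actual Fabric/Trainer implementation; the toy model, torchrun launch, and NCCL backend are assumptions):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Assumes launch via `torchrun --nproc_per_node=N`, which sets LOCAL_RANK etc.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1)).cuda(local_rank)

    # Wrap with DDP first, then compile the wrapped module, so the compiled
    # graph includes DDP's gradient-synchronization hooks.
    ddp_model = DDP(model, device_ids=[local_rank])
    compiled_model = torch.compile(ddp_model)

    optimizer = torch.optim.SGD(compiled_model.parameters(), lr=0.1)
    x = torch.randn(8, 32, device=f"cuda:{local_rank}")
    y = torch.randn(8, 1, device=f"cuda:{local_rank}")

    loss = nn.functional.mse_loss(compiled_model(x), y)
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```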
