
Torch compile support for distributed operations #1146

Open
AugustDev opened this issue Sep 13, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@AugustDev

🚀 Feature

The documentation says that torch.compile is not currently supported with distributed training. Since torch.compile can speed up training by as much as 2x, using the Lightning Trainer without compilation is no longer cost efficient, and it would be great to support it.

It's also a bit unclear to me what happens if I compile the model before passing it to the LightningModule: will the compiled model actually be used under DDP or not?
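
For concreteness, I mean something along these lines (just a sketch; `LitModel` here is a made-up LightningModule, and whether the compiled backbone stays compiled once the Trainer wraps it in DDP is exactly my question):

```python
import torch
import torch.nn as nn
import lightning as L


class LitModel(L.LightningModule):
    """Hypothetical LightningModule that wraps an already-compiled backbone."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.backbone(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# Compile before handing the model to Lightning -- is this graph still used
# once the Trainer wraps everything in DDP?
backbone = torch.compile(nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1)))
trainer = L.Trainer(strategy="ddp", devices=2, max_steps=1)
# trainer.fit(LitModel(backbone), train_dataloaders=train_loader)  # dataloader omitted here
```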

AugustDev added the enhancement (New feature or request) label on Sep 13, 2024
@t-vi
Collaborator

t-vi commented Sep 13, 2024

@AugustDev Thank you, did you want to file this here or with https://github.com/Lightning-AI/pytorch-lightning/issues?

@awaelchli
Member

The approach we took in Fabric should be transferable to the Trainer as well:
Lightning-AI/pytorch-lightning#19280
Lightning-AI/pytorch-lightning#19382
Essentially, it just ensures that torch.compile is applied over the FSDP/DDP-wrapped model.
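
For reference, a minimal sketch of that ordering in plain PyTorch (not the actual Fabric/Trainer implementation; the toy model, torchrun launch, and NCCL backend are assumptions):

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Assumes launch via `torchrun --nproc_per_node=N`, which sets LOCAL_RANK etc.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1)).cuda(local_rank)

    # Wrap with DDP first, then compile the wrapped module, so the compiled
    # graph includes DDP's gradient-synchronization hooks.
    ddp_model = DDP(model, device_ids=[local_rank])
    compiled_model = torch.compile(ddp_model)

    optimizer = torch.optim.SGD(compiled_model.parameters(), lr=0.1)
    x = torch.randn(8, 32, device=f"cuda:{local_rank}")
    y = torch.randn(8, 1, device=f"cuda:{local_rank}")

    loss = nn.functional.mse_loss(compiled_model(x), y)
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```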
