❓ Questions and Help

When running on a vp-128 TPU pod (even when sharding only along the batch dimension), we are seeing very low performance compared to the same pod without SPMD.

Do you have any tips on how to improve performance? Specific SPMD arguments? Things to keep in mind when using it? Anything would help, because right now performance is lower than the non-SPMD run by a significant factor. @JackCaoG
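For reference, here is a minimal sketch of what "sharding only by the batch dimension" looks like with the torch_xla SPMD API, assuming torch_xla >= 2.1 running on an actual TPU host; the tensor shapes are illustrative. A common performance pitfall is forgetting to mark the input sharding at all, which makes XLA replicate the batch on every device:

```python
# Hedged sketch: 1-D data-parallel (batch-dimension) sharding with torch_xla SPMD.
# Requires a TPU runtime; tensor shapes and axis names below are illustrative.
import numpy as np
import torch
import torch_xla.core.xla_model as xm
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs

xr.use_spmd()  # must be enabled before any XLA tensor is created

num_devices = xr.global_runtime_device_count()
# 1-D mesh: all devices laid out along a single 'data' (batch) axis
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ('data',))

x = torch.randn(128, 1024).to(xm.xla_device())
# Shard dim 0 (batch) across the mesh; dim 1 stays replicated (None)
xs.mark_sharding(x, mesh, ('data', None))
```

Note that the global batch size (128 here) must be divisible by the number of devices on the `data` axis, or XLA will pad the shards.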
@JackCaoG I've been trying to fine-tune Gemma-2 9B on v4/v5 pods with FSDP + SPMD using HF transformers and torch_xla, and I also have the impression that training is slow. Do you have any benchmarks for training LLMs with the same setup?
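For context, "FSDP + SPMD" here presumably refers to torch_xla's SPMD-based FSDP wrapper ("FSDPv2"). A minimal sketch, assuming torch_xla >= 2.2 on a TPU runtime; the `nn.Linear` stands in for the real Gemma-2 model and the mesh axis names are illustrative:

```python
# Hedged sketch: wrapping a model with torch_xla's SPMD-based FSDP ("FSDPv2").
# Requires a TPU runtime; the model and mesh layout below are placeholders.
import numpy as np
import torch.nn as nn
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
from torch_xla.experimental.spmd_fully_sharded_data_parallel import (
    SpmdFullyShardedDataParallel as FSDPv2,
)

xr.use_spmd()
n = xr.global_runtime_device_count()
# 2-D mesh: shard parameters along 'fsdp'; 'model' axis is trivial here
mesh = xs.Mesh(np.arange(n), (n, 1), ('fsdp', 'model'))
xs.set_global_mesh(mesh)

model = nn.Linear(4096, 4096)  # placeholder for the actual Gemma-2 9B model
model = FSDPv2(model)  # shards parameters along the global mesh's 'fsdp' axis
```

When going through the HF Trainer instead of wrapping manually, the equivalent is typically enabled via `fsdp_config` with the XLA FSDPv2 option, though the exact flags depend on the transformers version.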