
Speeding up computation while using SPMD on large TPU pod #7987

Open
dudulightricks opened this issue Sep 10, 2024 · 3 comments

@dudulightricks

❓ Questions and Help

When running on a vp-128 TPU pod (even when sharding only along the batch dimension), we are seeing very low performance compared to the same pod without SPMD.

Do you have any tips for improving performance? Specific SPMD arguments? Things to keep in mind when using it? Anything would help, because right now performance is lower than the non-SPMD run by a significant factor.
@JackCaoG

@JackCaoG
Collaborator

Do you have a profile (xplane file) you can share? It is hard to guess what's happening without looking at the profile.

@giuliano-97

@JackCaoG I've been trying to fine-tune Gemma-2 9B on v4 / v5 pods with FSDP + SPMD using HF transformers and torch XLA, and I also have the impression that training is slow. Do you have any benchmarks for training LLMs with the same setup?

@JackCaoG
Collaborator

Replied in the other thread.
