How to distribute different GPUs for some large models #43

Open
liuyugeng opened this issue Mar 22, 2022 · 1 comment

Comments

@liuyugeng

Hi, I am trying to use VGG to distill the images, but the gradient computation takes too much memory for the program to run: distilling 10 images for CIFAR-10 costs 38 GB of GPU memory. Note that I use only one model for the distillation, so the method in advanced.md doesn't work in this situation. Could you suggest some solutions? Many thanks!

Best,
Yugeng

@ssnl
Owner

ssnl commented Mar 22, 2022

Conceptually, a couple of strategies can be used:

  1. distribute different steps to different GPUs
  2. use gradient checkpointing to recompute early steps' graphs rather than storing them.

Neither is directly supported by the provided code, so they require additional effort to implement.
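
To make the second strategy concrete, here is a toy sketch in plain Python. It is not code from this repository and does not use its API: `step`, `step_grad`, `LR`, and the scalar state are all hypothetical stand-ins for one unrolled training step and its Jacobian. It only illustrates the trade-off, which is to keep one saved state per segment of steps and recompute the others during the backward pass.

```python
import math

LR = 0.1  # hypothetical step size for the toy update


def step(w):
    """One unrolled update; a stand-in for one training step."""
    return w - LR * math.tanh(w)


def step_grad(w):
    """d step(w) / d w. Note that it needs the *input* w of that step."""
    return 1.0 - LR * (1.0 - math.tanh(w) ** 2)


def grad_full(w0, num_steps):
    """Gradient of 0.5 * w_T**2 w.r.t. w0, storing every intermediate state."""
    states = [w0]
    for _ in range(num_steps):
        states.append(step(states[-1]))
    grad = states[-1]  # d(0.5 * w_T**2) / d w_T
    for w in reversed(states[:-1]):
        grad *= step_grad(w)
    return grad, len(states)  # second value: number of states held


def grad_checkpointed(w0, num_steps, segment):
    """Same gradient, keeping one state per segment and recomputing the rest."""
    checkpoints = []
    w = w0
    for t in range(num_steps):
        if t % segment == 0:
            checkpoints.append(w)  # save only the first state of each segment
        w = step(w)
    grad = w  # d(0.5 * w_T**2) / d w_T
    peak = len(checkpoints)  # rough count of states held at once
    for idx in reversed(range(len(checkpoints))):
        start = idx * segment
        length = min(segment, num_steps - start)
        # Recompute the inputs of every step in this segment from its checkpoint.
        seg_states = [checkpoints[idx]]
        for _ in range(length - 1):
            seg_states.append(step(seg_states[-1]))
        peak = max(peak, len(checkpoints) + len(seg_states))
        for s in reversed(seg_states):
            grad *= step_grad(s)
    return grad, peak


if __name__ == "__main__":
    print(grad_full(0.7, 100))
    print(grad_checkpointed(0.7, 100, 10))
```

Both functions return the same gradient; the checkpointed version holds fewer states at once and pays for it with roughly one extra forward pass. In a real PyTorch implementation the analogous tool would be `torch.utils.checkpoint`, but wiring it through the unrolled distillation steps is the additional effort mentioned above.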
