Running step2 with: torch.cuda.OutOfMemoryError: CUDA out of memory #48
Comments
@tianqingyu Hi, currently, only |
Then, how many GPUs would be enough? @tianqingyu
My server has 4 video cards (RTX 4090), but I'm not able to modify the code for multi-GPU parallelism at the moment. I'd think 4 GPUs with 24 GB each should be enough.
@tianqingyu Hi, I suggest you modify the |
Do you still plan to do this? I'm a bit confused here: does half precision mean loading the model and latents in float16? I'm also confused why the latents returned from the base model have shape [1, 4, 16, 40, 64] while the interpolation model's are [2, 4, 61, 64, 40]. Is there a reason for swapping the height and width, cutting 3 of the frames, and concatenating the tensor to itself? edit: It also looks like the original issue is in the VAE call. Try adding |
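Since the traceback shows the OOM happening inside vae.encode on the full video tensor, one common workaround is to encode only a few frames at a time and concatenate the latents, which caps peak activation memory. The sketch below is a hypothetical illustration, not LaVie's actual code: encode_in_chunks and fake_encode are made-up names, and fake_encode only stands in for the real VAE encode call (a per-frame 8x spatial downsampling) so the chunking logic can be shown self-contained.

```python
import torch
import torch.nn.functional as F

def encode_in_chunks(encode_fn, video, chunk_size=4):
    """Encode a [B, C, T, H, W] video a few frames at a time.

    `encode_fn` maps a [B, C, t, H, W] chunk to its latent. Chunking works
    because the (2D) VAE treats frames independently, so encoding 4 frames
    at a time gives the same result as encoding all 16 at once, with a
    much smaller peak memory footprint.
    """
    chunks = []
    for start in range(0, video.shape[2], chunk_size):
        chunks.append(encode_fn(video[:, :, start:start + chunk_size]))
    return torch.cat(chunks, dim=2)

def fake_encode(x):
    # Stand-in "encoder": 8x spatial average pooling per frame, mimicking
    # a VAE's 8x downsampling. The real call would be something like
    # vae.encode(...).latent_dist.sample().mul(0.18215).
    return F.avg_pool3d(x, kernel_size=(1, 8, 8))

video = torch.randn(1, 3, 16, 64, 64)
latent = encode_in_chunks(fake_encode, video, chunk_size=4)
```

The same idea is what diffusers' built-in vae.enable_slicing() / vae.enable_tiling() implement for batches and large spatial sizes, so those may be worth trying before hand-rolling anything.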
My video card is rtx4090, 24G VRAM
System is ubuntu 22
Here is the error message:
args.input_path = ../results/base/a_panda_taking_a_selfie,_2k,_high_quality.mp4
args.prompt = ['a_panda_taking_a_selfie,_2k,_high_quality']
loading video from ../results/base/a_panda_taking_a_selfie,_2k,_high_quality.mp4
Traceback (most recent call last):
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 307, in
main(**OmegaConf.load(args.config))
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 279, in main
video_clip = auto_inpainting_copy_no_mask(args, video_input, prompt, vae, text_encoder, diffusion, model, device,)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/apps/vchitect-lavie/interpolation/sample.py", line 142, in auto_inpainting_copy_no_mask
video_input = vae.encode(video_input).latent_dist.sample().mul(0.18215)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/autoencoder_kl.py", line 164, in encode
h = self.encoder(x)
^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/vae.py", line 129, in forward
sample = down_block(sample)
^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/unet_2d_blocks.py", line 1014, in forward
hidden_states = resnet(hidden_states, temb=None)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/vantage/miniconda3/envs/lavie/lib/python3.11/site-packages/diffusers/models/resnet.py", line 599, in forward
output_tensor = (input_tensor + hidden_states) / self.output_scale_factor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.77 GiB (GPU 0; 23.65 GiB total capacity; 18.59 GiB already allocated; 4.41 GiB free; 18.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
After testing: both base and vsr run normally; only interpolation hits this out-of-memory error.
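Since the error message reports reserved memory (18.71 GiB) close to allocated memory (18.59 GiB) yet still fails a 4.77 GiB allocation with only 4.41 GiB free, fragmentation may be a factor, and the message itself suggests tuning the allocator. A minimal thing to try before any code change (the value 128 is a guess to experiment with, and the config path below is only an example):

```shell
# Cap the size of cached allocator blocks to reduce fragmentation,
# as suggested by the OOM message itself.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
# Then rerun the interpolation step as before, e.g.:
# python sample.py --config configs/sample.yaml
```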