Fix CUDA out of memory issue in model.encode by allowing user to transfer to CPU #1717

Open · wants to merge 2 commits into base: master
Conversation

Quetzalcohuatl (Contributor)

In issue #487 and issue #522, users were running into OOM issues when the batch size is large, because the embeddings aren't offloaded to the CPU.

The PR that fixed this only fixes it when convert_to_numpy=True. That means if you pass convert_to_numpy=False, the problem still exists.

In this PR, I just added an extra flag that allows the embeddings to be offloaded to the CPU. This gives the user the flexibility to keep the embeddings off the GPU (for example, when saving the SentenceTransformer embeddings to disk or keeping them in RAM for kNN, which is often the case) instead of holding them all in GPU memory.

Previously, the CUDA OOM issue was only solved if you had convert_to_numpy=True. This is a generalized fix.
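
For illustration, here is roughly how the new flag would be used (a minimal sketch: the model name is arbitrary, a CUDA device is assumed, and transfer_to_cpu is the flag added in this PR):

    from sentence_transformers import SentenceTransformer

    # Any model works here; the name below is just an example.
    model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")
    sentences = ["some sentence"] * 100_000

    # With convert_to_numpy=False the embeddings stay as torch tensors.
    # Without this PR they all accumulate on the GPU; with
    # transfer_to_cpu=True each batch is moved to host memory instead.
    embeddings = model.encode(
        sentences,
        batch_size=256,
        convert_to_numpy=False,
        transfer_to_cpu=True,  # new flag from this PR; defaults to False
    )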
Commits (2):
- ….encode
- fix cuda oom issue for model.encode
@paolorechia

Hey, thanks for opening this; I'm facing the same OOM issue. I'll check out this PR and give it a try.

@paolorechia

Hey, I tried using this PR, but I ran into the following error:

    con.model.fit(
  File "/home/paolo/dev/openimagegenius/gpu-code/gpu-node/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 749, in fit
    self._eval_during_training(evaluator, output_path, save_best_model, epoch, -1, callback)
  File "/home/paolo/dev/openimagegenius/gpu-code/gpu-node/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 781, in _eval_during_training
    score = evaluator(self, output_path=eval_path, epoch=epoch, steps=steps)
  File "/home/paolo/dev/openimagegenius/gpu-code/gpu-node/lib/python3.10/site-packages/sentence_transformers/evaluation/EmbeddingSimilarityEvaluator.py", line 77, in __call__
    embeddings1 = model.encode(self.sentences1, batch_size=self.batch_size, show_progress_bar=self.show_progress_bar, convert_to_numpy=True)
  File "/home/paolo/dev/openimagegenius/gpu-code/gpu-node/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 201, in encode
    all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])
  File "/home/paolo/dev/openimagegenius/gpu-code/gpu-node/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 201, in <listcomp>
    all_embeddings = np.asarray([emb.numpy() for emb in all_embeddings])
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Here's how I'm calling the encode function:

    def encode(self, X):
        return self.model.encode(X, convert_to_numpy=True, transfer_to_cpu=True)
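
For context, that TypeError is PyTorch's standard error when .numpy() is called on a tensor that is still on the GPU; a minimal illustration (assuming a CUDA device is available):

    import torch

    t = torch.zeros(3, device="cuda")
    # t.numpy() would raise:
    #   TypeError: can't convert cuda:0 device type tensor to numpy. ...
    arr = t.cpu().numpy()  # copy to host memory first, then convert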

@Quetzalcohuatl (Contributor, Author)

@paolorechia It looks like the problem is in the file /home/paolo/dev/openimagegenius/gpu-code/gpu-node/lib/python3.10/site-packages/sentence_transformers/evaluation/EmbeddingSimilarityEvaluator.py, specifically on line 77. You need to edit that call to pass transfer_to_cpu=True, because it defaults to False.
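
For illustration, the patched call on that line would look something like this (paraphrased from the traceback above; only transfer_to_cpu=True is new):

    # EmbeddingSimilarityEvaluator.__call__, around line 77: pass the new
    # flag so the evaluator's own encode() calls also offload to the CPU.
    embeddings1 = model.encode(
        self.sentences1,
        batch_size=self.batch_size,
        show_progress_bar=self.show_progress_bar,
        convert_to_numpy=True,
        transfer_to_cpu=True,  # the default False is what triggers the error above
    )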

Here is an example of it working correctly:
[image: screenshot of model.encode running with transfer_to_cpu=True, without the error]

@paolorechia

Hey @Quetzalcohuatl, thanks for the advice. Fortunately, I managed to solve my problem in another way.
