build_index takes much more time when decreasing max_index_memory_usage #157
Hello @SingL3! The bottleneck here is faiss. As for the best setting of max_index_memory_usage, I would say you should use as much RAM as possible to get the best performance, so set it to 16GB or 32GB if you can. You can look at the recall metrics computed at the end of the index construction to make sure your index is as good as you want. (You have 190GB of data, so with clip embeddings you can safely compress by a factor of 16 without losing too much quality.) I hope it helps!
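A quick back-of-the-envelope check of the sizing advice above, in plain Python (the 190GB and factor-16 figures come from the thread; the rest is simple arithmetic):

```python
# Rough sizing for the compression advice: with ~190 GB of raw embeddings,
# a compression factor of 16 keeps the index around 12 GB, which is why a
# max_index_memory_usage of "16G" or "32G" leaves comfortable headroom.
raw_embeddings_gb = 190   # size of the embedding collection (from the thread)
compression_factor = 16   # suggested safe factor for clip embeddings
index_size_gb = raw_embeddings_gb / compression_factor
print(f"approximate compressed index size: {index_size_gb:.1f} GB")  # ~11.9 GB
```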
Hello @victor-paltz, I am using 12 cores and I am testing on only 50k embeddings, which should not take that much time, so I think it is stuck. Actually, the reason I asked about the best setting of max_index_memory_usage is that a smaller value results in a better compression ratio, and a better compression ratio means less disk usage and, in this case, less RAM.
I just ran it on my computer (16 cores), and it took me 27 minutes to train the index. Have you tried to train it a second time?
For information, you can also keep your index on disk and still get good query performance. So if too much compression has an impact on the quality of your results, you could use that solution too.
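The on-disk option mentioned above can be sketched with faiss's memory-mapped index loading; `faiss.IO_FLAG_MMAP` is a real faiss flag, but the index path here is a placeholder and the function is not executed in this thread:

```python
def load_index_from_disk(path="knn.index"):
    """Memory-map a faiss index file instead of loading it fully into RAM.

    The OS pages index data in on demand, so query latency stays reasonable
    while resident memory stays low. `path` is a placeholder for illustration.
    """
    import faiss  # assumes faiss-cpu or faiss-gpu is installed

    return faiss.read_index(path, faiss.IO_FLAG_MMAP)
```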
Hi, @victor-paltz. I interrupted the processes that were stuck, and they all seem to be stuck at swigfaiss_avx2.py
Hi, I first built an index for 50000 x 512 embeddings using a 4G max_index_memory_usage, which took about 90s.
Then I tried a 50M max_index_memory_usage, and it has been running for 17+ hours (not finished yet).
Here is the log:
Is this working as expected?
BTW, do you have a suggestion for the best setting of max_index_memory_usage?
Thank you.
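The steps described above would roughly correspond to a call like the following. This is a hypothetical sketch (not run here): the parameter names follow the autofaiss build_index API, but the embedding array and the `current_memory_available` value are assumptions for illustration:

```python
def build_small_index(embeddings):
    """Sketch of the build_index call discussed in this thread.

    `embeddings` is assumed to be a float32 numpy array of shape
    (50000, 512), matching the thread; "16G" for current_memory_available
    is an assumption, adjust it to your machine.
    """
    from autofaiss import build_index  # assumes autofaiss is installed

    index, index_infos = build_index(
        embeddings=embeddings,
        max_index_memory_usage="4G",     # the setting that finished in ~90 s
        current_memory_available="16G",  # assumption: tune to available RAM
    )
    return index, index_infos
```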