🚀 Feature: Use S3 bucket as the vector store #724
Comments
@jolo-dev I see you're assigned to this issue. Is that because you created it, or are you currently working on it? If you're not working on it, I'd love to take this issue. Thanks! Happy coding! 😊
@jaredbradley243 thanks for your interest.
That's very kind, but if you're already working on it, keep going! 😃
@jaredbradley243 No, no. Really. Let me be your reviewer then ;)
Haha! It's going to take me a bit of time to work on this. I'll need to re-familiarize myself with the codebase, and I won't have time to get started for a week or two, but if everyone is alright with waiting, I'll happily take it off your hands!
@jaredbradley243 No worries!
@jaredbradley243 Any update on this?
Hey @Rajesh983. Sorry for the delay; I just saw your comment! I have updated the script to allow S3 to be used as document and vector storage. If AWS credentials are detected in the .env file, the script will download documents from a given S3 bucket/folder and parse them. Once the documents are parsed, the resulting index.faiss and index.pkl files are saved back into the S3 bucket in the folder of the user's choosing. However, I still need to implement AWS role assumption. I had to take a break because I have an exam coming up, but I should finish the script soon! If you'd like to preview my code and test it out early, let me know.
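For readers following along, a minimal sketch of that flow, assuming boto3 plus LangChain's FAISS wrapper; the bucket name, prefixes, and embedding class are placeholders, not the actual script:

```python
# Sketch: pull documents from S3, build a FAISS index, and push the
# resulting index.faiss / index.pkl back to the bucket.
import os
import tempfile

import boto3
from langchain.embeddings import OpenAIEmbeddings  # any Embeddings impl works
from langchain.vectorstores import FAISS

s3 = boto3.client("s3")  # credentials resolved via the usual boto3 chain
bucket = "my-bucket-name"  # placeholder

# 1. Download the raw documents from the bucket.
docs = []
for obj in s3.list_objects_v2(Bucket=bucket, Prefix="documents/").get("Contents", []):
    body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
    docs.append(body.decode("utf-8", errors="ignore"))

# 2. Parse/embed them into a FAISS vector store.
store = FAISS.from_texts(docs, OpenAIEmbeddings())

# 3. Save locally, then upload index.faiss and index.pkl back to S3.
with tempfile.TemporaryDirectory() as tmp:
    store.save_local(tmp)
    for name in ("index.faiss", "index.pkl"):
        s3.upload_file(os.path.join(tmp, name), bucket, f"indexes/{name}")
```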
Hey, quick question: I need this feature, and since it isn't out yet I am planning on building it out myself just for my own use case. Do I need to save the pickle file locally first before uploading to S3, or is there a way to write the langchain.vectorstores.faiss.FAISS object (the vector store) straight into S3 as a pickle file? Apologies for my naïveté.
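For anyone landing here with the same question: a local file isn't strictly required. A sketch using faiss's in-memory serializer plus boto3 (the helper name and object keys are illustrative):

```python
# Sketch: write a LangChain FAISS store to S3 without touching local disk.
# faiss.serialize_index turns the raw index into bytes; the docstore and
# id mapping are plain Python objects, so pickle handles them.
import pickle

import boto3
import faiss

def upload_store_to_s3(store, bucket, prefix):
    """store: a langchain.vectorstores.faiss.FAISS instance (hypothetical helper)."""
    s3 = boto3.client("s3")
    index_bytes = faiss.serialize_index(store.index).tobytes()
    meta_bytes = pickle.dumps((store.docstore, store.index_to_docstore_id))
    s3.put_object(Bucket=bucket, Key=f"{prefix}/index.faiss", Body=index_bytes)
    s3.put_object(Bucket=bucket, Key=f"{prefix}/index.pkl", Body=meta_bytes)
```

This mirrors what `save_local` writes (index.faiss plus a pickle of the docstore and id map), just streamed straight to S3.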
Hey @fundmatch-dev! I finished this feature yesterday; I'm just writing a README for it! Would you like to test it out for me?
Hi @jaredbradley243, I just finished implementing it manually for myself, and it seems to work. But hey, I don't mind helping if you can just explain what I need to do! Want to hop on a call or something?
I'm happy to hop on a call with you tomorrow, if you're free! (It's 8 PM here in Los Angeles.) In the meantime, you can replace your script with my updated version. Here are some instructions:

Script Functionality
Enabling S3 Storage
To enable S3 storage, use the …
Enabling Role Assumption
If accessing an S3 bucket requires assuming an IAM role (e.g., for cross-account access), the script supports this through the …
This configuration allows the script to assume the configured role.
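For reference, role assumption with boto3's STS client generally looks like the following sketch; the role ARN and session name are placeholders, not the script's actual configuration:

```python
# Sketch: assume an IAM role via STS, then use the temporary credentials
# for S3 access. The role ARN and session name below are placeholders.
import boto3

sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/my-cross-account-role",  # placeholder
    RoleSessionName="vector-store-ingest",
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```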
Let me know if you run into any difficulty, or if you find the instructions hard to follow! 😁 This seems to be a sought-after feature; I'm glad I got the chance to work on it!
Hey @jaredbradley243,
Thank you! The over-excitement I blame on the holiday season. 😂 Issue reopened.
Folks: what is the ETA for completion of this feature? This would allow stand-alone conversion of S3 documents into their vector representation, right? Will we have a separate index/ID for each document after the conversion? Trying to wrap my head around it.
Hi, I am trying to store my FAISS vector store in Azure Blob Storage. Is there any existing functionality that can help me with that?
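A minimal sketch of the same idea against Azure Blob Storage, assuming the azure-storage-blob SDK; the connection string and container name are placeholders:

```python
# Sketch: push a locally saved FAISS index (index.faiss / index.pkl) to
# Azure Blob Storage. The connection string and container are placeholders.
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<your-connection-string>")
container = service.get_container_client("vector-indexes")

for name in ("index.faiss", "index.pkl"):
    with open(name, "rb") as f:
        container.upload_blob(name=name, data=f, overwrite=True)
```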
🔖 Feature description
The user should be able to add an S3 bucket for storing and accessing their documents.
🎤 Why is this feature needed?
The documents would not just be stored in the cloud but would also be easier to share.
That would also reduce storage usage on your hard drive.
✌️ How do you aim to achieve this?
In order to store documents in S3, you can pass a variable
S3_STORE=my-bucket-name
via the .env file. However, if you are running the application on your local machine, you will need to provide AWS credentials. The good news is that you can choose how to provide these credentials: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
When running the scripts, the result should be uploaded to the given S3 bucket. The store in the application should access the documents from the S3 bucket.
It could look like this:
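(A minimal sketch, assuming the `S3_STORE` variable above, python-dotenv for loading `.env`, and placeholder output paths.)

```python
# Sketch: read S3_STORE from the environment (loaded from .env via
# python-dotenv) and upload the ingestion output. Paths are placeholders.
import os

import boto3
from dotenv import load_dotenv

load_dotenv()  # makes S3_STORE (and any AWS_* variables) available

bucket = os.environ["S3_STORE"]  # e.g. "my-bucket-name"
s3 = boto3.client("s3")  # credentials via the standard boto3 lookup chain

for name in ("index.faiss", "index.pkl"):
    s3.upload_file(f"outputs/{name}", bucket, f"indexes/{name}")
```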
🔄️ Additional Information
No response
👀 Have you spent some time checking whether this feature request has been raised before?
Are you willing to submit a PR?
Yes, I am willing to submit a PR!