Can autofaiss take spark dataframe as input? #163

Open
vikasd22 opened this issue May 24, 2023 · 3 comments

Comments

@vikasd22

Hi,
I am trying to pass a Spark DataFrame as input for the distributed case, as follows:

from autofaiss import build_index

# sdf is a Spark DataFrame with an "embeddings" column
index, index_infos = build_index(
    embeddings=sdf,
    embedding_column="embeddings",
    distributed="pyspark",
    temporary_indices_folder="/tmp/faiss",
)

Is it possible?

@hitchhicker
Contributor

Hey!

Good question. Unfortunately, it is not supported.

@vikasd22
Author

vikasd22 commented May 25, 2023

@hitchhicker Okay. Do you have recommendations for dealing with larger-than-memory embeddings? I have a Spark DataFrame and could call sdf.toPandas().to_numpy(), but that will probably get killed in big-data situations.
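
One possible way to avoid collecting everything on the driver, sketched here under assumptions rather than confirmed in this thread: write the DataFrame to Parquet from Spark, then point build_index at that folder with file_format="parquet". The helper names (autofaiss_parquet_kwargs, build_index_from_spark_df) and the paths are hypothetical. The sketch also assumes the embeddings column is an array of floats and that the Parquet folder and the temporary folder are reachable from every executor.

```python
def autofaiss_parquet_kwargs(parquet_dir, embedding_column, tmp_dir):
    """Build the keyword arguments for autofaiss.build_index (hypothetical helper)."""
    return {
        "embeddings": parquet_dir,            # folder of Parquet files, not a DataFrame
        "file_format": "parquet",
        "embedding_column_name": embedding_column,
        "distributed": "pyspark",
        "temporary_indices_folder": tmp_dir,
    }


def build_index_from_spark_df(sdf, parquet_dir,
                              embedding_column="embeddings",
                              tmp_dir="/tmp/faiss"):
    """Write sdf to Parquet, then let autofaiss read the files (hypothetical helper)."""
    # Imported lazily so the kwargs helper can be used without autofaiss installed.
    from autofaiss import build_index

    # Spark writes the partitions in parallel; nothing is collected on the driver.
    sdf.select(embedding_column).write.mode("overwrite").parquet(parquet_dir)

    return build_index(
        **autofaiss_parquet_kwargs(parquet_dir, embedding_column, tmp_dir)
    )
```

On a real cluster, parquet_dir and tmp_dir would typically be on shared storage such as HDFS or S3 rather than a local /tmp path.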

@rom1504
Contributor

rom1504 commented May 25, 2023 via email
