Spark Connector sends FloatArray as String #666
Hi @piotrkan 👋 Can you share a fuller example? What happens between your Node2Vec call and the point where you see strings? Node2Vec in write mode writes arrays of 32-bit floating point values.
Hi @jjaderberg, sorry for the delayed reply! I reproduced the error with the following code. I now realize that the problem is not in writing the embeddings but in reading them using pyspark and Neo4j. Do you have any idea what's causing the issue?
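The original reproducer snippet was lost from the page, but based on the discussion it would have looked roughly like this sketch: Node2Vec in write mode via the graphdatascience client, followed by a label-based read through the Neo4j Spark Connector. The graph name, label, URL, and property names are assumptions, not details from the issue.

```python
# Hypothetical reproducer (a sketch, not the reporter's exact code).
# Assumed names: projected graph "myGraph", label "Person",
# property "topological_embedding", local bolt URL.

# Options for the label-based read through the Neo4j Spark Connector.
READ_OPTIONS = {
    "url": "bolt://localhost:7687",
    "labels": "Person",
}

def reproduce(gds, spark):
    """Write Node2Vec embeddings, then read them back with pyspark.

    `gds` is a graphdatascience.GraphDataScience client; `spark` is a
    SparkSession with the Neo4j Spark Connector on its classpath.
    """
    G = gds.graph.get("myGraph")  # assumes this projection already exists
    gds.node2vec.write(G, writeProperty="topological_embedding",
                       embeddingDimension=64)
    df = (
        spark.read.format("org.neo4j.spark.DataSource")
        .options(**READ_OPTIONS)
        .load()
    )
    # Symptom from the issue: topological_embedding comes back as a
    # StringType column instead of an array of floats.
    df.printSchema()
    return df
```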
This returns the following DataFrame, where topological_embedding is a string rather than an array:
@piotrkan I agree that your reproducer suggests something goes wrong when reading the float array node properties via pyspark. I have raised it with the maintainers of the Neo4j Spark Connector. If you want to work around the problem, you should be able to convert the float array property values to double array values with the `toFloatList` function,
and for the label-based loading with pyspark and the Spark Connector you can overwrite the property with the type-converted double array value.
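The suggested workaround could be sketched as follows: rewrite the 32-bit float array property in place as a Cypher (64-bit) float list before reading it with pyspark. The label and property name are assumptions carried over from the earlier discussion.

```python
# Workaround sketch: overwrite the property with toFloatList so the stored
# values become Cypher 64-bit floats. Label and property name are assumed.
CONVERT_QUERY = """
MATCH (n:Person)
WHERE n.topological_embedding IS NOT NULL
SET n.topological_embedding = toFloatList(n.topological_embedding)
"""

def convert_embeddings(gds):
    # `gds` is a graphdatascience.GraphDataScience client; run_cypher
    # executes an arbitrary Cypher statement against the database.
    gds.run_cypher(CONVERT_QUERY)
```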
If you do this before reading the Person label with pyspark, it should succeed (assuming our hypothesis is correct). N.B. "float" in the function name "toFloatList" refers to the Cypher floating point type, which is 64 bits. Node2Vec writes 32-bit floating point values for a smaller memory footprint. Some languages use "float" for the 32-bit type and "double" for the 64-bit type. Neo4j supports storing both of these types and the Cypher runtime can handle both, but the Cypher language has only a single floating point type, and it is 64 bits. It's unfortunate that the Node2Vec result couldn't be consumed via pyspark; hopefully the Connector maintainers can confirm or debunk our hypothesis and address the problem if it is indeed missing support in the Connector.
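The 32-bit versus 64-bit distinction above can be illustrated with the standard library: the same value occupies four bytes as a single-precision float and eight bytes as a double, which is the "smaller memory footprint" Node2Vec trades on.

```python
# Pack one value as IEEE 754 single precision (what Node2Vec writes)
# and as double precision (the only Cypher-language float type).
import struct

f32 = struct.pack("<f", 1.5)  # 4 bytes: 32-bit "float" in many languages
f64 = struct.pack("<d", 1.5)  # 8 bytes: 64-bit "double" / Cypher FLOAT

print(len(f32), len(f64))  # 4 8
```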
Describe the bug
I am using Node2Vec to generate graph embeddings for my graph. I use the following code and it works fine: it successfully saves the embeddings in the graph stored in Neo4j under the 'topological_embedding' property name.
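The write step's code block did not survive extraction; a minimal Node2Vec write-mode call with the graphdatascience client might look like this. The projection name and parameter values are assumptions.

```python
# Sketch of the Node2Vec write step. Config values are illustrative only;
# the issue's actual parameters were not preserved.
NODE2VEC_CONFIG = {
    "writeProperty": "topological_embedding",
    "embeddingDimension": 128,
}

def write_embeddings(gds):
    """Run Node2Vec in write mode on an existing projection.

    `gds` is a graphdatascience.GraphDataScience client.
    """
    G = gds.graph.get("myGraph")  # assumed projected graph name
    # Writes a 32-bit float array per node back to the Neo4j database.
    return gds.node2vec.write(G, **NODE2VEC_CONFIG)
```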
But when I read the graph back in the next step, while conducting dimensionality reduction (df corresponds to a Spark DataFrame with nodes and topological embeddings as columns):
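The read step is also missing from the page; a query-based read through the Neo4j Spark Connector might look like the following sketch. The query, URL, and format string are assumptions about the reporter's setup.

```python
# Sketch of the read that triggers the error: loading the embedding
# property into a Spark DataFrame via the Neo4j Spark Connector.
READ_QUERY = (
    "MATCH (n:Person) "
    "RETURN id(n) AS node, "
    "n.topological_embedding AS topological_embedding"
)

def read_embeddings(spark):
    # `spark` is a SparkSession with the connector jar on its classpath.
    return (
        spark.read.format("org.neo4j.spark.DataSource")
        .option("url", "bolt://localhost:7687")
        .option("query", READ_QUERY)
        .load()
    )
```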
I get the following error:
This does not happen when I use GraphSage and write the embeddings in the following format:
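For comparison, the GraphSage path that reportedly works might be sketched as below: train a model, then write its predictions back as a node property. The model name, feature properties, and write property are assumptions, since the original snippet was not preserved.

```python
# Hedged sketch of the GraphSage write path from the report.
GRAPHSAGE_TRAIN_CONFIG = {
    "modelName": "sageModel",          # assumed model name
    "featureProperties": ["degree"],   # assumed input features
}

def write_graphsage_embeddings(gds):
    """Train GraphSage and write predicted embeddings to the database.

    `gds` is a graphdatascience.GraphDataScience client.
    """
    G = gds.graph.get("myGraph")  # assumed projected graph name
    model, _ = gds.graphSage.train(G, **GRAPHSAGE_TRAIN_CONFIG)
    model.predict_write(G, writeProperty="graphsage_embedding")
```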
Have you experienced anything like this? It seems like the `.write()` function in the Python client saves the embeddings as strings?
graphdatascience library version:
GDS plugin version: 2.7.0
Python version: 3.11
Neo4j version: 5.21.0
Operating system: macOS