Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix + Unify digraph and multidigraph behaviour #46

Merged

Conversation

jackboyla
Copy link
Contributor

I noticed some issues where trying to do ORDER BY on edge attributes. This was due to the data structure of the edges specifically because they can support multi edges.

You can reproduce the error by (this PR fixes that):

from grandcypher import GrandCypher
import networkx as nx

host = nx.MultiDiGraph()
host.add_node("a", name="Alice", age=25)
host.add_node("b", name="Bob", age=30)
host.add_node("c", name="Carol", age=20)
host.add_edge("b", "a", __labels__={"paid"}, value=14)
host.add_edge("a", "b", __labels__={"paid"}, value=9)
host.add_edge("a", "b", __labels__={"paid"}, amount=96)
host.add_edge("a", "b", __labels__={"paid"}, value=40)

qry = """
MATCH (n)-[r:paid]->()
RETURN n.name, r.value
ORDER BY r.value ASC
"""
res = GrandCypher(host).run(qry)

print(res)

'''
TypeError: '<' not supported between instances of 'dict' and 'dict'
'''

Internally, GrandCypher now treats all graphs as MultiDiGraphs. I think this will make things clearer and require less special handling for future updates.

I also made some changes to the unit tests and added some more for these problematic cases.

@@ -376,7 +376,7 @@ def _data_path_to_entity_name_attribute(data_path):

class _GrandCypherTransformer(Transformer):
def __init__(self, target_graph: nx.Graph, limit=None):
self._target_graph = target_graph
self._target_graph = nx.MultiDiGraph(target_graph)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a super smart change and simplifies a ton — good thinking! there's probably a ton of business logic we can strip out as a result... thinking out loud, maybe makes sense to put in a test coverage library to auto-detect those chunks...

any performance hits you can think of as a result of doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a test coverage library to auto-detect those chunks

that sounds like a great idea! Haven't used many test coverage libraries myself so open to suggestions :)

Also, w.r.t to performance hit I'm unsure about the impact of changing to MultiDiGraph -- at least in practice it appears to be similar. Probably a good idea to benchmark future versions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been liking codspeed (e.g., aplbrain/grand#48) — maybe a cool thing to extend to this repo someday!

@jackboyla jackboyla requested a review from j6k4m8 June 18, 2024 14:35
@jackboyla
Copy link
Contributor Author

jackboyla commented Jun 18, 2024

sorry to re-request a review, but I ended up making some more changes :)

I noticed that when using ORDER BY on an edge attribute, the data structure changes (this is now fixed):

from grandcypher import GrandCypher
import networkx as nx

host.add_node("a", name="Alice", age=25)
host.add_node("b", name="Bob", age=30)
host.add_node("c", name="Carol", age=20)
host.add_edge("b", "a", __labels__={"paid"}, value=14)
host.add_edge("a", "b", __labels__={"paid"}, value=9)
host.add_edge("a", "b", __labels__={"paid"}, value=40)

qry = """
MATCH (n)-[r]->()
RETURN n.name, r.value
ORDER BY r.value ASC
"""
res = GrandCypher(host).run(qry)

'''
res['r.value'] --> [
    [((0, 'paid'), 9), ((1, 'paid'), 40)], 
    [((0, 'paid'), 14)]
]

without `order by` it would be
res['r.value'] --> [
    {(0, 'paid'): 9, (1, 'paid'): None, (2, 'paid'): 40}, 
    {(0, 'paid'): 14}
]
'''

also I added support for aliases:

from grandcypher import GrandCypher
import networkx as nx

host = nx.MultiDiGraph()
host.add_node("a", name="Alice", age=25)
host.add_node("b", name="Bob", age=30)
host.add_node("c", name="Carol", age=20)
host.add_edge("b", "a", __labels__={"paid"}, value=14)
host.add_edge("a", "b", __labels__={"paid"}, value=9)
host.add_edge("a", "b", __labels__={"paid"}, amount=96)
host.add_edge("a", "b", __labels__={"paid"}, value=40)

qry = """
MATCH (n)-[r]->(m)
RETURN n.name, SUM(r.value) AS myvalue
ORDER BY myvalue DESC
"""

res = GrandCypher(host).run(qry)

print(res)

'''
{'n.name': ['Alice', 'Bob'], 'myvalue': [{'paid': 49}, {'paid': 14}]}
'''

@j6k4m8
Copy link
Member

j6k4m8 commented Jun 18, 2024

Woahhh great improvements — always happy to re-review!

@j6k4m8
Copy link
Member

j6k4m8 commented Jun 18, 2024

This is great @jackboyla — good to merge?

@jackboyla
Copy link
Contributor Author

cheers! good to merge😊

@j6k4m8 j6k4m8 merged commit ed6bcf6 into aplbrain:master Jun 18, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants