Support parent-child relationship #25
@benan789 is this how you imagine grandparent relationships could work?
Yes, that's perfect.
'Elasticsearch to apply update', u(document_id))
return None
had_parent = '_parent' in document
old_parent = document.get('_parent')
`_get_parent_id_from_elastic`? You could also do `if old_parent:` below instead of creating `had_parent`.
# index the new child.
updated = self.apply_update(document['_source'], update_spec)
if (had_parent and updated.get(parent_field) != old_parent) or (
        not had_parent and parent_field in updated):
Isn't it sufficient to check if `updated.get(parent_field) != old_parent`?
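The two conditions agree in most cases but not in all of them. The following is a minimal sketch that compares them, using plain dicts as simplified stand-ins for the Elasticsearch hit (`document`) and the updated source (`updated`). The function names and the `parent_id` field name are illustrative assumptions, not part of the PR.

```python
def parent_changed_original(document, updated, parent_field):
    """Condition as written in the diff under review."""
    had_parent = '_parent' in document
    old_parent = document.get('_parent')
    return (had_parent and updated.get(parent_field) != old_parent) or (
        not had_parent and parent_field in updated)


def parent_changed_simple(document, updated, parent_field):
    """Simplified condition suggested in the review comment."""
    return updated.get(parent_field) != document.get('_parent')


# (document, updated) pairs covering presence and absence on both sides.
cases = [
    ({'_parent': 'a'}, {'parent_id': 'a'}),
    ({'_parent': 'a'}, {'parent_id': 'b'}),
    ({'_parent': 'a'}, {}),
    ({}, {'parent_id': 'a'}),
    ({}, {}),
    ({}, {'parent_id': None}),
]

# Collect the cases where the two conditions disagree.
differing = [
    (doc, upd) for doc, upd in cases
    if parent_changed_original(doc, upd, 'parent_id')
    != parent_changed_simple(doc, upd, 'parent_id')
]
print(differing)
```

In this sketch the conditions differ only when the document had no parent and the update sets the parent field explicitly to `None`, so the simpler check looks sufficient unless that case needs to be treated as a parent change.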
refresh=(self.auto_commit_interval == 0))
parent_args = {}
if parent_id is not None:
    parent_args['parent'] = parent_id
This seems to be a common pattern:
parent_id = self._get_parent_id_from_mongodb(...)
parent_args = {}
if parent_id is not None:
    parent_args['parent'] = parent_id
Perhaps this could be its own helper function? Like:
parent_args = self._get_parent_args(index, doc_type, doc)
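One way the suggested helper could look is sketched below. `DocManagerSketch` is a hypothetical stand-in for the doc manager, and the way it looks up the parent field (a dict keyed by index and doc type) is an assumption made only so the example runs on its own; the PR's actual `_get_parent_id_from_mongodb` may work differently.

```python
class DocManagerSketch(object):
    """Hypothetical stand-in for the doc manager; only the helper matters."""

    def __init__(self, parent_fields):
        # Assumed shape: maps (index, doc_type) to the name of the field
        # that holds the parent ID. The real config lookup may differ.
        self._parent_fields = parent_fields

    def _get_parent_id_from_mongodb(self, index, doc_type, doc):
        # Simplified: read the parent ID from the configured field, if any.
        parent_field = self._parent_fields.get((index, doc_type))
        if parent_field is None:
            return None
        return doc.get(parent_field)

    def _get_parent_args(self, index, doc_type, doc):
        """Return extra keyword arguments for an Elasticsearch call."""
        parent_id = self._get_parent_id_from_mongodb(index, doc_type, doc)
        if parent_id is None:
            return {}
        return {'parent': parent_id}


manager = DocManagerSketch({('company', 'employees'): 'branch_id'})
child_args = manager._get_parent_args(
    'company', 'employees', {'name': 'Alice Smith', 'branch_id': 'london'})
parent_doc_args = manager._get_parent_args(
    'company', 'branch', {'name': 'London Westminster'})
print(child_args, parent_doc_args)
```

Call sites could then pass `**parent_args` to the index, update, or delete call without repeating the `is not None` check.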
# This is due to the fact that Elasticsearch needs the parent ID
# to know where to route the get request. We do not have the
# parent ID available in our update request though.
document = self._search_doc_by_id(index, doc_type, document_id)
Maybe this is something the BulkBuffer can help us with, once that gets merged in.
Is there an ETA on this? Thanks!
Will this feature be merged? Thanks!
@ShaneHarvey - Sorry for the delay! We were able to test this and it worked.
@ShaneHarvey - I've been testing this out for the use case my coworker @lnader asked about. It's working great! 👍
So, can I use this? And is it safe to use?
@ShaneHarvey - When will this get merged? Thanks
Hello, any news on when this will get merged so we can use it? Thanks!
Hello, any news on when this will get merged? Thanks! :)
When is this going to get merged?
I'd like to use this too. Any update on when it will be merged?
if parent_field is None:
    return None

return self._formatter.transform_value(doc.pop(parent_field, None))
I actually don't think the parent field should be popped off the doc. I use the parent field in queries as a filter. I can't find anything that speaks to the performance of has_parent, but my gut says that a bool.filter.term.parent_field="value" is more performant than has_parent.
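To illustrate the difference, here is a minimal sketch of reading the parent ID with and without removing the field, followed by the kind of term filter that only works when the field stays on the indexed document. The function names, the `keep_field` flag, and the `branch_id` field are assumptions for illustration; the sketch also leaves out the formatter's `transform_value` step from the diff.

```python
def get_parent_id(doc, parent_field, keep_field=True):
    """Return the parent ID; optionally leave the field on the document."""
    if parent_field is None:
        return None
    if keep_field:
        return doc.get(parent_field)
    # Behaviour in the diff: the field is removed before indexing.
    return doc.pop(parent_field, None)


def term_filter_query(parent_field, parent_id):
    """Filter children by the stored parent field (bool/filter/term)."""
    return {'query': {'bool': {'filter': {'term': {parent_field: parent_id}}}}}


popped = {'name': 'Alice Smith', 'branch_id': 'london'}
kept = dict(popped)

popped_id = get_parent_id(popped, 'branch_id', keep_field=False)
kept_id = get_parent_id(kept, 'branch_id', keep_field=True)

# Only the document that kept the field can match this filter once indexed.
query = term_filter_query('branch_id', kept_id)
print('branch_id' in popped, 'branch_id' in kept)
```

Whether the term filter is actually faster than `has_parent` is not established here; the sketch only shows that popping the field removes the option.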
Having this functionality would be super-helpful. @ShaneHarvey: any chance you could update this PR so we could look at merging it again?
Hello,
I tested this case under ES 5.5.2 and it did not pass. :-( 2017-09-19 17:06:42,881 [CRITICAL] mongo_connector.oplog_manager:670 - Exception during collection dump
Please add me to the list of folks asking when this will be merged. I need to link two huge collections, and application-side joins and nesting are out of the question.
Hello, when will it be merged?
@eric-chao I have a similar problem. How did you solve it? Or has this branch not been merged into the main branch, so this feature cannot be used yet?
Sadly, this PR has some non-trivial merge conflicts with the implementation. @ShaneHarvey, if you're willing to resolve the merge conflicts and you think the implementation is stable enough for a release, I'd like to get it out, as it sounds like there's some good demand for it.
I'm cleaning up some old issues highlighted by the new GitHub dashboard UI and realized this is still open. Sorry for the huge delay here @jaraco (and everyone else waiting for this feature). The problem with this PR is that it conflicts with the bulk/buffering work added in #15. Resolving the conflicts is definitely non-trivial. In order for this to be merged, I think the buffering logic might need to be simplified to make the parent-child lookup logic simpler. That said, I haven't worked on this project for quite a while and my knowledge of Elastic is also not up to date, so there might be a solution that does not involve changing the bulk buffer logic. For now I'm going to close this PR because I'm not planning to work on it. Anyone else is free to take over this work and open a new PR using the same ideas (as I did with #3).
This builds on @xiaogaozi's work on #3.
Add support for parent-child mapping.
Example Usage
Let's use the data from the Elasticsearch Parent-Child Mapping docs and adapt it to MongoDB. Suppose you have a `company` database in MongoDB with the collections `branch` and `employees`. Use this script to load some sample data:
Note: this example assumes you have a MongoDB replica set on localhost:27017 and an Elasticsearch instance on localhost:9200.
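A minimal sketch of such a loading script is shown below. The documents follow the branch and employee data in the Elasticsearch parent-child guide; the `branch_id` field name used to link an employee to its branch, and the `load_sample_data` function itself, are assumptions made for this sketch and may not match the script the PR used.

```python
# Sample data adapted from the Elasticsearch parent-child guide.
# "branch_id" is an assumed name for the field that holds the parent ID.
BRANCHES = [
    {'_id': 'london', 'name': 'London Westminster',
     'city': 'London', 'country': 'UK'},
    {'_id': 'liverpool', 'name': 'Liverpool Central',
     'city': 'Liverpool', 'country': 'UK'},
    {'_id': 'paris', 'name': 'Champs Élysées',
     'city': 'Paris', 'country': 'France'},
]

EMPLOYEES = [
    {'_id': 1, 'name': 'Alice Smith', 'dob': '1970-10-24',
     'hobby': 'hiking', 'branch_id': 'london'},
    {'_id': 2, 'name': 'Mark Thomas', 'dob': '1982-05-16',
     'hobby': 'diving', 'branch_id': 'london'},
    {'_id': 3, 'name': 'Barry Smith', 'dob': '1979-04-01',
     'hobby': 'hiking', 'branch_id': 'liverpool'},
    {'_id': 4, 'name': 'Adrien Grand', 'dob': '1987-05-11',
     'hobby': 'horses', 'branch_id': 'paris'},
]


def load_sample_data(uri='mongodb://localhost:27017'):
    """Insert the sample documents into the `company` database."""
    # Imported here so the data above can be inspected without pymongo.
    from pymongo import MongoClient

    db = MongoClient(uri)['company']
    db['branch'].insert_many(BRANCHES)
    db['employees'].insert_many(EMPLOYEES)
```

Calling `load_sample_data()` against the replica set described in the note above would populate both collections.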
In the current pull request, the elastic-doc-manager does not automatically create the parent-child mapping. So create the mapping manually before running mongo-connector:
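As a rough sketch, the mapping body could be built as follows, assuming an Elasticsearch version that still supports the `_parent` mapping field (for example 2.x or 5.x) and assuming the Elasticsearch type names mirror the MongoDB collection names. The helper name is hypothetical; the resulting JSON would be sent as the body of the index-creation request for the `company` index.

```python
import json


def build_parent_child_mapping(parent_type, child_type):
    """Build an index-creation body that declares child_type's parent."""
    return {
        'mappings': {
            parent_type: {},
            child_type: {'_parent': {'type': parent_type}},
        }
    }


mapping_body = build_parent_child_mapping('branch', 'employees')
print(json.dumps(mapping_body, indent=2, sort_keys=True))
```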
Next, create the mongo-connector config file with parent-child mapping options:
Next, run mongo-connector with the above config file:
mongo-connector -c parent-child-config.json
Finally, you can query Elasticsearch with your MongoDB data.
Query Elasticsearch for children by their parents:
Or, query Elasticsearch for parents by their children:
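The two directions can be sketched as query bodies like the ones below, assuming the `has_parent` and `has_child` queries of Elasticsearch 2.x/5.x and the `branch` and `employees` type names from this example. The helper names and the specific `match` clauses are illustrative assumptions.

```python
def children_by_parent(parent_type, parent_query):
    """Find child documents whose parent matches parent_query."""
    return {'query': {'has_parent': {
        'parent_type': parent_type,
        'query': parent_query,
    }}}


def parents_by_child(child_type, child_query):
    """Find parent documents that have a child matching child_query."""
    return {'query': {'has_child': {
        'type': child_type,
        'query': child_query,
    }}}


# Employees who work at a branch in the UK.
employees_in_uk = children_by_parent('branch', {'match': {'country': 'UK'}})

# Branches that have at least one employee who likes hiking.
branches_with_hikers = parents_by_child(
    'employees', {'match': {'hobby': 'hiking'}})
```

The first body would be posted to the child type's search endpoint and the second to the parent type's.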
Up for discussion:
`parentType` field to the config file as well as the `parentField`.
`"parentField": "foo.bar.parent_id"` will not work.
`parent`) and the grandparent id (for `routing`). I propose we only support grandparent relationships when the grandchild documents contain a `parentField` and an additional `routingField`. This would avoid having to query the parent collection to find the parent's `_routing` to use for `routing` (what if the parent doesn't exist yet, which may happen during the collection dump?). This would support great-grandchildren, great-great-grandchildren, etc. automatically. Extending the example above with the Elasticsearch Grandparents example:
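The proposal can be sketched as a helper that reads both IDs directly from the grandchild document, so no lookup against the parent collection is needed. The function name and the `branch_id` and `country_id` field names are assumptions for illustration; the country, branch, employee hierarchy follows the Elasticsearch Grandparents example, where an employee is indexed with `parent=london` and `routing=uk`.

```python
def get_routing_args(doc, parent_field, routing_field=None):
    """Build parent/routing keyword arguments from fields on the document."""
    args = {}
    parent_id = doc.get(parent_field)
    if parent_id is not None:
        args['parent'] = parent_id
    if routing_field is not None:
        routing = doc.get(routing_field)
        if routing is not None:
            # Grandchildren must be routed by the top-level ancestor's ID.
            args['routing'] = routing
    return args


# A grandchild carries both its parent ID and its grandparent (routing) ID.
employee = {'name': 'Alice Smith', 'branch_id': 'london', 'country_id': 'uk'}
employee_args = get_routing_args(employee, 'branch_id', 'country_id')

# A plain child only needs its parent ID.
branch = {'name': 'London Westminster', 'country_id': 'uk'}
branch_args = get_routing_args(branch, 'country_id')
print(employee_args, branch_args)
```

Because the routing value always names the top-level ancestor, the same two fields would cover deeper hierarchies without extra configuration.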