I have a collection with 30,000 documents. Dropping all documents from the mongo collection and putting the documents back into the same collection is taking more time. #433
Comments
What version of MongoDB, mongo-connector, and Elasticsearch are you using?
MongoDB is 2.6. As for mongo-connector, I reinstalled it last week (to get the latest version) through `pip install mongo-connector`.
Hi, if you installed through pip you would have gotten the most recent version, which is 2.3. If you want to be certain, you can always run `pip freeze`, which will list all the packages you have installed and their versions. I will look into this and hopefully have more information for you soon.
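As an aside, rather than scanning the full `pip freeze` output, a single package's version can be looked up programmatically. A minimal sketch using only the standard library (Python 3.8+); `mongo-connector` here is just the package name from this thread, and the function name is our own:

```python
# Look up one installed package's version instead of parsing
# the whole `pip freeze` listing (Python 3.8+ stdlib).
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("mongo-connector"))  # e.g. "2.3", or None
```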
Also, what version of Elasticsearch are you using?
@aherlihy: I am using Elasticsearch version 2.2. The output of the `pip freeze` command is below: FIXME: could not find svn URL in dependency_links for this package: distribute==0.6.24dev-r0
@aherlihy: I have provided the output above.
If you want to reindex the data, you can delete the oplog.timestamp file in /var/log/mongo-connector.
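In concrete terms, that looks something like the following sketch. The path is the one mentioned above and may differ on your system (it is whatever you pass as mongo-connector's `--oplog-ts`/`-o` option); stop the connector first so it does not rewrite the file:

```shell
# Stop mongo-connector, then remove its progress file so the next
# start performs a fresh collection dump instead of replaying the
# oplog. Path assumed from this thread -- adjust to your setup.
OPLOG_TS="${OPLOG_TS:-/var/log/mongo-connector/oplog.timestamp}"
rm -f -- "$OPLOG_TS"
```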
Hi @kawaljeet, I'm sorry for the delay in getting back to you. How are you inserting the documents into Mongo-Connector? My theory on why it's taking so much longer to insert documents after you delete them is that when you first start up mongo-connector, it uses bulk_upsert during the collection dump. After you delete your documents and reinsert them, the elastic inserts happen with regular upserts because mongo-connector is reading the oplog. If this is what's happening then there isn't much to be done, but @wx7614140 is correct that if you remove the oplog.timestamp file and you have documents in your MongoDB instance, then it will initiate a collection dump like it did the first time.
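The performance gap described above comes down to batching: one bulk request per chunk of documents versus one round trip per document. A minimal sketch of the batching idea (function and batch size are our own, not mongo-connector's API):

```python
from itertools import islice

def batched(docs, size=500):
    """Yield lists of up to `size` docs, as a bulk indexer would send them."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

# 30,000 docs at batch size 500 -> 60 bulk requests instead of
# 30,000 individual upserts, one network round trip per batch.
batches = list(batched(range(30000), 500))
```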
Hi @aherlihy, apologies for the late response. I understand the behavior of mongo-connector now. Yes, we are currently deleting the entire mongo collection and creating it again. That is why mongo-connector is treating it as an upsert (and hence taking time). Actually, the existing mongo-river was doing it fast (probably they use bulk upsert?). We might need to revisit our design. Thanks a lot for the help.
Hi, we also face the same performance issue. We tested at a rate of approx. 200 docs per second inserted into MongoDB, but mongo-connector seems to handle no more than 30 docs per second. That is a problem since we have an even higher rate in our production environment. Is it possible for mongo-connector to do the same bulk_upsert for normal operation, not only for the first-time dump? Mongo-river seems to use bulk_upsert for normal operation as well. Thanks!
Using Elastic's bulk API for all upsert operations is in progress here: #446. We should hopefully see a performance boost when we merge the pull request made from this work. @hungvotrung helpfully created a chart showing their own measurements with the patch in progress here: #446 (comment), though this chart may be out of date by now. |
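For reference, Elastic's bulk API in the Python client consumes a stream of action dictionaries. A hedged sketch of shaping MongoDB documents into that form, assuming the ES 1.x/2.x-era `elasticsearch.helpers.bulk()` action format; index and type names here are placeholders:

```python
def to_bulk_actions(docs, index="my_index", doc_type="my_type"):
    """Shape MongoDB documents into the action dicts that
    elasticsearch.helpers.bulk() expects (field names assumed)."""
    for doc in docs:
        yield {
            "_op_type": "index",
            "_index": index,
            "_type": doc_type,
            "_id": str(doc["_id"]),
            "_source": {k: v for k, v in doc.items() if k != "_id"},
        }

# Usage (requires a live cluster, not run here):
#   from elasticsearch import Elasticsearch
#   from elasticsearch.helpers import bulk
#   bulk(Elasticsearch(), to_bulk_actions(docs))
```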
Thanks @llvtt ! That is really good news! |
Any progress on this? Same issue here.
@42matters you should watch #446; that's where the action is happening. |
Hi,
I have some 30,000 documents in a collection. It is taking more than 30 minutes to get that data indexed. (The first time it is pretty quick, less than a minute, but subsequently, if I delete all the documents in the collection and put the same set of 30,000 new documents back, it takes much more time.)
Find below a few of the items in the settings file:
...
...
...
...
Any suggestions?
Also, if I drop all the documents in the same collection, it takes almost double the time (almost an hour).
In the log I can see 30,000 PUTs for the inserts and 30,000 DELETEs for the deletes.
It seems it is not doing parallel execution or a bulk operation.
Any suggestions on how to improve the performance?