
I have collection with 30000 documents. Dropping all documents (from mongo collection) and again putting the documents (in the same mongo collection) too is taking more time. #433

Open
kawaljeet opened this issue Apr 13, 2016 · 14 comments


@kawaljeet commented Apr 13, 2016

Hi,
I have about 30,000 documents in the collection. It is taking more than 30 minutes to get that data indexed. (The first time it is pretty quick [less than a minute], but subsequently, if I delete all the documents in the collection and put a set of new documents (the same 30,000) back into the collection, it takes much longer.)
Below are a few of the items from the settings file:

"batchSize": 500,
"verbosity": 3,
"continueOnError": true,

...
...

"docManagers": [
    {
        "docManager": "elastic_doc_manager",
        "targetURL": "XX.XX.XX.XX:9200",
        "bulkSize": 500,
        "uniqueKey": "_id",
        "autoCommitInterval": 0
    }

...
...
Any suggestions?

Also, if I drop all the documents in the same collection, it takes almost double the time (almost an hour).

In the log I can see 30,000 PUTs for the inserts and 30,000 DELETEs for the deletes. It seems it is not doing parallel execution or bulk operations.

Any suggestions on how to improve the performance?

@aherlihy (Contributor)

What version of MongoDB, mongo-connector, and elastic are you using?

@kawaljeet changed the title from "First time indexing is taking long time to index. Dropping all documents (from mongo collection) and again putting the documents (in the same mongo collection) too is taking more time." to "I have collection with 30000 documents. Dropping all documents (from mongo collection) and again putting the documents (in the same mongo collection) too is taking more time." on Apr 26, 2016
@kawaljeet (Author)

MongoDB is 2.6, and I reinstalled mongo-connector last week (to get the latest version) through pip install mongo-connector.
I assume it is 2.3; any idea how to get the exact version?
I have also edited the question: it does not happen the first time, the issue occurs from the second time onwards, when I delete all the documents in the collection and re-populate it.

@aherlihy (Contributor)

Hi,

If you installed through pip you would have gotten the most recent version, which is 2.3. If you want to be certain you can always run "pip freeze" which will list all the packages you have installed and their versions. I will look into this and hopefully have more information for you soon.
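(A minimal sketch, assuming the package was installed with pip: the same version can also be read from the package metadata in Python, without scanning the full "pip freeze" output.)

    import pkg_resources

    # Look up the installed mongo-connector version from setuptools metadata.
    print(pkg_resources.get_distribution("mongo-connector").version)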

@aherlihy (Contributor)

Also, what version of elastic are you using?

@kawaljeet (Author)

@aherlihy .. I am using Elasticsearch version 2.2.
Thanks a lot for your help, much appreciated. :)

The output of the pip freeze command is below:
Warning: cannot find svn location for distribute==0.6.24dev-r0
Cheetah==2.4.4
GnuPGInterface==0.3.2
M2Crypto==0.21.1
PyYAML==3.10
apt-xapian-index==0.44
argparse==1.2.1
boto==2.2.2
chardet==2.0.1
cloud-init==0.6.3
command-not-found==0.2.44
configobj==4.7.2

FIXME: could not find svn URL in dependency_links for this package:

distribute==0.6.24dev-r0
elastic2-doc-manager==0.1.0
elasticsearch==2.3.0
euca2ools==2.0.0
httplib2==0.7.2
language-selector==0.1
mongo-connector==2.3
oauth==1.0.1
paramiko==1.7.7.1
pycrypto==2.4.1
pycurl==7.19.0
pymongo==3.2.2
pysolr==3.4.0
python-apt==0.8.3ubuntu7.1
python-debian==0.1.21ubuntu1
requests==2.9.1
ufw==0.31.1-1
unattended-upgrades==0.1
urllib3==1.14
wsgiref==0.1.2

@kawaljeet (Author)

@aherlihy .. I have provided the requested information above.

@wx7614140

If you want to reindex the data, you can delete the oplog.timestamp file in /var/log/mongo-connector.
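
(For illustration, a minimal sketch of forcing that re-sync from Python; the path is a placeholder taken from the comment above, since the actual progress-file location is whatever was configured when mongo-connector was started, and mongo-connector should be stopped before removing it.)

    import os

    # Placeholder path; use the oplog progress file configured for your mongo-connector instance.
    OPLOG_TS_FILE = "/var/log/mongo-connector/oplog.timestamp"

    # With mongo-connector stopped, removing the progress file makes the next
    # start perform a full collection dump instead of resuming from the oplog.
    if os.path.exists(OPLOG_TS_FILE):
        os.remove(OPLOG_TS_FILE)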

@aherlihy (Contributor) commented May 9, 2016

Hi @kawaljeet, I'm sorry for the delay in getting back to you. How are you inserting the documents into Mongo-Connector?

My theory on why it’s taking so much longer to insert documents after you delete them: when you first start up mongo-connector, it uses bulk_upsert during the collection dump. After you delete your documents and reinsert them, the elastic inserts happen with regular upserts, because mongo-connector is reading the oplog. If this is what’s happening, then there isn't much to be done, but @wx7614140 is correct that if you remove the oplog.timestamp file while you have documents in your MongoDB instance, it will initiate a collection dump like it did the first time.
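
(To illustrate the difference, here is a rough sketch using the Python elasticsearch client directly, not mongo-connector's actual code; the host, index, and type names are placeholders. The collection dump can push documents in large batches through the bulk API, while oplog replay turns into one request per document.)

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch(["XX.XX.XX.XX:9200"])   # target cluster from the config above
    docs = [{"_id": i, "value": "doc %d" % i} for i in range(30000)]  # stand-in documents

    # Oplog-replay style: one request per document (what happens after the
    # collection is dropped and re-populated) -- slow for 30,000 documents.
    for doc in docs:
        es.index(index="mydb.mycoll", doc_type="mycoll", id=doc["_id"], body=doc)

    # Collection-dump style: batched requests through the bulk API -- much faster.
    actions = ({"_index": "mydb.mycoll", "_type": "mycoll",
                "_id": doc["_id"], "_source": doc} for doc in docs)
    helpers.bulk(es, actions, chunk_size=500)  # chunk_size mirrors the "bulkSize" setting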

@kawaljeet (Author)

Hi @aherlihy .. apologies for the late response. I understand the behavior of mongo-connector now. Yes, we are currently deleting the entire mongo collection and creating it again. That is why mongo-connector treats it as regular upserts (and hence takes time). The existing mongo-river was doing it fast (probably because it uses bulk upserts?). We might need to revisit our design. Thanks a lot for the help.

@weixili (Contributor) commented Aug 11, 2016

Hi, we also face the same performance issue. We tested with a rate of approx. 200 docs per second being inserted into MongoDB, but mongo-connector seems to handle no more than 30 docs per second. That is a problem, since we have an even higher rate in our production environment.

Would it be possible for mongo-connector to do the same bulk_upsert for normal operation, not only for the first-time dump? Mongo-river seems to use bulk_upsert for normal operation as well.

Thanks!

@llvtt commented Aug 11, 2016

Using Elastic's bulk API for all upsert operations is in progress here: #446. We should hopefully see a performance boost when we merge the pull request made from this work. @hungvotrung helpfully created a chart showing their own measurements with the patch in progress here: #446 (comment), though this chart may be out of date by now.
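
(Not the actual #446 implementation, just a rough sketch of the idea behind it, with hypothetical names: buffer the individual oplog operations and flush each batch to Elasticsearch through the bulk helper instead of issuing one request per document.)

    from elasticsearch import Elasticsearch, helpers

    class BufferedUpserter(object):
        """Collects upserts and sends each full batch as a single bulk request."""

        def __init__(self, es, bulk_size=500):
            self.es = es
            self.bulk_size = bulk_size
            self.buffer = []

        def upsert(self, index, doc_type, doc_id, doc):
            # Queue the operation; flush once a full batch has accumulated.
            self.buffer.append({"_index": index, "_type": doc_type,
                                "_id": doc_id, "_source": doc})
            if len(self.buffer) >= self.bulk_size:
                self.flush()

        def flush(self):
            if self.buffer:
                helpers.bulk(self.es, self.buffer)
                self.buffer = []

    # Example usage with a placeholder host:
    upserter = BufferedUpserter(Elasticsearch(["localhost:9200"]))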

@weixili (Contributor) commented Aug 12, 2016

Thanks @llvtt ! That is really good news!

@42matters

Any progress on this? Same issue here.

@llvtt commented Sep 14, 2016

@42matters you should watch #446; that's where the action is happening.
