Could an optional workflow/script be provided to downsample datasets to 1%? #85

Open
kternus opened this issue Jun 9, 2018 · 2 comments

Comments

@kternus
Collaborator

kternus commented Jun 9, 2018

This would be useful for low-complexity metagenomes, which do not need deep sequencing coverage to capture all of the necessary information.

Dahak users could optionally run it after read filtering and before any downstream analysis takes place.

@ctb
Contributor

ctb commented Jun 10, 2018 via email

@kternus
Collaborator Author

kternus commented Jun 11, 2018

Thanks for sharing all of those options! It would be good to talk through these options more on a call.

I believe we would want to run multiple dahak workflows beyond taxonomic classification, which would rule out sourmash watch. That said, I didn't realize sourmash watch existed until this moment, and it sounds really interesting. I will file away all of those questions for later too.

sample-reads-randomly.py sounds like it's ready to go, and it's a better approach than taking the first 1% of a file.
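
To illustrate why random sampling differs from taking the head of a file, here is a minimal Python sketch that keeps each read with probability 0.01. This is not khmer's `sample-reads-randomly.py` and makes no claim about how that script works; the function names `read_fastq` and `subsample_fastq` are hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record.

```python
import random
from itertools import islice


def read_fastq(handle):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped 4-line records)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield record


def subsample_fastq(in_handle, out_handle, fraction=0.01, seed=1):
    """Write each record with probability `fraction`; return (kept, total).

    Hypothetical helper for illustration only. Reads are drawn from the
    whole file, so the subset is not biased toward the start of the run.
    """
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    kept = total = 0
    for record in read_fastq(in_handle):
        total += 1
        if rng.random() < fraction:
            out_handle.writelines(record)
            kept += 1
    return kept, total
```

The number of reads kept is only approximately 1% of the total, because each read is sampled independently. Since exactly one random draw is made per record, running it with the same seed on two paired-end files that list reads in the same order would select the same record positions in both.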

Solutions to create a small-but-informative subset of the data sound compelling, but I'm not sure how challenging it would be to integrate the khmer heavy guns into dahak.
