Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor two header determination #46

Closed
wants to merge 4 commits into from

Conversation

rhpvorderman
Copy link
Collaborator

@rhpvorderman rhpvorderman commented Feb 15, 2022

FastqIter now returns a boolean on first iteration and a sequence record afterwards. This violates the principle of least surprise.
Two headers is now determined by peeking the file. I did not implement a full-fledged FASTQ parsing algorithm there as any errors in the FASTQ are likely revealed by FastqIter with an appropriate error.

Because of this change FastqReader._iter can now be properly typed and FastqIter can be safely and intuitively be used by other projects.

EDIT: I also found a small mistake in _core.pyi that I have corrected here, but I could do that separately if you do not wish to merge this.

fixes #38

@rhpvorderman
Copy link
Collaborator Author

Hmm. This will not be able to correctly determine two headers for long reads on a non-seekable stream. (i.e. when using xopen's pipe methods). The buffer on io.BufferedReader is simply too small.

I see no easy way to work around this, other than keeping the already existing code.

@rhpvorderman rhpvorderman deleted the twoheader branch March 9, 2022 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Two_headers should be a SequenceRecord attribute
1 participant