How do I specify FLNC reads in IsoQuant #226

Open
sanyalab opened this issue Aug 19, 2024 · 4 comments
Labels
question Further information is requested

Comments

@sanyalab

Hi,

I have PacBio FLNC reads in FASTQ format. Which options should I specify when running the tool? I was thinking of
--data_type pacbio --fl_data. Is this correct?

Thanks
Abhijit

@andrewprzh added the "question" label (Further information is requested) on Aug 19, 2024
@andrewprzh
Collaborator

Dear @sanyalab

Yes, this set of options is correct.

Best
Andrey
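
For readers who want to script the run, here is a minimal sketch of how the confirmed options could be assembled into a full command line. Only `--data_type pacbio` and `--fl_data` come from this thread; the `--reference`, `--fastq`, `--genedb`, and `-o` flags are taken from memory of the IsoQuant README and should be checked against `isoquant.py --help` for your version. All file names are hypothetical placeholders.

```python
import shlex


def build_isoquant_command(reference, fastq, output_dir, genedb=None):
    """Assemble an IsoQuant argument list for PacBio FLNC reads.

    The input/output flags are assumptions based on the IsoQuant README;
    --data_type pacbio and --fl_data are the options confirmed in the thread.
    """
    cmd = [
        "isoquant.py",
        "--reference", reference,   # genome FASTA (assumed flag name)
        "--fastq", fastq,           # FLNC reads in FASTQ (assumed flag name)
        "--data_type", "pacbio",    # confirmed in the thread
        "--fl_data",                # reads are full-length; confirmed in the thread
        "-o", output_dir,           # output directory (assumed flag name)
    ]
    if genedb is not None:
        # Optional annotation; omit it for annotation-free runs.
        cmd += ["--genedb", genedb]
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_isoquant_command("genome.fasta", "flnc.fastq", "isoquant_out")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the paths point at real files.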

@sanyalab
Author

Hi Andrey,

A few other questions:

  1. Are you guys sure there is no difference between pacbio and pacbio_ccs? I used 1.2 million IsoSeq FLNC reads and got 7780 transcripts for pacbio_ccs and 5438 for pacbio. I am using the 3.5 version.
  2. What is the difference among default_pacbio, sensitive_pacbio, and fl_pacbio, other than the transcript number?
  3. I am working with a fungal genome (<100MB) in a contig state, that has 2 haplotypes.
    2a. Do I concatenate the haplotype genomes and use them together for IsoQuant, or use them separately as I have done above?
    2b. Does this decrease (1.2 mil to ~8000) seem reasonable for a fungal genome? Any suggestions on the optimal number of reads (genome agnostic) for IsoQuant?

Thanks
Abhijit

@andrewprzh
Collaborator

@sanyalab

> Are you guys sure there is no difference between pacbio and pacbio_ccs? I used 1.2 million IsoSeq FLNC reads and got 7780 transcripts for pacbio_ccs and 5438 for pacbio. I am using the 3.5 version.

Yes, they are just aliases. Could you send me the logs for these runs?

> What is the difference among default_pacbio, sensitive_pacbio, and fl_pacbio, other than the transcript number?

These are just different option presets. sensitive_pacbio applies slightly lighter filters than default_pacbio. fl_pacbio requires known transcripts to be covered by FSM reads in order to be reported. From the user's perspective, the only difference is the number of reported transcripts.
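
To compare the presets side by side, one option is to launch one run per preset into separate output directories. The sketch below only builds the command lines. It assumes the presets are selected with a `--model_construction_strategy` option and that `--reference`, `--fastq`, and `-o` are the input/output flags; none of these flag names appear in this thread, so verify them with `isoquant.py --help`. File names are hypothetical.

```python
# Preset names as listed in the thread.
PACBIO_PRESETS = ["default_pacbio", "sensitive_pacbio", "fl_pacbio"]


def preset_commands(reference, fastq, out_root):
    """Return one IsoQuant argument list per preset, keyed by preset name."""
    commands = {}
    for preset in PACBIO_PRESETS:
        commands[preset] = [
            "isoquant.py",
            "--reference", reference,                  # assumed flag name
            "--fastq", fastq,                          # assumed flag name
            "--data_type", "pacbio_ccs",               # alias of pacbio, per the thread
            "--model_construction_strategy", preset,   # assumed flag name
            "-o", f"{out_root}/{preset}",              # one directory per preset
        ]
    return commands


# Hypothetical inputs, for illustration only.
commands = preset_commands("genome.fasta", "flnc.fastq", "isoquant_runs")
for name, cmd in commands.items():
    print(name, "->", " ".join(cmd))
```

Keeping each preset in its own directory also keeps the logs separate, which makes them easier to share when results differ unexpectedly.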

> I am working with a fungal genome (<100MB) in a contig state, that has 2 haplotypes.
> 2a. Do I concatenate the haplotype genomes and use them together for IsoQuant, or use them separately as I have done above?

I have very little experience with diploid genomes, especially highly diploid ones. I would first try to create a consensus genome, if that is even possible. If not, using the haplotypes separately could be better, since there can be far too many multimappers when using a concatenated genome.

> 2b. Does this decrease (1.2 mil to ~8000) seem reasonable for a fungal genome? Any suggestions on the optimal number of reads (genome agnostic) for IsoQuant?

It's very hard to predict how many novel transcripts should be detected and what a reasonable number is. It depends on the genome itself, how well it is sequenced, how deep your sequencing is, etc. So the only suggestion I can give is to check related genomes, or to try different settings / tools and compare the output.
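
When comparing settings, a quick first check is the number of transcript records in each output GTF. The helper below is a generic sketch that counts lines whose third column is `transcript`, which is standard GTF layout; it does not depend on IsoQuant itself. The demo writes a tiny made-up GTF to a temporary file; in practice you would pass the path of the transcript model GTF from each run (the exact output file name is not given in this thread).

```python
import os
import tempfile


def count_transcripts(gtf_path):
    """Count records whose feature column (3rd field) is 'transcript'."""
    count = 0
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2] == "transcript":
                count += 1
    return count


# Made-up records, for illustration only.
demo_gtf = (
    "# demo annotation\n"
    "ctg1\tdemo\tgene\t1\t900\t.\t+\t.\tgene_id \"g1\";\n"
    "ctg1\tdemo\ttranscript\t1\t900\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";\n"
    "ctg1\tdemo\texon\t1\t400\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";\n"
    "ctg1\tdemo\ttranscript\t1\t700\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t2\";\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".gtf", delete=False) as tmp:
    tmp.write(demo_gtf)
    demo_path = tmp.name

demo_count = count_transcripts(demo_path)
os.remove(demo_path)
print(demo_count)  # 2 transcript records in the demo file
```

Raw counts only show that runs differ; comparing the transcript IDs or intron chains between runs says more about where they differ.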

Best
Andrey

@sanyalab
Author

Hi Andrey,

I'll generate the files again. Since it was a test and I was playing with the hyperparameters, I didn't know what to retain. No worries, I'll regenerate the files and send you the logs. Thank you for the insightful comments.

-Abhijit
