multiple single cell samples #211

Closed
wanghlv opened this issue Jul 10, 2024 · 5 comments
Labels
question Further information is requested

Comments


wanghlv commented Jul 10, 2024

Hi, thanks for writing such a complete man page! I have a quick question. I have 6 samples in total, all of them single-cell Nanopore libraries. I'd like both the transcript and gene quantification to be per cell (from the CB tag) and per sample.
Could I use --read_group file_name:tag:CB, or should I supply a file, as in --read_group file:READ_TO_BARCODE_Samples.TSV:0:2?

READ_TO_BARCODE_Samples.TSV would look like the example below: the first column is the read ID, the second is the cell barcode, and the third is the sample. However, I'm not sure whether read IDs are unique across all 6 samples.

12a5c9c3-2b73-49c0-a3fd-22d2c10832e2_0 AATCAGGAGTGAACGA Sample1
b6e8c102-e1e2-4155-bc28-7dbb5a34c857_0 CCAGCTGCATGAGCAG Sample2
...

I'm currently running it as follows:
isoquant.py -d ont -r ${FA} --complete_genedb --genedb ${GTF} \
  --bam ${s1bam} ${s2bam} ${s3bam} ${s4bam} ${s5bam} ${s6bam} \
  -o IQ_all --prefix IQ_all -l s1 s2 s3 s4 s5 s6 \
  --sqanti_output --check_canonical --count_exons --bam_tags \
  -t 24 --genedb_output \
  --model_construction_strategy default_ont \
  --report_canonical auto --read_group tag:CB

Alternatively, I was thinking of adding a new tag to my BAM files that combines the cell barcode with an appended sample ID, like AATCAGGAGTGAACGAs1, CCAGCTGCATGAGCAGs2, ... However, I haven't found a good way to do that because I have a lot of reads in my entire experiment. Thank you so much for your suggestions.

Best, Hsiao-Lin

@andrewprzh
Collaborator

Dear @wanghlv

Thanks for the feedback!

> Could I use --read_group file_name:tag:CB, or should I supply a file, as in --read_group file:READ_TO_BARCODE_Samples.TSV:0:2?

I think both ways are identical in terms of results, although using read tags may save memory since in this case IsoQuant won't load the entire barcode table into memory.

Unfortunately, the current version of IsoQuant can only group counts by one factor at a time: either the barcode or the sample. So if you want both, I guess you'll need to perform two runs.
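A minimal sketch of what the two runs could look like. The `run_grouped` wrapper and the `ISOQUANT` override are hypothetical names introduced here for illustration, and the flag set is trimmed down from the command in the question. It also assumes that `--read_group file_name` groups counts by input file, which should be checked against the README of the installed IsoQuant version.

```shell
# Hypothetical wrapper: run IsoQuant once per grouping factor.
# $1 = output directory/prefix, $2 = --read_group value, remaining args = BAM files.
# Setting ISOQUANT=echo prints the command instead of running it (dry run).
run_grouped() {
    out="$1"
    group="$2"
    shift 2
    "${ISOQUANT:-isoquant.py}" -d ont -r "$FA" --genedb "$GTF" --complete_genedb \
        --bam "$@" -o "$out" --prefix "$out" -t 24 --read_group "$group"
}

# Run 1: counts grouped by cell barcode (CB tag).
# run_grouped IQ_by_cell tag:CB "$s1bam" "$s2bam" "$s3bam" "$s4bam" "$s5bam" "$s6bam"
# Run 2: counts grouped by input file, i.e. by sample (assumed behaviour of file_name).
# run_grouped IQ_by_sample file_name "$s1bam" "$s2bam" "$s3bam" "$s4bam" "$s5bam" "$s6bam"
```

Writing each run to its own output directory keeps the two grouped count tables from overwriting each other.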

> However, I'm not sure whether read IDs are unique across all 6 samples.

I highly doubt ONT reads can have identical IDs.

> Alternatively, I was thinking of adding a new tag to my BAM files that combines the cell barcode with an appended sample ID, like AATCAGGAGTGAACGAs1, CCAGCTGCATGAGCAGs2, ... However, I haven't found a good way to do that because I have a lot of reads in my entire experiment.

Adding a new tag would require creating a new BAM file, so it's probably easier to create a new TSV table.
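One way such a table could be built is sketched below, assuming the barcodes are stored in the standard `CB:Z:` SAM tag and that `samtools` is available. The function name `sam_to_read_groups`, the `_` separator, and the file names in the usage comment are illustrative choices, not part of IsoQuant.

```shell
# Read SAM records on stdin and print, tab-separated:
#   read_id, cell barcode, sample label, barcode_sample
# $1 = sample label. Reads without a CB tag are skipped.
sam_to_read_groups() {
    awk -v sample="$1" 'BEGIN { FS = OFS = "\t" }
        /^@/ { next }                       # skip SAM header lines
        {
            for (i = 12; i <= NF; i++)      # optional tags start at column 12
                if ($i ~ /^CB:Z:/) {
                    cb = substr($i, 6)      # strip the "CB:Z:" prefix
                    print $1, cb, sample, cb "_" sample
                    break
                }
        }'
}

# Usage (placeholder file names; one BAM per sample):
# for s in s1 s2 s3 s4 s5 s6; do
#     samtools view "${s}.bam" | sam_to_read_groups "$s"
# done > READ_TO_BARCODE_Samples.TSV
```

With 0-based column indices, as in the `file:READ_TO_BARCODE_Samples.TSV:0:2` example from the question, the combined barcode and sample key would be column 3. Reads with secondary or supplementary alignments appear once per alignment, so the table may need de-duplicating (for example with `sort -u`).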

P.S. The new version 3.4.2 should be more efficient in terms of RAM consumption, so it's better to update if possible.

Best
Andrey

@andrewprzh andrewprzh added the question Further information is requested label Jul 14, 2024

wanghlv commented Jul 15, 2024

Thank you for all the info and suggestions, and yes, 3.4.2 is so much better at using RAM! Would you recommend an efficient tool for cell barcode and UMI processing of single-cell Nanopore data, to run before IsoQuant? Also, since my single-cell data include UMIs, how would you account for them in the quantification so that PCR duplicates are not double counted?
Thanks so much again
Hsiao-Lin

@andrewprzh
Collaborator

@wanghlv

Currently, I'm using my own barcode calling and PCR de-duplication tools (https://github.com/ablab/IsoQuant/tree/sc_v3). They are not released yet, but at some point they will become part of IsoQuant too. If you are eager to test them, please contact me via email :)

There are also some pipelines available, such as
https://github.com/nf-core/scnanoseq (also uses IsoQuant)
https://github.com/epi2me-labs/wf-single-cell
They also have a list of tools they use for barcode calling / PCR de-duplication. However, I have not tried any of those yet.

Hope that helps.

Best
Andrey

@vasikara17

Hello, I have a similar issue that I posted yesterday! In my case, I have one BAM file that contains all the conditions. Could you elaborate on running IsoQuant twice with different tags? How can I keep both the barcode and the condition information?
Best,
VK

@andrewprzh
Collaborator

Replied in #234
