GFF3 cannot be recognized #240

Open
sanyalab opened this issue Sep 20, 2024 · 4 comments
Labels
bug (Something isn't working), enhancement (New feature or request), fixed in release (Issue resolved and the fix is released, waiting for approval), input data (Issue is caused by input data)

Comments


sanyalab commented Sep 20, 2024

Hi,

The tool says that it can work with GFF3, but it only seems to work with GTF. Can we get GFF3 support?


This is the error I get when I provide a GFF3-formatted file with the --genedb option:

2024-09-19 11:35:13,297 - ERROR - Input GTF seems to be corrupted (see warnings above).
2024-09-19 11:35:13,297 - ERROR - An attempt to correct this GTF was made, the result is written to dummy.corrected.gff3
2024-09-19 11:35:13,297 - ERROR - NB! some transcript / gene ids in the corrected annotation are modified.
2024-09-19 11:35:13,297 - ERROR - Provide a correct GTF by fixing the original input GTF or checking the corrected one.

Do you consume the gene annotations in GTF format or BED12 format? Is it OK to provide a BED12 file directly?

Thanks
Abhijit

@andrewprzh
Collaborator

Dear @sanyalab

IsoQuant does support both GTF and GFF, but not BED. Could you send me the entire isoquant.log file?
Also, you can try running IsoQuant with --no_gtf_check.

Best
Andrey

@sanyalab
Author

Hi Andrey,

I went ahead and converted the GFF3 to a gene database using gffutils as a preprocessing step, and IsoQuant seems to be running fine now. The isoquant.log file is 152 MB, so I cannot upload it, but here are the first 10 lines and the last 10.
FIRST:

Command line: isoquant.py --reference genome.fa --genedb Annotation.gff3 --fastq Sample1.flnc.fastq Sample2.flnc.fastq Sample3.flnc.fastq Sample4.flnc.fastq --output FL_ALL --prefix OUT --data_type pacbio_ccs --fl_data --threads 24 --check_canonical --sqanti_output --matching_strategy precise --splice_correction_strategy default_pacbio --model_construction_strategy fl_pacbio
2024-09-19 11:34:28,180 - INFO - Running IsoQuant version 3.5.0
2024-09-19 11:34:28,222 - INFO -  === IsoQuant pipeline started ===
2024-09-19 11:34:28,222 - INFO - gffutils version: 0.13
2024-09-19 11:34:28,223 - INFO - pysam version: 0.22.1
2024-09-19 11:34:28,223 - INFO - pyfaidx version: 0.8.1.1
2024-09-19 11:34:28,228 - INFO - Checking input gene annotation
2024-09-19 11:34:29,316 - WARNING - Malformed GTF line 2 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,316 - WARNING - Chr00	GSAP	gene	151	2235	.	+	.	ID=dummy1;Name=dummy1
2024-09-19 11:34:29,316 - WARNING - Malformed GTF line 3 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00	GSAP	mRNA	151	2235	.	+	ID=dummy1.1;Parent=dummy1;Name=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 4 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00	GSAP	exon	151	2235	.	+	.	ID=dummy1.1.exon1;Parent=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 5 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00	GSAP	CDS	151	2235	.	+	0	ID=dummy1.1.cds1;Parent=dummy1.1
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 6 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00	GSAP	gene	2412	4316	.	+	.	ID=dummy2;Name=dummy2
2024-09-19 11:34:29,317 - WARNING - Malformed GTF line 7 (gene_id attribute value cannot be found)
2024-09-19 11:34:29,317 - WARNING - Chr00	GSAP	mRNA	2412	4316	.	+	.	ID=dummy2.1;Parent=dummy2;Name=dummy2.1

LAST:

2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638230 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26	GSAP	exon	1450283	1450513	.	+	.	ID=dummy6432.1.exon1;Parent=dummy6432.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638231 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26	GSAP	CDS	1450283	1450513	.	+	0	ID=dummy6432.1.cds1;Parent=dummy6432.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638232 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26	GSAP	gene	1465536	1465607	.	-	.	ID=dummy6433;Name=dummy6433
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638233 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26	GSAP	mRNA	1465536	1465607	.	-	.	ID=dummy6433.1;Parent=dummy6433;Name=dummy6433.1
2024-09-19 11:35:13,258 - WARNING - Malformed GTF line 638234 (gene_id attribute value cannot be found)
2024-09-19 11:35:13,258 - WARNING - Chr26	GSAP	exon	1465536	1465607	.	-	.	ID=dummy6433.1.exon1;Parent=dummy6433.1
2024-09-19 11:35:13,297 - ERROR - Input GTF seems to be corrupted (see warnings above).
2024-09-19 11:35:13,297 - ERROR - An attempt to correct this GTF was made, the result is written to /Path/FL_ALL/Annotation.corrected.gff3
2024-09-19 11:35:13,297 - ERROR - NB! some transcript / gene ids in the corrected annotation are modified.
2024-09-19 11:35:13,297 - ERROR - Provide a correct GTF by fixing the original input GTF or checking the corrected one.

It's not recognizing the GFF3 file.
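
Every warning in the log reports a missing gene_id attribute on a line whose ninth column uses GFF3-style key=value pairs (ID=..., Parent=...), while GTF uses key "value" pairs. The sketch below only illustrates that difference so a file can be triaged before it is passed to --genedb. It is not IsoQuant's checker; the function name attribute_style and both regular expressions are assumptions made for this example.

```python
import re

# GTF attributes look like:  gene_id "g1"; transcript_id "t1";
GTF_ATTR = re.compile(r'^\s*\w+\s+"[^"]*"\s*;?')
# GFF3 attributes look like: ID=dummy1;Name=dummy1
GFF3_ATTR = re.compile(r'^\s*[^=;\s]+=[^;]*')


def attribute_style(line):
    """Guess the attribute syntax of one annotation line (hypothetical helper).

    Returns "gtf", "gff3", or "unknown" (comments, or lines that do not
    have exactly nine tab-separated columns).
    """
    if line.startswith("#"):
        return "unknown"
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        return "unknown"
    attrs = fields[8]
    if GTF_ATTR.match(attrs):
        return "gtf"
    if GFF3_ATTR.match(attrs):
        return "gff3"
    return "unknown"


# A line taken from the warnings above, and a made-up GTF line for contrast.
gff3_line = "Chr00\tGSAP\tgene\t151\t2235\t.\t+\t.\tID=dummy1;Name=dummy1"
gtf_line = 'Chr00\tGSAP\texon\t151\t2235\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
print(attribute_style(gff3_line), attribute_style(gtf_line))
```

A line with fewer than nine columns is reported as "unknown", which is a quick way to spot truncated records.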

@andrewprzh
Collaborator

@sanyalab

Thanks a lot! I will add GFF3 support to the internal checker.
Since gffutils was able to convert the file, you can also run IsoQuant on the original GFF3 with --no_gtf_check.
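
The two workarounds discussed in this thread can be sketched as follows: building a gffutils database as a preprocessing step, or passing the GFF3 directly and skipping the annotation check. The helper names build_gffutils_db and isoquant_command are hypothetical, the merge_strategy choice is an assumption rather than what was actually used here, and the file names are the ones from the command line quoted above. Only flags that appear in this thread are used.

```python
def build_gffutils_db(gff3_path, db_path):
    """Convert a GFF3 file to a gffutils database (requires the gffutils package).

    merge_strategy="merge" is an assumption; pick whatever suits the
    duplicate IDs, if any, in your annotation.
    """
    import gffutils  # imported lazily so the rest of the sketch runs without it

    return gffutils.create_db(
        str(gff3_path),
        dbfn=str(db_path),
        force=True,              # overwrite an existing database file
        merge_strategy="merge",
    )


def isoquant_command(reference, genedb, fastqs, output,
                     data_type="pacbio_ccs", skip_check=True):
    """Assemble an isoquant.py argument list without running it."""
    cmd = [
        "isoquant.py",
        "--reference", str(reference),
        "--genedb", str(genedb),
        "--fastq", *[str(f) for f in fastqs],
        "--output", str(output),
        "--data_type", data_type,
    ]
    if skip_check:
        # Skips the internal annotation check that rejected the GFF3 above.
        cmd.append("--no_gtf_check")
    return cmd


cmd = isoquant_command("genome.fa", "Annotation.gff3",
                       ["Sample1.flnc.fastq", "Sample2.flnc.fastq"], "FL_ALL")
print(" ".join(cmd))
```

The argument list can be handed to subprocess.run(cmd, check=True) once the paths are real; returning a list rather than a shell string avoids quoting problems with unusual file names.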

andrewprzh reopened this Sep 20, 2024
andrewprzh added the bug, enhancement, and input data labels Sep 20, 2024
@andrewprzh
Collaborator

andrewprzh commented Sep 25, 2024

GFF3 should work in IsoQuant 3.6.1 without warnings.

andrewprzh added the fixed in release label Sep 25, 2024