gene_id attribute value cannot be found #190

Open
sqwwww opened this issue May 17, 2024 · 3 comments
Labels
enhancement New feature or request fixed in release Issue resolved and the fix is released, waiting for approval input data Issue is caused by input data

Comments

@sqwwww

sqwwww commented May 17, 2024

Hi, I got my GFF file from Funannotate, and it looks like the one below. When I ran IsoQuant, it reported an error saying 'gene_id attribute value cannot be found.' Could you please give me some advice on what might be wrong with my GFF file?

Several rows from my GFF:

2       funannotate     gene    2121    14252   .       -       .       ID=Hh_000001;Name=IGFN1_1;
2       funannotate     transcript      2121    14252   .       -       .       ID=Hh_000001-T1;Parent=Hh_000001;product=Immunoglobulin-like and fibronectin type III;Ontology_term=GO:0005515;Dbxref=PFAM:PF07679,PFAM:PF13927,PFAM:PF00041,InterPro:IPR036179,InterPro:IPR036116;note=COG:T,EggNog:ENOG503BKR4;
2       funannotate     exon    14196   14252   .       -       .       ID=Hh_000001-T1.exon1;Parent=Hh_000001-T1;
2       funannotate     exon    13986   14073   .       -       .       ID=Hh_000001-T1.exon2;Parent=Hh_000001-T1;
2       funannotate     exon    12879   13151   .       -       .       ID=Hh_000001-T1.exon3;Parent=Hh_000001-T1;
2       funannotate     exon    12487   12795   .       -       .       ID=Hh_000001-T1.exon4;Parent=Hh_000001-T1;
2       funannotate     exon    11838   11924   .       -       .       ID=Hh_000001-T1.exon5;Parent=Hh_000001-T1;
2       funannotate     exon    10639   10763   .       -       .       ID=Hh_000001-T1.exon6;Parent=Hh_000001-T1;
2       funannotate     exon    8878    9019    .       -       .       ID=Hh_000001-T1.exon7;Parent=Hh_000001-T1;
2       funannotate     exon    8498    8794    .       -       .       ID=Hh_000001-T1.exon8;Parent=Hh_000001-T1;
2       funannotate     exon    8096    8392    .       -       .       ID=Hh_000001-T1.exon9;Parent=Hh_000001-T1;
2       funannotate     exon    5990    6169    .       -       .       ID=Hh_000001-T1.exon10;Parent=Hh_000001-T1;
2       funannotate     exon    5789    5917    .       -       .       ID=Hh_000001-T1.exon11;Parent=Hh_000001-T1;
2       funannotate     exon    5390    5689    .       -       .       ID=Hh_000001-T1.exon12;Parent=Hh_000001-T1;
2       funannotate     exon    4563    4648    .       -       .       ID=Hh_000001-T1.exon13;Parent=Hh_000001-T1;
2       funannotate     exon    4268    4460    .       -       .       ID=Hh_000001-T1.exon14;Parent=Hh_000001-T1;
2       funannotate     exon    3899    4195    .       -       .       ID=Hh_000001-T1.exon15;Parent=Hh_000001-T1;
2       funannotate     exon    2530    2856    .       -       .       ID=Hh_000001-T1.exon16;Parent=Hh_000001-T1;
2       funannotate     exon    2121    2125    .       -       .       ID=Hh_000001-T1.exon17;Parent=Hh_000001-T1;
2       funannotate     CDS     14196   14252   .       -       0       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     13986   14073   .       -       0       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     12879   13151   .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     12487   12795   .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     11838   11924   .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     10639   10763   .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     8878    9019    .       -       0       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     8498    8794    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     8096    8392    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     5990    6169    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     5789    5917    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     5390    5689    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     4563    4648    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     4268    4460    .       -       0       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     3899    4195    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     2530    2856    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;
2       funannotate     CDS     2121    2125    .       -       2       ID=Hh_000001-T1.cds;Parent=Hh_000001-T1;

Several lines from the IsoQuant output:

2024-05-17 21:39:25,250 - INFO - Running IsoQuant version 3.4.1
2024-05-17 21:39:25,252 - INFO - Loading parameters of the previous run, all arguments will be ignored
2024-05-17 21:39:25,329 - INFO - Novel unspliced transcripts will not be reported, set --report_novel_unspliced true to discover them
2024-05-17 21:39:25,329 - INFO -  === IsoQuant pipeline started === 
2024-05-17 21:39:25,329 - INFO - gffutils version: 0.13
2024-05-17 21:39:25,329 - INFO - pysam version: 0.22.0
2024-05-17 21:39:25,329 - INFO - pyfaidx version: 0.8.1.1
2024-05-17 21:39:25,331 - INFO - Checking input gene annotation
2024-05-17 21:39:31,258 - WARNING - Malformed GTF line 1 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,258 - WARNING - 2   funannotate     gene    2121    14252   .       -       .       ID=Hh_000001;Name=IGFN1_1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 3 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    14196   14252   .       -       .       ID=Hh_000001-T1.exon1;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 4 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    13986   14073   .       -       .       ID=Hh_000001-T1.exon2;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 5 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    12879   13151   .       -       .       ID=Hh_000001-T1.exon3;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 6 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    12487   12795   .       -       .       ID=Hh_000001-T1.exon4;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 7 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    11838   11924   .       -       .       ID=Hh_000001-T1.exon5;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 8 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    10639   10763   .       -       .       ID=Hh_000001-T1.exon6;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 9 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    8878    9019    .       -       .       ID=Hh_000001-T1.exon7;Parent=Hh_000001-T1;
2024-05-17 21:39:31,259 - WARNING - Malformed GTF line 10 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,259 - WARNING - 2   funannotate     exon    8498    8794    .       -       .       ID=Hh_000001-T1.exon8;Parent=Hh_000001-T1;
2024-05-17 21:39:31,260 - WARNING - Malformed GTF line 11 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,260 - WARNING - 2   funannotate     exon    8096    8392    .       -       .       ID=Hh_000001-T1.exon9;Parent=Hh_000001-T1;
2024-05-17 21:39:31,260 - WARNING - Malformed GTF line 12 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,260 - WARNING - 2   funannotate     exon    5990    6169    .       -       .       ID=Hh_000001-T1.exon10;Parent=Hh_000001-T1;
2024-05-17 21:39:31,260 - WARNING - Malformed GTF line 13 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,260 - WARNING - 2   funannotate     exon    5789    5917    .       -       .       ID=Hh_000001-T1.exon11;Parent=Hh_000001-T1;
2024-05-17 21:39:31,260 - WARNING - Malformed GTF line 14 (gene_id attribute value cannot be found)
2024-05-17 21:39:31,260 - WARNING - 2   funannotate     exon    5390    5689    .       -       .       ID=Hh_000001-T1.exon12;Parent=Hh_000001-T1;
@sqwwww
Author

sqwwww commented May 17, 2024

I converted the original GFF to GTF with the AGAT tools, and the converted GTF works.
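
For readers wondering what such a conversion changes: the sketch below is not AGAT and makes no claim about how AGAT works. It is a minimal illustration, assuming a simple gene → transcript → exon/CDS hierarchy like the rows quoted above, of how GFF3 `ID`/`Parent` attributes map onto the `gene_id`/`transcript_id` attributes that the check was looking for. The function name `gff3_to_gtf_lines` and the set of transcript feature types are hypothetical.

```python
# Feature types treated as transcripts in this sketch (an assumption).
TRANSCRIPT_TYPES = {"transcript", "mRNA"}


def parse_gff3_attributes(column9):
    """Turn 'ID=x;Parent=y;' into {'ID': 'x', 'Parent': 'y'}."""
    return dict(
        field.split("=", 1)
        for field in column9.strip().strip(";").split(";")
        if "=" in field
    )


def gff3_to_gtf_lines(gff3_lines):
    """Rewrite column 9 of simple GFF3 rows into GTF-style attributes.

    Assumes every non-gene, non-transcript feature has a transcript as its
    single Parent; anything else raises KeyError in this sketch.
    """
    rows = [
        line.rstrip("\n").split("\t")
        for line in gff3_lines
        if line.strip() and not line.startswith("#")
    ]

    # First pass: remember which gene each transcript belongs to.
    transcript_to_gene = {}
    for cols in rows:
        attrs = parse_gff3_attributes(cols[8])
        if cols[2] in TRANSCRIPT_TYPES:
            transcript_to_gene[attrs["ID"]] = attrs["Parent"]

    # Second pass: emit GTF-style attributes for every row.
    converted = []
    for cols in rows:
        attrs = parse_gff3_attributes(cols[8])
        if cols[2] == "gene":
            new_attrs = f'gene_id "{attrs["ID"]}";'
        elif cols[2] in TRANSCRIPT_TYPES:
            new_attrs = f'gene_id "{attrs["Parent"]}"; transcript_id "{attrs["ID"]}";'
        else:
            transcript = attrs["Parent"]
            gene = transcript_to_gene[transcript]
            new_attrs = f'gene_id "{gene}"; transcript_id "{transcript}";'
        converted.append("\t".join(cols[:8] + [new_attrs]))
    return converted
```

A real converter also has to deal with multiple parents, other feature types, and escaping, which is why a dedicated tool is the safer choice for actual data.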

@andrewprzh andrewprzh added enhancement New feature or request input data Issue is caused by input data labels May 20, 2024
@andrewprzh
Collaborator

Dear @sqwwww

Thanks for the report!

This looks like a valid GTF. However, exact attributes indicating ids and feature relationships are not fixed in the specification.
The current IsoQuant GTF checks expect each feature to have gene_id and transcript_id attributes (except genes, of course, which only need gene_id). This annotation uses ID and Parent instead. You can skip these checks with the --no_gtf_checks flag.
I will update GTF checks at some point to handle this format as well.
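
To make the mismatch concrete, here is a minimal sketch of a check of this kind. It is not IsoQuant's actual code; the names `parse_attributes` and `missing_required` are hypothetical, and the rule it applies is only the one described above (gene_id and transcript_id on every feature, gene_id alone on genes).

```python
import re

# Matches GFF3-style "key=value" fields; GTF-style fields are 'key "value"'.
GFF3_FIELD = re.compile(r'^([^\s=]+)=(.*)$')


def parse_attributes(column9):
    """Parse a GFF3 or GTF attribute column into a dict."""
    attrs = {}
    for field in column9.split(";"):
        field = field.strip()
        if not field:
            continue
        match = GFF3_FIELD.match(field)
        if match:
            key, value = match.group(1), match.group(2)
        else:
            key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def missing_required(line):
    """Return the required GTF attributes that a tab-separated row lacks."""
    cols = line.rstrip("\n").split("\t")
    feature, attrs = cols[2], parse_attributes(cols[8])
    required = ["gene_id"] if feature == "gene" else ["gene_id", "transcript_id"]
    return [key for key in required if key not in attrs]


gff3_gene = "2\tfunannotate\tgene\t2121\t14252\t.\t-\t.\tID=Hh_000001;Name=IGFN1_1;"
gff3_exon = "2\tfunannotate\texon\t14196\t14252\t.\t-\t.\tID=Hh_000001-T1.exon1;Parent=Hh_000001-T1;"
gtf_exon = '2\tfunannotate\texon\t14196\t14252\t.\t-\t.\tgene_id "Hh_000001"; transcript_id "Hh_000001-T1";'

print(missing_required(gff3_gene))  # ['gene_id']
print(missing_required(gff3_exon))  # ['gene_id', 'transcript_id']
print(missing_required(gtf_exon))   # []
```

The GFF3 rows carry the same information through ID and Parent, but a check that looks only for the GTF attribute names reports them as malformed.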

However, this particular annotation will not be processed by gffutils since both the gene and the transcript have identical IDs.

Best
Andrey
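
For anyone who wants to see which ID values repeat in their own file before handing it to a GFF parser, a standard-library sketch follows. It does not use gffutils and does not reproduce its behaviour; the function name `duplicated_ids` is hypothetical. In the rows quoted in the report, for instance, all CDS rows share `ID=Hh_000001-T1.cds`. How gffutils reacts to repeated IDs depends on how the database is built (for example, the `merge_strategy` argument of `gffutils.create_db`), and how IsoQuant calls it is not shown in this thread.

```python
from collections import Counter


def duplicated_ids(gff3_lines):
    """Return {ID: count} for every GFF3 ID value that occurs more than once."""
    counts = Counter()
    for line in gff3_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete feature row
        for field in cols[8].split(";"):
            field = field.strip()
            if field.startswith("ID="):
                counts[field[len("ID="):]] += 1
    return {feature_id: n for feature_id, n in counts.items() if n > 1}


rows = [
    "2\tfunannotate\tgene\t2121\t14252\t.\t-\t.\tID=Hh_000001;Name=IGFN1_1;",
    "2\tfunannotate\tCDS\t14196\t14252\t.\t-\t0\tID=Hh_000001-T1.cds;Parent=Hh_000001-T1;",
    "2\tfunannotate\tCDS\t13986\t14073\t.\t-\t0\tID=Hh_000001-T1.cds;Parent=Hh_000001-T1;",
]
print(duplicated_ids(rows))  # {'Hh_000001-T1.cds': 2}
```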

@andrewprzh
Collaborator

Should be fixed in IsoQuant 3.6.1, which implements checks for GFF3 format.

@andrewprzh andrewprzh added the fixed in release Issue resolved and the fix is released, waiting for approval label Sep 25, 2024