Merge pull request #267 from monarch-initiative/improve-allele-predicate-docs

Improve Pydoc for allelic predicates
ielis committed Sep 12, 2024
2 parents b568b96 + 1b3c848 commit c5e6996
Showing 7 changed files with 265 additions and 142 deletions.
162 changes: 91 additions & 71 deletions docs/user-guide/mtc.rst
@@ -192,76 +192,96 @@ We use static constructor :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default
for creating :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
The constructor takes a threshold as an argument (e.g. 20% in the example above)
and the method's logic is made up of 8 individual heuristics
designed to skip testing the HPO terms that are unlikely to yield significant or interesting results:

+------------+-------------------+--------------------------------------------------------------------------------------------+
| Code | Name | Description |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF01` | Skip terms that | The ``term_frequency_threshold`` determines the minimum proportion of individuals |
| | occur very rarely | with direct or indirect annotation by the HPO term to test. |
| | | We check each of the genotype groups (e.g., MISSENSE vs. not-MISSENSE), |
| | | and we only retain a term for testing if the proportion of individuals |
| | | in at least one genotype group is greater than |
| | | or equal to ``term_frequency_threshold``. |
| | | This is because of our assumption that even if there is statistical significance, |
| | | if a term is only seen in (for example) 7% of individuals |
| | | in the MISSENSE group and 2% in the not-MISSENSE group, |
| | | the term is unlikely to be of great interest because it is rare. |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF02` | Skip terms if | In a related heuristic, we skip terms if no genotype group has more |
| | no cell has more | than one count. This is not completely redundant with the previous condition, |
| | than one count | because some terms may have a small number of total observations. |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF03` | Skip terms if | Let's say a term such as |
| | all counts are | `Posterior polar cataract (HP:0001115) <https://hpo.jax.org/browse/term/HP:0001115>`_ |
| | identical | was observed in 7 of 11 individuals with MISSENSE variants |
| | to counts | and in 3 of 8 individuals with NONSENSE variants. |
| | for a child | If we find the same patient counts (7 of 11 and 3 of 8) in the parent term |
| | term | `Polar cataract HP:0010696 <https://hpo.jax.org/browse/term/HP:0010696>`_, |
| | | then we choose to not test the parent term. |
| | | |
| | | This is because the more specific an HPO term is, |
| | | the more information it has (the more interesting the correlation would be if it exists), |
| | | and the result of a test, such as the Fisher Exact test, would be exactly the same |
| | | for *Polar cataract* as for *Posterior polar cataract*. |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF04` | Skip terms if | If both (or all) of the genotype groups have the same proportion of individuals |
| | genotypes have | observed to be annotated to an HPO term, e.g., both are 50%, then skip the term, |
| | same HPO | because it is not possible that the Fisher exact test will return a significant result. |
| | proportions | |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF05` | Skip terms if | If one of the genotype groups has neither observed nor excluded observations |
| | there are no | for an HPO term, skip it. |
| | HPO observations | |
| | in a group | |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF06` | Skip term if | If the individuals are binned into 2 phenotype groups and 2 genotype groups (2x2) |
| | underpowered | and the total count of patients in all genotype-phenotype groups is less than 7, |
| | for 2x2 or 2x3 | or into 2 phenotype groups and 3 genotype groups (2x3) and the total count of patients |
| | analysis | is less than 6, then there is a lack even of the nominal statistical power |
| | | and the counts can never be significant. |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF07` | Skipping terms | The HPO has a number of other branches that describe modes of inheritance, |
| | that are not | past medical history, and clinical modifiers. |
| | descendants of | We do not think it makes much sense to test for enrichment of these terms, |
| | *Phenotypic* | so, all terms that are not descendants of |
| | *abnormality* | `Phenotypic abnormality <https://hpo.jax.org/browse/term/HP:0000118>`_ are filtered out. |
| | | |
+------------+-------------------+--------------------------------------------------------------------------------------------+
| `HMF08` | Skipping | All the direct children of the root phenotype term |
| | "general" | `Phenotypic abnormality (HP:0000118) <https://hpo.jax.org/browse/term/HP:0000118>`_ |
| | level terms | are skipped, because of the assumption that if there is a valid signal, |
| | | it will derive from one of the more specific descendants. |
| | | |
| | | For instance, |
| | |`Abnormality of the nervous system <https://hpo.jax.org/browse/term/HP:0000707>`_ |
| | | (HP:0000707) is a child of *Phenotypic abnormality*, and this assumption implies |
| | | that if there is a signal from the nervous system, |
| | | it will lead to at least one of the descendants of |
| | | *Abnormality of the nervous system* being significant. |
| | | |
| | | See :ref:`general-hpo-terms` section for details. |
| | | |
+------------+-------------------+--------------------------------------------------------------------------------------------+
designed to skip testing the HPO terms that are unlikely to yield significant or interesting results.


`HMF01` - Skip terms that occur very rarely
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``term_frequency_threshold`` determines the minimum proportion of individuals
with direct or indirect annotation by the HPO term to test.
We check each of the genotype groups (e.g., MISSENSE vs. not-MISSENSE),
and we only retain a term for testing if the proportion of individuals
in at least one genotype group is greater than
or equal to ``term_frequency_threshold``.
This is because of our assumption that even if there is statistical significance,
if a term is only seen in (for example) 7% of individuals
in the MISSENSE group and 2% in the not-MISSENSE group,
the term is unlikely to be of great interest because it is rare.
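
The frequency check can be sketched in plain Python. This is only an illustration: the function name ``passes_frequency_threshold`` and the input layout (a mapping from genotype group to a pair of counts) are hypothetical and are not part of the GPSEA API, and we assume that the denominator is the number of individuals considered for the term in each group.

```python
from typing import Mapping, Tuple


def passes_frequency_threshold(
    counts: Mapping[str, Tuple[int, int]],
    term_frequency_threshold: float,
) -> bool:
    """Return True if at least one genotype group reaches the threshold.

    `counts` maps a genotype group name to `(n_with_term, n_total)`
    (a hypothetical layout chosen for this sketch).
    """
    for n_with_term, n_total in counts.values():
        if n_total > 0 and n_with_term / n_total >= term_frequency_threshold:
            return True
    return False


# 7% in MISSENSE and 2% in not-MISSENSE: both below a 20% threshold.
rare = {"MISSENSE": (7, 100), "not-MISSENSE": (2, 100)}
# 25% in MISSENSE: one group reaching the threshold is enough to keep the term.
common = {"MISSENSE": (25, 100), "not-MISSENSE": (2, 100)}
```

With a threshold of 0.2, the ``rare`` term would be skipped and the ``common`` term would be retained for testing.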


`HMF02` - Skip terms if no genotype group has more than one count
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In a related heuristic, we skip terms if no genotype group has more
than one count. This is not completely redundant with the previous condition,
because some terms may have a small number of total observations.


`HMF03` - Skip terms if all counts are identical to counts for a child term
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's say a term such as
`Posterior polar cataract (HP:0001115) <https://hpo.jax.org/browse/term/HP:0001115>`_
was observed in 7 of 11 individuals with MISSENSE variants
and in 3 of 8 individuals with NONSENSE variants.
If we find the same patient counts (7 of 11 and 3 of 8) in the parent term
`Polar cataract (HP:0010696) <https://hpo.jax.org/browse/term/HP:0010696>`_,
then we choose to not test the parent term.

This is because the more specific an HPO term is,
the more information it has (the more interesting the correlation would be if it exists),
and the result of a test, such as the Fisher Exact test, would be exactly the same
for *Polar cataract* as for *Posterior polar cataract*.
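
A minimal sketch of this comparison, using the counts from the example above. The helper ``same_counts_as_a_child`` and the dictionary layout are hypothetical and only illustrate the idea; they do not reflect how GPSEA stores the counts internally.

```python
from typing import Dict, Tuple

# (n_with_term, n_total) per genotype group (hypothetical layout).
Counts = Dict[str, Tuple[int, int]]


def same_counts_as_a_child(
    term_id: str,
    counts: Dict[str, Counts],
    children: Dict[str, Tuple[str, ...]],
) -> bool:
    """Return True if some child term has exactly the same counts as `term_id`."""
    return any(
        counts.get(child) == counts[term_id]
        for child in children.get(term_id, ())
    )


counts = {
    "HP:0010696": {"MISSENSE": (7, 11), "NONSENSE": (3, 8)},  # Polar cataract
    "HP:0001115": {"MISSENSE": (7, 11), "NONSENSE": (3, 8)},  # Posterior polar cataract
}
# Parent -> children, restricted to the two terms of the example.
children = {"HP:0010696": ("HP:0001115",)}
```

Here the parent term *Polar cataract* would be skipped, while the more specific *Posterior polar cataract* stays in the analysis.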


`HMF04` - Skip terms if genotypes have same HPO proportions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If both (or all) of the genotype groups have the same proportion of individuals
observed to be annotated to an HPO term, e.g., both are 50%, then skip the term,
because it is not possible that the Fisher exact test will return a significant result.
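
The equality of proportions can be checked without floating point rounding, for example with exact fractions. The function below is a hypothetical helper written for this illustration, not GPSEA code.

```python
from fractions import Fraction
from typing import Iterable, Tuple


def all_proportions_equal(groups: Iterable[Tuple[int, int]]) -> bool:
    """Return True if every `(n_with_term, n_total)` pair gives the same proportion.

    Groups without any individuals are ignored in this sketch.
    """
    proportions = {Fraction(k, n) for k, n in groups if n > 0}
    return len(proportions) <= 1
```

For instance, 5 of 10 and 4 of 8 are both 50%, so such a term would be skipped, whereas 5 of 10 and 3 of 8 differ and the term would be kept.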


`HMF05` - Skip term if one of the genotype groups has neither observed nor excluded observations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Skip terms if there are no HPO observations in a group.


`HMF06` - Skip term if underpowered for 2x2 or 2x3 analysis
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the individuals are binned into 2 phenotype groups and 2 genotype groups (2x2)
and the total count of patients in all genotype-phenotype groups is less than 7,
or into 2 phenotype groups and 3 genotype groups (2x3) and the total count of patients
is less than 6, then there is a lack even of the nominal statistical power
and the counts can never be significant.
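
The 2x2 threshold can be checked by brute force. The sketch below is an independent illustration, not the GPSEA implementation: it assumes a two-sided Fisher exact test and a significance level of 0.05, and both function names are hypothetical. It covers the 2x2 case only.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing `x` in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables that are at most as likely as the observed one.
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


def min_p_value(total: int) -> float:
    """Smallest two-sided p-value over all 2x2 tables with `total` individuals."""
    best = 1.0
    for a in range(total + 1):
        for b in range(total + 1 - a):
            for c in range(total + 1 - a - b):
                d = total - a - b - c
                best = min(best, fisher_exact_two_sided(a, b, c, d))
    return best
```

Under these assumptions, no 2x2 table with 6 individuals reaches p < 0.05 (the minimum is 1/15), while a table with 7 individuals can (the minimum is 1/35), which is consistent with the threshold of 7 stated above.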


`HMF07` - Skipping terms that are not descendants of *Phenotypic abnormality*
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The HPO has a number of other branches that describe modes of inheritance,
past medical history, and clinical modifiers.
We do not think it makes much sense to test for enrichment of these terms,
so, all terms that are not descendants of
`Phenotypic abnormality <https://hpo.jax.org/browse/term/HP:0000118>`_ are filtered out.


`HMF08` - Skipping "general" level terms
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All the direct children of the root phenotype term
`Phenotypic abnormality (HP:0000118) <https://hpo.jax.org/browse/term/HP:0000118>`_
are skipped, because of the assumption that if there is a valid signal,
it will derive from one of the more specific descendants.

For instance,
`Abnormality of the nervous system <https://hpo.jax.org/browse/term/HP:0000707>`_
(HP:0000707) is a child of *Phenotypic abnormality*, and this assumption implies
that if there is a signal from the nervous system,
it will lead to at least one of the descendants of
*Abnormality of the nervous system* being significant.

See :ref:`general-hpo-terms` section for details.
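
The two hierarchy-based heuristics (`HMF07` and `HMF08`) can be sketched on a toy excerpt of the ontology. The ``PARENTS`` mapping below is a hand-written assumption covering a handful of terms, and ``ancestors`` and ``is_testable`` are hypothetical helpers; a real analysis would query the loaded HPO instead.

```python
from typing import Dict, Set, Tuple

ROOT = "HP:0000118"  # Phenotypic abnormality

# Toy child -> parents mapping (hand-written for this sketch).
PARENTS: Dict[str, Tuple[str, ...]] = {
    "HP:0000118": ("HP:0000001",),  # Phenotypic abnormality -> All
    "HP:0000005": ("HP:0000001",),  # Mode of inheritance -> All
    "HP:0000707": ("HP:0000118",),  # Abnormality of the nervous system
    "HP:0012638": ("HP:0000707",),  # Abnormal nervous system physiology
}


def ancestors(term_id: str) -> Set[str]:
    """Collect all ancestors of `term_id` by walking the parent links."""
    seen: Set[str] = set()
    stack = list(PARENTS.get(term_id, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def is_testable(term_id: str) -> bool:
    """Apply the two hierarchy-based heuristics to a single term."""
    is_descendant = ROOT in ancestors(term_id)     # HMF07
    is_general = ROOT in PARENTS.get(term_id, ())  # HMF08
    return is_descendant and not is_general
```

In this toy graph, *Mode of inheritance* fails the first check, *Abnormality of the nervous system* fails the second one, and only the more specific nervous system term remains testable.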
26 changes: 13 additions & 13 deletions docs/user-guide/predicates/mode_of_inheritance_predicate.rst
@@ -9,11 +9,11 @@ autosomal recessive, X-linked dominant, X-linked recessive, and mitochondrial
(See `Understanding Genetics, Appendix B <https://www.ncbi.nlm.nih.gov/books/NBK132145/>`_).


The :class:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate`
assigns the individual into a group based on the number of alleles
that match a condition specified by a :class:`~gpsea.analysis.predicate.genotype.VariantPredicate`.
The :class:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate` supports
the following Mendelian modes of inheritance (MoI):
The predicates created by :func:`~gpsea.analysis.predicate.genotype.autosomal_dominant`
and :func:`~gpsea.analysis.predicate.genotype.autosomal_recessive`
assign the individual into a group based on the number of alleles
observed in the individual.
GPSEA supports the following Mendelian modes of inheritance (MoI):


+-----------------------+------------------+------------------------+
@@ -40,11 +40,11 @@ the following Mendelian modes of inheritance (MoI):
`BIALLELIC_ALT` includes both homozygous and compound heterozygous genotypes.

Clinical judgment should be used to choose the MoI for the cohort analysis.
Then a predicate for the desired MoI can be created by one of
:class:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate` static constructors:
Then a predicate for the desired MoI can be created by calling one
of the following methods:

* :func:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate.autosomal_dominant`
* :func:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate.autosomal_recessive`
* :func:`~gpsea.analysis.predicate.genotype.autosomal_dominant`
* :func:`~gpsea.analysis.predicate.genotype.autosomal_recessive`

By default, the MoI predicates will use *all* variants recorded in the individual.
However, a :class:`~gpsea.analysis.predicate.genotype.VariantPredicate`
@@ -57,11 +57,11 @@ Assign individuals into genotype groups
Here we show setting up a predicate for grouping individuals for differences
between genotypes of a disease with an autosomal recessive MoI.

We use :class:`~gpsea.analysis.predicate.genotype.ModeOfInheritancePredicate.autosomal_recessive`
We use :func:`~gpsea.analysis.predicate.genotype.autosomal_recessive`
to create the predicate:

>>> from gpsea.analysis.predicate.genotype import ModeOfInheritancePredicate
>>> gt_predicate = ModeOfInheritancePredicate.autosomal_recessive()
>>> from gpsea.analysis.predicate.genotype import autosomal_recessive
>>> gt_predicate = autosomal_recessive()
>>> gt_predicate.display_question()
'What is the genotype group: HOM_REF, HET, BIALLELIC_ALT'

@@ -88,6 +88,6 @@ when assigning the genotype group. We set up the variant predicate:

and we use it to create the MoI predicate:

>>> gt_predicate = ModeOfInheritancePredicate.autosomal_recessive(is_missense)
>>> gt_predicate = autosomal_recessive(is_missense)
>>> gt_predicate.display_question()
'What is the genotype group: HOM_REF, HET, BIALLELIC_ALT'
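
The grouping idea can be illustrated without GPSEA. The two functions below are hypothetical stand-ins, not the library API: the group names are taken from the ``display_question()`` output above, while the mapping from allele count to group is our assumption for this sketch.

```python
from typing import Optional


def toy_autosomal_dominant_group(allele_count: int) -> Optional[str]:
    """Map the count of matching alleles to an autosomal dominant group."""
    # Assumption: counts other than 0 or 1 are not assigned to any group.
    return {0: "HOM_REF", 1: "HET"}.get(allele_count)


def toy_autosomal_recessive_group(allele_count: int) -> Optional[str]:
    """Map the count of matching alleles to an autosomal recessive group."""
    # BIALLELIC_ALT covers both homozygous and compound heterozygous genotypes,
    # because only the number of matching alleles is considered.
    return {0: "HOM_REF", 1: "HET", 2: "BIALLELIC_ALT"}.get(allele_count)
```

When a variant predicate such as ``is_missense`` is provided, only the alleles of the matching variants would be counted before the group is assigned.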
4 changes: 2 additions & 2 deletions docs/user-guide/stats.rst
@@ -136,8 +136,8 @@ to test if the variant leads to a frameshift (in this case):
and then we choose the expected mode of inheritance to test. In case of *TBX5*,
we expect the autosomal dominant mode of inheritance:

>>> from gpsea.analysis.predicate.genotype import ModeOfInheritancePredicate
>>> gt_predicate = ModeOfInheritancePredicate.autosomal_dominant(is_frameshift)
>>> from gpsea.analysis.predicate.genotype import autosomal_dominant
>>> gt_predicate = autosomal_dominant(is_frameshift)
>>> gt_predicate.display_question()
'What is the genotype group: HOM_REF, HET'

4 changes: 3 additions & 1 deletion src/gpsea/analysis/predicate/genotype/__init__.py
@@ -2,13 +2,15 @@
from ._api import VariantPredicate
from ._counter import AlleleCounter
from ._gt_predicates import groups_predicate, sex_predicate, diagnosis_predicate
from ._gt_predicates import autosomal_dominant, autosomal_recessive
from ._gt_predicates import monoallelic_predicate, biallelic_predicate
from ._gt_predicates import ModeOfInheritancePredicate
from ._gt_predicates import ModeOfInheritancePredicate # TODO: remove before 1.0.0
from ._variant import VariantPredicates, ProteinPredicates

__all__ = [
'GenotypePolyPredicate',
'groups_predicate', 'sex_predicate', 'diagnosis_predicate',
'autosomal_dominant', 'autosomal_recessive',
'monoallelic_predicate', 'biallelic_predicate',
'ModeOfInheritancePredicate',
'AlleleCounter', 'VariantPredicate',