Annotating from phenotypes: Difference between revisions

From GO Wiki
mNo edit summary
(99 intermediate revisions by 2 users not shown)
Line 1: Line 1:
=Introduction=
= Introduction =
Mutants can provide useful insights into a protein's function. GO annotations based on a phenotype should represent the normal function that can be inferred from the mutant. GO does not aim to capture individual phenotypes; use '''phenotype annotation resources (do we want to be more specific here? Most MODs perform phenotype curation, but what about human phenotypes, for example?)''' for this purpose. The following guidelines should help determine how to annotate the function of a protein that can be inferred from its observed phenotypes.  
Mutant phenotypes can provide important insights into gene function. In order to use mutant phenotypes as supporting evidence for a GO annotation, however, curators should keep in mind the following general principles:
#GO annotations based on a mutant phenotype should represent the ''normal'' role of a gene in biology. 
#Mutant phenotypes need to be interpreted in the overall context of what is known about a gene so that appropriate relations and Biological Process terms can be selected; some phenotypes represent manifestations of genetic perturbations far downstream from the actual function of a gene.
#GO does not aim to capture individual phenotypes. Most groups that contribute annotations to GO also curate phenotypes as a separate data type, allowing GO curators to be selective about phenotypes used for annotation knowing that more complete and detailed phenotype curation is often captured elsewhere.


===What is the normal molecular function/biological process?===
= Relations Glossary =
*Remember that annotations are inferences from the evidence to a normal function/process.  
*These guidelines for annotating from phenotypes refer to several relations from the Relations Ontology (RO). 
*You can only annotate a gene product as being 'involved in' a biological process if the MF can be placed within the set of MFs that make up that process.
*A brief summary of how to use the relations cited in these guidelines, as well as links to more detailed GO annotation guidelines and the respective pages in Ontobee, are below. 
**To help determine whether this is the case, it is useful to create a GO-CAM model in Noctua or consult a pathway or process model from a recent paper or review.
== Gene/Gene Product to GO Term Relations ==
*If there is no MF known, any phenotype can only be annotated to ‘acts upstream of or within’ OR consider not making a GO annotation - it’s OK!
*''involved in''
**'''Q: Is it possible to have a gene product with a novel or unknown MF (or just a protein binding MF) that is otherwise well characterized genetically (and maybe somewhat biochemically?) such that it could be placed within a given pathway or process?'''
**[http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/RO_0002331 involved in at Ontobee]
*Example: nuclear pore: BRR6 is involved in nuclear envelope organization, when mutated, causes nucleocytoplasmic transport defects, but is NOT involved in nuclear transport
**Formal definition: C involved_in p if and only if c enables some process p', and p' is part of p
**'''Cite paper(s) here?'''
**For any given Biological Process, think of the Molecular Functions that make up that process and using that information, decide if the gene/gene product being annotated enables one of those functions. If so, then the gene/gene product is ''involved in'' that process.
*''acts upstream of or within''
**[http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/RO_0002264 acts upstream of or within at Ontobee]
**Formal definition: c acts upstream of or within p if c enables f, and f is causally upstream of or within p. c is a material entity and p is a process.
**The ''acts upstream of or within'' relation is used between a gene/gene product and a Biological Process when it is not known, mechanistically, how the Molecular Function of the gene/gene product affects the process.
*''acts upstream of''
**[http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/RO_0002263 acts upstream of at Ontobee]
**Formal definition: C acts upstream of p if and only if c enables some f that is involved in p' and p' occurs chronologically before p, is not part of p, and affects the execution of p. c is a material entity and f, p, p' are processes.
**The ''acts upstream of'' relation is used when the function of the gene/gene product is ''involved in'' a process that is ''causally upstream of'' another process.


=== Being ‘required for’ a process does not mean a protein is ‘involved in’ a process ===
*Note: for the ''acts upstream of or within'' and ''acts upstream of'' relations, there are positive and negative child relations that can be used to capture the directionality of the effect, if known.
'''It is common for authors to state that a gene or gene product is ''required for'' a given process.  However, the true meaning of this statement can vary and it is thus the responsibility of the curator to determine whether the gene or gene product's activity is, indeed, an integral part of a process.'''
'''To help make this decision, it may be useful to think about the gene's role in a concentric circle of processes, starting with its MF.  For a given MF, what is the most proximal process in which the gene is involved?  For a transcription factor, the most proximal process would be regulation of transcription.  From there, what is the next most proximal process? In a development context, for example, this might be cell fate specification or differentiation. Moving outward, what phenotypes might be indicative of defects in cell fate specification?  If, for example, the cell was a particular class of neurons, then one of the defects may manifest as a change in a behavior.  Is the transcription factor thus 'involved in' the behavior?  No, but by annotating the TF to regulation of transcription (perhaps with relevant target genes) and specification of the particular cell types, it is possible capture the most relevant aspects of that gene's function.  The defects in behavior could then be captured with phenotype annotations.  Possible example: ''C. elegans ttx-1'' '''
'''Note that placing a gene or gene product in its appropriate biological context may require reading more than one paper!'''


apoE example? https://www.uniprot.org/uniprot/P02649
== GO Term Relations ==


===Pleiotropic effects should not usually be captured===
*''part of''
Pleiotropy and ‘required for’ a process does not mean a protein is ‘part of’ a process
**Formal definition: A core relation that holds between a part and its whole
**A Molecular Function is ''part of'' a Biological Process if execution of that MF is integral to the fulfillment of the BP.


For example: splicing factors are often required for cell cycle transition, but they are not part of the cell cycle transition
*''regulates''
A good clue is viability of mutants: inviable mutants often have pleiotropic phenotypes or they have a strong terminal phenotype that can easily be misinterpreted (cell cycle transition blocks/checkpoints, chromosome mis-segregation, etc)
**Formal definition: Process(P1) regulates process(P2) iff: P1 results in the initiation or termination of P2 OR affects the frequency of its initiation or termination OR affects the magnitude or rate of output of P2.
Beware of read-outs: DNA replication, apoptotic DNA fragmentation, etc)


===Cell proliferation, cell migration and apoptosis===
= Guidelines =
*Mutants showing increased/decreased cell proliferation, cell migration and apoptosis need to be analyzed carefully. If we don’t know the underlying molecular/cellular mechanism, these annotations should not be made
== Ask yourself: what is the normal Biological Process for this gene? ==
*Mutually exclusive terms: cell proliferation should not be used for proteins involved generally in growth or division.
*Remember that annotations based on mutant phenotypes are inferences about the ''normal'' Biological Process (BP) for a gene.
*'''I would also add to this list: lethality, low brood size, slow growth, perhaps also sluggish locomotion.'''
*To decide if a gene is directly ''involved in'' a BP, determine if its Molecular Function (MF) is one of the MFs that makes up that BP.
**To help with this, it may be useful to create an activity-based GO-CAM model in Noctua or consult a pathway or process model from a recent paper or review.
*If the MF of a gene is not known, curators must be very careful about making a BP annotation based solely on a phenotype.
**For genes with unknown MFs, in the absence of additional analysis (e.g. genetic and physical interactions), phenotypes can generally only be annotated to a BP using the ''acts upstream of or within'' relation (or one of its positive or negative effect child relations).  In these cases, it is also perfectly acceptable to '''not''' make a GO annotation until more is known about the gene.


=== Mutant phenotypes and regulation terms ===
== Being ‘required for’ a process does not necessarily mean a gene/gene product is ''involved in'' a process ==
*'''It can be difficult to assess whether a gene or gene product regulates a process based on mutant phenotypes alone.  As annotation to regulation terms in the BP ontology requires an understanding of the molecular basis for that regulation, mutant phenotypes may more often be used as supporting, rather than definitive, evidence for a gene's regulatory role.  As with annotating to BP terms from phenotypes more generally, consider what is known about the MFs involved in the process and use that information to guide your annotation practice.'''
*In GO annotation, making a statement that a gene is ''involved in'' a process is an explicit statement that the MF of that gene is an integral part of the BP.
*Mutants annotated to ‘regulation’ with no molecular function annotation (sometimes an annotation to a protein complex of with a known function) should be examined closely and  reviewed.
*Although authors may state that a gene is 'required for' a process, GO curators must interpret those statements within the defined semantics of a GO annotation and decide if that means the gene's activity is truly a ''part of'' the BP.
*If the MF is not ''part of'' the BP, consider using a less specific relation between the gene and the BP, or not annotating for BP until more information is known.


=== Mutant phenotypes and other BP relations ===
== Pleiotropic effects need to be considered carefully ==
*'''acts upstream of or within'''
*Pleiotropy, in which a single variation can result in multiple defects, needs to be carefully considered in the context of GO BP annotation.
*'''acts upstream of'''
*An understanding of the molecular mechanism underlying pleiotropic defects can help determine if the gene is integrally ''involved in'' each process, or affects a process more generally required.
*'''We have some info on the relations pages, but I think we really need more examples.'''


=== Reviewing and removing older phenotype annotations ===
== Beware of read-outs ==
When there is new knowledge, older IMP annotation should be reviewed and removed as required - '''link to section on removing annotations?'''  
*Biologists often use experimental "read-outs" to assess a gene's role in a larger biological process.
*The affected "read-out", however, is not always indicative of the process in which the gene is most proximally involved.
*For example, DNA fragmentation is often used as a "read-out" for apoptosis, but mutations that result in increased DNA fragmentation do not necessarily indicate that the corresponding gene is directly ''involved'' in DNA catabolism.
*Again, consider the MF of the mutated gene and how it fits into the BP used as a "read-out" as well as the overall BP being studied.


== "High-level" phenotypes ==
*Some papers may describe how genomic perturbations affect relatively 'high-level' processes.  Examples of 'high-level' processes include:
**Aging or determination of adult lifespan
**Behavior
**Cell death
**Cell division
**Cell migration
**Cell proliferation
**Development (embryonic, larval)
**Growth
**Locomotion
**Protein localization
**Reproduction and brood size
*If the mechanism underlying the phenotypic effects on these processes is not known, we strongly encourage curators '''not''' to make annotations to the corresponding BP term.
**Creating a GO-CAM model or consulting a recent paper or review can help determine the basis for the phenotype and whether the BP annotation is appropriate, and if so, with what relation.


== Mutant phenotypes and regulation terms ==
*The GO BP ontology contains terms for processes as well as regulation of processes.
*From mutant phenotypes alone, it can be difficult to assess whether a gene is involved in a process or regulates a process. 
*Mutant phenotypes thus more often provide supporting, rather than definitive, evidence that a gene is ''involved in'' regulation of a process.
*As with annotating to BP terms from phenotypes more generally, consider what is known about the MFs ''involved in'' the process and use that information to decide if annotating to the process or regulation of the process is best.
*If you believe that a given gene product regulates a process, there should be evidence that the MF of the gene product being annotated ''regulates'' an MF of a gene that is ''involved in'' the process.
*Mutant phenotypes used to annotate to a ‘regulation’ BP term with no corresponding Molecular Function annotation to the gene/gene product should be examined closely and reviewed.
**In some cases, for example macromolecular complexes with well defined roles in a BP, the MFs of individual complex members may not be known.  In these cases, it is acceptable to annotate each complex member to a BP regulation term if the overall MF of the complex is ''part of'' a regulatory process that affects the BP.


[[Category: Annotation]] [[Category:Working Groups]]
== A gene is not yet known to be ''involved in'' a process, but there is still a biologically meaningful annotation I want to capture ==
*Sometimes, despite the fact that the relation between a gene and a BP is not clear, a curator may wish to capture the association.
*This situation may occur when there is otherwise little else known about a gene.
*In these cases, curators may make a BP annotation based on a phenotype using the most general annotation relation, ''acts upstream of or within'', or one of its positive or negative effect child terms.
*If a gene is known to act upstream of a BP, curators may also consider using the ''acts upstream of'' relation or one of its positive or negative effect child terms.
**In these cases, though, it is important to think about how relevant and informative the resulting ''acts upstream of'' annotation would be for GO users.
**Consider making a GO-CAM model in this case to help delineate how closely 'upstream' the gene's activity really is.
 
== Reviewing and removing older phenotype-based annotations ==
*When there is new knowledge, older IMP annotations should be reviewed and removed if they no longer accurately reflect the relationship between a gene's MF and its role in a BP.
*GO annotations should reflect the most relevant knowledge about that gene, while phenotype annotations (not GO annotations) will persist in capturing what has been experimentally observed for perturbations in that gene.
 
==Examples when NOT to annotate a phenotype==
* PMID:26666268 + CHCHD10 shows that the expression of CHCHD10 mutant alleles inhibits apoptosis by preventing cytochrome c release
** The only valid annotation would be 'causally upstream of or within' apoptotic process; even that could be removed when more precise information about the gene function is found.
** DO NOT ANNOTATE GO:0090200 'positive regulation of release of cytochrome c from mitochondria'
 
= Case Studies =
 
= References =
== Pleiotropy ==
*[https://academic.oup.com/bib/article/17/1/13/2240567 The detection and characterization of pleiotropy: discovery, progress, and promise., Tyler AL, Crawford DC, Pendergrass SA., Brief Bioinform. 2016 Jan;17(1):13-22.]
 
= Review Status =
 
Last reviewed: December 18, 2018
 
 
[[Category: Annotation Guidelines]]

Revision as of 10:21, 9 October 2019

Introduction

Mutant phenotypes can provide important insights into gene function. To use them as supporting evidence for a GO annotation, however, curators should keep the following general principles in mind:

  1. GO annotations based on a mutant phenotype should represent the normal role of a gene in biology.
  2. Mutant phenotypes need to be interpreted in the overall context of what is known about a gene so that appropriate relations and Biological Process terms can be selected; some phenotypes represent manifestations of genetic perturbations far downstream from the actual function of a gene.
  3. GO does not aim to capture individual phenotypes. Most groups that contribute annotations to GO also curate phenotypes as a separate data type, so GO curators can be selective about which phenotypes they use for annotation, knowing that more complete and detailed phenotype curation is often captured elsewhere.

Relations Glossary

  • These guidelines for annotating from phenotypes refer to several relations from the Relations Ontology (RO).
  • Brief summaries of how to use the relations cited in these guidelines, along with links to the respective pages in Ontobee, appear below.

Gene/Gene Product to GO Term Relations

  • involved in
    • involved in at Ontobee: http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/RO_0002331
    • Formal definition: c involved_in p if and only if c enables some process p', and p' is part of p.
    • For any given Biological Process, think of the Molecular Functions that make up that process and, using that information, decide whether the gene/gene product being annotated enables one of those functions. If so, the gene/gene product is involved in that process.
  • acts upstream of or within
    • acts upstream of or within at Ontobee: http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/RO_0002264
    • Formal definition: c acts upstream of or within p if c enables f, and f is causally upstream of or within p. c is a material entity and p is a process.
    • The acts upstream of or within relation is used between a gene/gene product and a Biological Process when it is not known, mechanistically, how the Molecular Function of the gene/gene product affects the process.
  • acts upstream of
    • acts upstream of at Ontobee: http://www.ontobee.org/ontology/RO?iri=http://purl.obolibrary.org/obo/RO_0002263
    • Formal definition: c acts upstream of p if and only if c enables some f that is involved in p' and p' occurs chronologically before p, is not part of p, and affects the execution of p. c is a material entity and f, p, p' are processes.
    • The acts upstream of relation is used when the function of the gene/gene product is involved in a process that is causally upstream of another process.
  • Note: for the acts upstream of or within and acts upstream of relations, there are positive and negative child relations that can be used to capture the directionality of the effect, if known.
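
The formal definition of involved in can be read as a small graph query: a gene is involved in a process p if it enables some function p' that is part of p. The following is a minimal sketch of that reading, not an official GO tool. All gene and term identifiers are hypothetical placeholders, and the sketch assumes that part of is followed transitively.

```python
# All identifiers below are hypothetical placeholders, not real genes or GO terms.
ENABLES = {
    "geneA": {"MF:activity_x"},  # geneA enables a known molecular function
    "geneB": set(),              # geneB has no known molecular function
}

# Direct part_of edges: child term -> set of parent terms.
PART_OF = {
    "MF:activity_x": {"BP:pathway_y"},
    "BP:pathway_y": {"BP:larger_process_z"},
}


def part_of_ancestors(term):
    """Return every term reachable from `term` by following part_of edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in PART_OF.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_involved_in(gene, process):
    """True if the gene enables some p' such that p' is part of `process`."""
    return any(
        process in part_of_ancestors(function)
        for function in ENABLES.get(gene, ())
    )
```

In this toy data, geneA would qualify for involved in for both BP:pathway_y and BP:larger_process_z, while geneB, which has no known function, would not qualify for either.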

GO Term Relations

  • part of
    • Formal definition: A core relation that holds between a part and its whole
    • A Molecular Function is part of a Biological Process if execution of that MF is integral to the fulfillment of the BP.
  • regulates
    • Formal definition: Process(P1) regulates process(P2) iff: P1 results in the initiation or termination of P2 OR affects the frequency of its initiation or termination OR affects the magnitude or rate of output of P2.

Guidelines

Ask yourself: what is the normal Biological Process for this gene?

  • Remember that annotations based on mutant phenotypes are inferences about the normal Biological Process (BP) for a gene.
  • To decide if a gene is directly involved in a BP, determine if its Molecular Function (MF) is one of the MFs that make up that BP.
    • To help with this, it may be useful to create an activity-based GO-CAM model in Noctua or consult a pathway or process model from a recent paper or review.
  • If the MF of a gene is not known, curators must be very careful about making a BP annotation based solely on a phenotype.
    • For genes with unknown MFs, in the absence of additional analysis (e.g. genetic and physical interactions), phenotypes can generally only be annotated to a BP using the acts upstream of or within relation (or one of its positive or negative effect child relations). In these cases, it is also perfectly acceptable to not make a GO annotation until more is known about the gene.
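
The questions above can be summarized as a short decision helper. This is only an illustrative sketch of the guideline's logic. The function name and its parameters are hypothetical, and the curator still has to answer each question from the literature.

```python
from typing import Optional


def suggest_relation(
    mf_known: bool,
    mf_is_part_of_bp: bool,
    known_to_act_upstream: bool = False,
    annotate_when_unclear: bool = True,
) -> Optional[str]:
    """Suggest a gene-to-BP relation for a phenotype-based annotation.

    Returns None when the suggestion is to make no GO annotation yet.
    """
    # The MF is one of the functions that make up the BP.
    if mf_known and mf_is_part_of_bp:
        return "involved in"
    # The gene's function is involved in a process causally upstream of the BP.
    if known_to_act_upstream:
        return "acts upstream of"
    # Mechanism unknown: use the most general relation, or do not annotate.
    if annotate_when_unclear:
        return "acts upstream of or within"
    return None
```

For a gene with an unknown MF, `suggest_relation(False, False)` yields the most general relation, and passing `annotate_when_unclear=False` yields `None`, reflecting that not annotating is also acceptable.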

Being ‘required for’ a process does not necessarily mean a gene/gene product is involved in a process

  • In GO annotation, making a statement that a gene is involved in a process is an explicit statement that the MF of that gene is an integral part of the BP.
  • Although authors may state that a gene is 'required for' a process, GO curators must interpret those statements within the defined semantics of a GO annotation and decide if that means the gene's activity is truly a part of the BP.
  • If the MF is not part of the BP, consider using a less specific relation between the gene and the BP, or not annotating to a BP term until more information is known.

Pleiotropic effects need to be considered carefully

  • Pleiotropy, in which a single variation can result in multiple defects, needs to be carefully considered in the context of GO BP annotation.
  • An understanding of the molecular mechanism underlying pleiotropic defects can help determine whether the gene is integrally involved in each process or instead affects a more generally required process.

Beware of read-outs

  • Biologists often use experimental "read-outs" to assess a gene's role in a larger biological process.
  • The affected "read-out", however, is not always indicative of the process in which the gene is most proximally involved.
  • For example, DNA fragmentation is often used as a "read-out" for apoptosis, but mutations that result in increased DNA fragmentation do not necessarily indicate that the corresponding gene is directly involved in DNA catabolism.
  • Again, consider the MF of the mutated gene and how it fits into the BP used as a "read-out" as well as the overall BP being studied.

"High-level" phenotypes

  • Some papers may describe how genomic perturbations affect relatively 'high-level' processes. Examples of 'high-level' processes include:
    • Aging or determination of adult lifespan
    • Behavior
    • Cell death
    • Cell division
    • Cell migration
    • Cell proliferation
    • Development (embryonic, larval)
    • Growth
    • Locomotion
    • Protein localization
    • Reproduction and brood size
  • If the mechanism underlying the phenotypic effects on these processes is not known, we strongly encourage curators not to make annotations to the corresponding BP term.
    • Creating a GO-CAM model or consulting a recent paper or review can help determine the basis for the phenotype and whether the BP annotation is appropriate, and if so, with what relation.

Mutant phenotypes and regulation terms

  • The GO BP ontology contains terms for processes as well as regulation of processes.
  • From mutant phenotypes alone, it can be difficult to assess whether a gene is involved in a process or regulates a process.
  • Mutant phenotypes thus more often provide supporting, rather than definitive, evidence that a gene is involved in regulation of a process.
  • As with annotating to BP terms from phenotypes more generally, consider what is known about the MFs involved in the process and use that information to decide if annotating to the process or regulation of the process is best.
  • If you believe that a given gene product regulates a process, there should be evidence that the MF of the gene product being annotated regulates an MF of a gene that is involved in the process.
  • Mutant phenotypes used to annotate to a ‘regulation’ BP term with no corresponding Molecular Function annotation to the gene/gene product should be examined closely and reviewed.
    • In some cases, for example macromolecular complexes with well-defined roles in a BP, the MFs of individual complex members may not be known. In these cases, it is acceptable to annotate each complex member to a BP regulation term if the overall MF of the complex is part of a regulatory process that affects the BP.
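
Annotations that warrant this kind of review could be shortlisted with a simple filter. The sketch below is an assumption-laden illustration, not a GO Consortium tool: the `Annotation` record and its field names are hypothetical, the example data are invented placeholders, and matching "regulation of" in the term label is a rough heuristic. Members of complexes, as noted above, may be legitimately flagged and then kept after review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    gene: str        # hypothetical gene identifier
    aspect: str      # "F" (Molecular Function), "P" (Biological Process), "C"
    term_label: str  # label of the GO term
    evidence: str    # evidence code, e.g. "IMP"


def flag_regulation_without_mf(annotations):
    """Return IMP-based 'regulation' BP annotations for genes lacking any MF annotation."""
    genes_with_mf = {a.gene for a in annotations if a.aspect == "F"}
    return [
        a
        for a in annotations
        if a.aspect == "P"
        and a.evidence == "IMP"
        and "regulation of" in a.term_label  # label heuristic, an assumption
        and a.gene not in genes_with_mf
    ]


# Invented placeholder data for illustration only.
EXAMPLE = [
    Annotation("geneA", "F", "activity x", "IDA"),
    Annotation("geneA", "P", "positive regulation of process y", "IMP"),
    Annotation("geneB", "P", "negative regulation of process y", "IMP"),
    Annotation("geneB", "P", "process z", "IMP"),
    Annotation("geneC", "P", "regulation of process y", "IDA"),
]
```

With the placeholder data, only geneB's "negative regulation of process y" annotation is flagged: geneA has an MF annotation, geneB's other annotation is not to a regulation term, and geneC's annotation is not based on a mutant phenotype.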

A gene is not yet known to be involved in a process, but there is still a biologically meaningful annotation I want to capture

  • Sometimes, despite the fact that the relation between a gene and a BP is not clear, a curator may wish to capture the association.
  • This situation may occur when little else is known about a gene.
  • In these cases, curators may make a BP annotation based on a phenotype using the most general annotation relation, acts upstream of or within, or one of its positive or negative effect child terms.
  • If a gene is known to act upstream of a BP, curators may also consider using the acts upstream of relation or one of its positive or negative effect child terms.
    • In these cases, though, it is important to think about how relevant and informative the resulting acts upstream of annotation would be for GO users.
    • Consider making a GO-CAM model in this case to help delineate how closely 'upstream' the gene's activity really is.

Reviewing and removing older phenotype-based annotations

  • When there is new knowledge, older IMP (Inferred from Mutant Phenotype) annotations should be reviewed and removed if they no longer accurately reflect the relationship between a gene's MF and its role in a BP.
  • GO annotations should reflect the most relevant knowledge about a gene, while phenotype annotations (not GO annotations) will persist in capturing what has been experimentally observed for perturbations in that gene.

Examples when NOT to annotate a phenotype

  • PMID:26666268 shows that expression of CHCHD10 mutant alleles inhibits apoptosis by preventing cytochrome c release.
    • The only valid annotation would be 'acts upstream of or within' apoptotic process; even that could be removed when more precise information about the gene function is found.
    • Do NOT annotate to GO:0090200 'positive regulation of release of cytochrome c from mitochondria'.

Case Studies

References

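Pleiotropy

  • Tyler AL, Crawford DC, Pendergrass SA. The detection and characterization of pleiotropy: discovery, progress, and promise. Brief Bioinform. 2016 Jan;17(1):13-22. https://academic.oup.com/bib/article/17/1/13/2240567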

Review Status

Last reviewed: December 18, 2018