Tips to Produce High Quality Annotations: Difference between revisions

From GO Wiki
(42 intermediate revisions by 3 users not shown)
Line 1: Line 1:
= Tips to Produce High Quality Annotations =


  See also http://wiki.geneontology.org/index.php/Annotation_conventions
== Get the wider perspective ==
* '''Favor a pathway-by-pathway or gene product-by-gene product approach''' for curation, rather than paper-by-paper.
** GO annotations should be made considering the overall context in which a gene product functions, which may not be apparent from individual papers.
* '''Read recent publications, including recent reviews if available.'''
** Recent reviews can also be helpful for understanding the overall pathway or process you are annotating.
* '''Look at existing annotations''' for the gene product and for the term you have chosen to annotate, to ensure consistency.
** This may trigger the need to revise other annotations.
* '''Remove incorrect annotations based on invalidated hypotheses.'''
** If potentially incorrect annotations have been made by another group, create an issue in the [https://github.com/geneontology/go-annotation/issues go-annotation tracker].


== Get the Wider Perspective ==
== Focus on the research hypothesis ==
* Favor a gene-by-gene or pathway-by-pathway approach for curation rather than paper-by-paper
* '''Use existing knowledge''' to understand the hypothesis being tested and its relation to the experimental observation.
* Read recent publications
* Look at existing annotations for the same protein and for the term you have chosen to annotate, to ensure consistency. This may trigger the need to revise other annotations
* Remove incorrect annotations based on invalidated hypotheses


== Focus on the Research Hypothesis ==
== Capture the conclusion, not the assay ==
Use prior knowledge to understand the hypothesis being tested and its relation to the experimental observation.  
=== Make sure to distinguish between assays and GO terms===
* Some assays can readily seem to correspond to GO terms. Be careful when interpreting the results of these assays as they may not reflect the actual role of the gene product in a process.
* '''Examples''':
** <code>GO:0006309 apoptotic DNA fragmentation</code>
** <code>GO:0006919 activation of cysteine-type endopeptidase activity involved in apoptotic process</code> (AKA caspase activation)
*** These assays are often used to measure whether apoptosis occurred, but NOT to measure a specific, direct role of a gene product in that process.
** <code>GO:0042060 wound healing</code>: it is incorrect to use this term for wound healing assay experiments, which are an assay for <code>GO:0048870 cell motility</code>, NOT for wound healing.


== Capture the Conclusion, not the Assay ==
=== Some assay conditions test general protein properties, not their function===
'''Examples''':
* '''1. Requirement for a post-translational modification for activity''': PMID:16500043, Fig. 3. DNAse II is a glycosylated protein, and the glycosylation is presumably necessary for the correct protein conformation: the protein loses activity when glycosylation is inhibited, for example with tunicamycin.
** Hints that tunicamycin does not regulate the activity of DNAse II:
*** The paper doesn't mention that DNAse II binds tunicamycin (a direct interaction is necessary for a regulation annotation)
*** The paper indirectly shows that, in the presence of tunicamycin, DNAse II is not glycosylated (it has a lower molecular weight), and that this lower molecular weight form is less active.
** The conclusion of the experiment is that DNAse II is glycosylated. Since glycosylation is not known to be regulated (unlike phosphorylation), it is unlikely that these data support any kind of regulation. The effect of tunicamycin is outside the scope of GO. The only annotation that can be made from Fig. 3 is <code>GO:0004531 deoxyribonuclease II activity</code>.
* '''2. Impact of chemicals that affect the cellular environment on an activity''': PMID:19690162, Fig. 3. Trx contains a conserved active site with two cysteines that are essential for its redox activity. The authors investigated whether this site was necessary for interaction with SlrP by treating the cells with hydrogen peroxide. The interaction decreased. In the discussion, the authors suggest that the oxidized form of Trx (i.e. in the presence of H2O2) doesn't have the correct conformation for interaction with SlrP ("The residues necessary for interaction with SlrP could be buried in the dimeric form of Trx.").
** Hints that hydrogen peroxide does not regulate Trx/SlrP binding:
*** Based on experimental data: hydrogen peroxide is not shown to directly interact with either binding partner.
*** Based on biological knowledge: the likely effect of hydrogen peroxide is the oxidation of the active-site cysteines of Trx, which likely affects its conformation.


== Use Caution when Inferring Normal Functions Based on Phenotypes ==
== Annotate gene products to their normal function ==
Phenotypes can help understand the function of proteins, but also provide insights into mechanisms leading to disease
* '''The scope of the GO is to capture the normal function of gene products''', so special care must be taken to understand how experimental observations inform this. 
The scope of the GO, though, is to capture the normal function of proteins, so special care must be taken to understand how a phenotypic observation informs the normal function of a protein.
* Mutant phenotypes are an important source of GO annotations, but curators should consider carefully how to use that information for annotation.
* '''Specific guidelines for annotating mutant phenotypes are available''' [http://wiki.geneontology.org/index.php/Annotating_from_phenotypes here].


Beware of indirect effects of mutations.  
== Check term placement in the ontology ==
- Housekeeping genes, such as RNA polymerase, affect essentially all cellular processes (cell proliferation, development, etc.) but do not *mediate* these processes.
'''Make sure that the parents of the term are also consistent with the annotation.''' For example
'GO:0015616 DNA translocase activity' is a 'GO:0008094 DNA-dependent ATPase activity', although just by looking at the term name one cannot tell that DNA translocase is a type of ATPase.


Phenotypes not supported by a molecular role for the protein.  
== Represent current knowledge ==
- Knockouts/knockdowns may result in pleiotropic effects on cell biology, development, etc. Without understanding the molecular mechanism, be careful not to make annotations to Biological Process terms that are more specific than the experiment supports.
'''The GO is not an archive of all findings published on a protein.''' Do not hesitate to remove older annotations that are inconsistent with the current state of knowledge of a protein's role.


== Check Term Placement in the Ontology ==
== Annotations from large-scale datasets==
 
The term high-throughput data is often used to describe data that has been generated by automatic or semi-automatic methodology without validation of the results for individual gene products. The experiments can be viewed as screens: experiments performed in parallel without explicit target selection; they are generally not hypothesis-driven. More details as to how to annotate high throughput data can be found in the [[Guide_to_GO_Evidence_Codes#High_Throughput_Experimental_Evidence_Codes]] page.
== Represent Current Knowledge ==
The GO is not an archive of all findings published on a protein. Do not hesitate to remove older annotations that are inconsistent with the current state of knowledge of a protein's role.


==Avoiding predatory journals==
Line 35: Line 55:
* https://www.the-scientist.com/news-opinion/german-scientists-frequently-publish-in-predatory-journals-64518
* https://www.the-scientist.com/news-opinion/indian-government-aims-to-take-down-predatory-journals-64731?utm_campaign=TS_DAILY%20NEWSLETTER_2018&utm_source=hs_email&utm_medium=email&utm_content=65569194&_hsenc=p2ANqtz-8vKr7yMcdVq-SddM-VUDuhEkiuw_GUGkhM8JomWp1adoKTdafscdN7dP2Y-PP2zwhFVC3e0zD9SSshnGzZ6T9hyOHOCQ&_hsmi=65569194
* DEF CON 26 - Svea, Suggy, Till - Inside the Fake Science Factory https://www.youtube.com/watch?v=ras_VYgA77Q


== Review Status ==


  From http://geneontology.org/page/annotations-from-article
Last reviewed: April 11, 2019
  http://geneontology.org/page/annotations-gene-prot
  http://geneontology.org/contribute-large-dataset
 
== Annotations from large-scale datasets==
 
If you work on a previously unannotated organism, or your research group has a specific research expertise that could be used to produce GO annotations:
* [Contact the GOC](http://help.geneontology.org/) to discuss the best approach for your annotations and to ensure you are the only group working on your organism.  If you would be interested in taking ownership for an organism with outdated annotations, we can help you find the right people to contact as well.
* Training of new curators will be arranged, if needed, with an existing GOC mentor.
* A representative of your group will need to [join GitHub](/docs/how-to-submit-requests/) in order to maintain your group's annotations.  Once a representative is designated, the GOC will also generate internal files needed to submit your annotations to GO.
 
==  Not enough annotations to justify joining GO?==
* Submit one or just a few manual annotations by adding a new issue on the [GOC GitHub Annotation Tracker](https://github.com/geneontology/go-annotation/issues). Each of your annotations should include at least one key literature reference (PMID) in support of your assertions. Please state whether or not regular updates will be submitted about this annotation.
 
==  Automated Annotations==
If your group is interested in generating a large number of automated/electronic annotations, please be aware that InterPro2GO is the only source of [IEAs, Inferred from Electronic Annotation](http://wiki.geneontology.org/index.php/Inferred_from_Electronic_Annotation_(IEA)) recognized by the GOC.  Submit your transcripts or other data to UniProt, and they will automatically generate IEAs from your data.  Once your organism is in UniProt, [contact the GOC](http://help.geneontology.org/) and we will gladly assist in curator training so your group can add manual annotations as well.
 
== Reviewing GO annotations associated with a scientific article== 
Literature annotation involves capturing published information about the exact function of a gene product as GO annotations. This curation process is time-consuming but produces very high quality, species-specific annotation; the accuracy and uniform format of annotations allow the information to be used in high-throughput experiments. GO curation may be best carried out by people who know the function of the gene product and the associated biology in great detail, for example experimental scientists who are familiar with the published literature. If you are an expert in a gene product or a particular field, then you may like to [suggest modifications to the ontology structure](/docs/contributing-to-go-terms/) as well.
 
Below is a schematic diagram giving an introduction to the steps involved in literature-based GO annotation.
http://geneontology.org/sites/default/files/public/diag-literature-annot.png
 
To begin, check if there are existing annotations to the paper: open a Gene Ontology browser (e.g. [AmiGO](http://amigo.geneontology.org/amigo), [QuickGO](https://www.ebi.ac.uk/QuickGO/)) and enter a PubMed identifier (PMID) for the paper of interest in the 'Search' field.
 
=== If GO annotations are listed in the results:===
1. Check whether the paper has been annotated by GO curators.
2. Click on the PMID and browse annotations associated with the paper.
  * If you agree that the annotations accurately represent the data, you are done!
  * If you think the annotations could be improved: Write a new issue on the 'GOC GitHub Annotation Tracker', indicating that these annotations should be reviewed. Include:
  - [ ]  a PMID
  - [ ] the name of the species investigated in the experiment that led to this publication
  - [ ] *Please state whether or not regular updates will be submitted about this annotation*.
   
====  If no results are listed using this PMID:====
This means the paper has not been annotated by GO curators.
* Write a new issue on the 'GOC GitHub Annotation Tracker', indicating that this is a new annotation. Include:
** a PMID
**  the name of the species investigated in the experiment that led to this publication
**  '''Please state whether or not regular updates will be submitted about this annotation'''.
 
===  Reviewing GO annotations for a gene or protein:===
 
To start, check if there are existing annotations to the gene or protein of interest: open a Gene Ontology browser (e.g. AmiGO, QuickGO) and search for the gene or protein record of interest by entering it in the 'Search' field, then browse associated annotations and follow links to see the full list of annotations.
 




Back to: [[Annotation]]


[[Category: Annotation Working Group]]
[[Category: Annotation Guidelines]]

Revision as of 19:33, 6 March 2020

Get the wider perspective

  • Favor a pathway-by-pathway or gene product-by-gene product approach for curation, rather than paper-by-paper.
    • GO annotations should be made considering the overall context in which a gene product functions, which may not be apparent from individual papers.
  • Read recent publications, including recent reviews if available.
    • Recent reviews can also be helpful for understanding the overall pathway or process you are annotating.
  • Look at existing annotations for the gene product and for the term you have chosen to annotate, to ensure consistency.
    • This may trigger the need to revise other annotations.
  • Remove incorrect annotations based on invalidated hypotheses.
    • If potentially incorrect annotations have been made by another group, create an issue in the go-annotation tracker.

Focus on the research hypothesis

  • Use existing knowledge to understand the hypothesis being tested and its relation to the experimental observation.

Capture the conclusion, not the assay

Make sure to distinguish between assays and GO terms

  • Some assays can readily seem to correspond to GO terms. Be careful when interpreting the results of these assays as they may not reflect the actual role of the gene product in a process.
  • Examples:
    • GO:0006309 apoptotic DNA fragmentation
    • GO:0006919 activation of cysteine-type endopeptidase activity involved in apoptotic process (AKA caspase activation)
      • These assays are often used to measure whether apoptosis occurred, but NOT to measure a specific, direct role of a gene product in that process.
    • GO:0042060 wound healing: it is incorrect to use this term for wound healing assay experiments, which are an assay for GO:0048870 cell motility, NOT for wound healing.
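
The caution above can be turned into a simple check on draft annotations. The following is a minimal sketch, not a GOC tool: the names `ASSAY_LIKE_TERMS` and `flag_assay_like_terms` are hypothetical, the gene product names are made up, and the table holds only the three terms listed above.

```python
# Hypothetical helper: flag annotations whose GO term is easily confused
# with the readout of an assay. The table contains only the examples
# given in this guideline; it is not an official GOC list.
ASSAY_LIKE_TERMS = {
    "GO:0006309": "apoptotic DNA fragmentation",
    "GO:0006919": "activation of cysteine-type endopeptidase activity "
                  "involved in apoptotic process",
    "GO:0042060": "wound healing",
}


def flag_assay_like_terms(annotations):
    """Return (gene_product, go_id, term_name) for annotations to review.

    `annotations` is an iterable of (gene_product, go_id) pairs.
    """
    flagged = []
    for gene_product, go_id in annotations:
        if go_id in ASSAY_LIKE_TERMS:
            flagged.append((gene_product, go_id, ASSAY_LIKE_TERMS[go_id]))
    return flagged


if __name__ == "__main__":
    # Made-up gene product names, for illustration only.
    draft = [("geneA", "GO:0042060"), ("geneB", "GO:0004531")]
    for gene_product, go_id, name in flag_assay_like_terms(draft):
        print(f"Review {gene_product}: {go_id} ({name}) may describe the assay")
```

A flagged annotation is not necessarily wrong; the check only prompts the curator to confirm that the paper shows a direct role of the gene product in the process.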

Some assay conditions test general protein properties, not their function

Examples:

  • 1. Requirement for a post-translational modification for activity: PMID:16500043, Fig. 3. DNAse II is a glycosylated protein, and the glycosylation is presumably necessary for the correct protein conformation: the protein loses activity when glycosylation is inhibited, for example with tunicamycin.
    • Hints that tunicamycin does not regulate the activity of DNAse II:
      • The paper doesn't mention that DNAse II binds tunicamycin (a direct interaction is necessary for a regulation annotation)
      • The paper indirectly shows that, in the presence of tunicamycin, DNAse II is not glycosylated (it has a lower molecular weight), and that this lower molecular weight form is less active.
    • The conclusion of the experiment is that DNAse II is glycosylated. Since glycosylation is not known to be regulated (unlike phosphorylation), it is unlikely that these data support any kind of regulation. The effect of tunicamycin is outside the scope of GO. The only annotation that can be made from Fig. 3 is GO:0004531 deoxyribonuclease II activity.
  • 2. Impact of chemicals that affect the cellular environment on an activity: PMID:19690162, Fig. 3. Trx contains a conserved active site with two cysteines that are essential for its redox activity. The authors investigated whether this site was necessary for interaction with SlrP by treating the cells with hydrogen peroxide. The interaction decreased. In the discussion, the authors suggest that the oxidized form of Trx (i.e. in the presence of H2O2) doesn't have the correct conformation for interaction with SlrP ("The residues necessary for interaction with SlrP could be buried in the dimeric form of Trx.").
    • Hints that hydrogen peroxide does not regulate Trx/SlrP binding:
      • Based on experimental data: hydrogen peroxide is not shown to directly interact with either binding partner.
      • Based on biological knowledge: the likely effect of hydrogen peroxide is the oxidation of the active-site cysteines of Trx, which likely affects its conformation.

Annotate gene products to their normal function

  • The scope of the GO is to capture the normal function of gene products, so special care must be taken to understand how experimental observations inform this.
  • Mutant phenotypes are an important source of GO annotations, but curators should consider carefully how to use that information for annotation.
  • Specific guidelines for annotating mutant phenotypes are available here.

Check term placement in the ontology

Make sure that the parents of the term are also consistent with the annotation. For example 'GO:0015616 DNA translocase activity' is a 'GO:0008094 DNA-dependent ATPase activity', although just by looking at the term name one cannot tell that DNA translocase is a type of ATPase.
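
Checking parents amounts to walking the is_a links upward from the chosen term. The sketch below assumes a toy in-memory graph that holds only the single relationship stated above; the names `IS_A`, `NAMES` and `ancestors` are hypothetical, and a real check would load the full ontology.

```python
# Toy is_a graph: child term -> list of direct parents. Only the single
# relationship mentioned in this guideline is included.
IS_A = {
    "GO:0015616": ["GO:0008094"],
}

NAMES = {
    "GO:0015616": "DNA translocase activity",
    "GO:0008094": "DNA-dependent ATPase activity",
}


def ancestors(term, is_a):
    """Return the set of all ancestors of `term` reachable via is_a links."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


if __name__ == "__main__":
    # List every ancestor so the curator can confirm each one also applies.
    for go_id in sorted(ancestors("GO:0015616", IS_A)):
        print(f"{go_id} {NAMES.get(go_id, '(name not in toy table)')}")
```

Every term returned is implied by the annotation, so each one should also be true of the gene product.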

Represent current knowledge

The GO is not an archive of all findings published on a protein. Do not hesitate to remove older annotations that are inconsistent with the current state of knowledge of a protein's role.

Annotations from large-scale datasets

The term high-throughput data is often used to describe data that has been generated by automatic or semi-automatic methodology without validation of the results for individual gene products. The experiments can be viewed as screens: experiments performed in parallel without explicit target selection; they are generally not hypothesis-driven. More details as to how to annotate high throughput data can be found in the Guide_to_GO_Evidence_Codes#High_Throughput_Experimental_Evidence_Codes page.
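
High-throughput annotations are identified by their evidence code (HTP, HDA, HMP, HGI or HEP). As an illustration only, the sketch below separates them from other annotations; the function name `split_by_throughput`, the record layout and the gene product names are assumptions made for this example.

```python
# High-throughput experimental evidence codes used in GO annotations.
HTP_EVIDENCE_CODES = frozenset({"HTP", "HDA", "HMP", "HGI", "HEP"})


def split_by_throughput(annotations):
    """Split annotation records into (high_throughput, other) lists.

    Each record is a (gene_product, go_id, evidence_code) tuple.
    Input order is preserved within each list.
    """
    high_throughput, other = [], []
    for record in annotations:
        evidence_code = record[2]
        if evidence_code in HTP_EVIDENCE_CODES:
            high_throughput.append(record)
        else:
            other.append(record)
    return high_throughput, other


if __name__ == "__main__":
    # Made-up records, for illustration only.
    records = [
        ("geneA", "GO:0005634", "HDA"),
        ("geneB", "GO:0004531", "IDA"),
    ]
    htp, rest = split_by_throughput(records)
    print(len(htp), "high-throughput;", len(rest), "other")
```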

Avoiding predatory journals

Make sure you select high quality papers. Some journals have been labeled 'predatory journals' for their dubious practices with respect to the publication process. A list of these journals is available at https://predatoryjournals.com/journals/. If in doubt, this list may help you decide whether or not to annotate the paper.

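More information about predatory journals may be found in these articles:

  • https://www.the-scientist.com/news-opinion/german-scientists-frequently-publish-in-predatory-journals-64518
  • https://www.the-scientist.com/news-opinion/indian-government-aims-to-take-down-predatory-journals-64731?utm_campaign=TS_DAILY%20NEWSLETTER_2018&utm_source=hs_email&utm_medium=email&utm_content=65569194&_hsenc=p2ANqtz-8vKr7yMcdVq-SddM-VUDuhEkiuw_GUGkhM8JomWp1adoKTdafscdN7dP2Y-PP2zwhFVC3e0zD9SSshnGzZ6T9hyOHOCQ&_hsmi=65569194
  • DEF CON 26 - Svea, Suggy, Till - Inside the Fake Science Factory https://www.youtube.com/watch?v=ras_VYgA77Q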

Review Status

Last reviewed: April 11, 2019


Back to: Annotation