Submit GO annotations

From GO Wiki
Revision as of 02:15, 11 April 2019 by Pascale (talk | contribs)

UNDER REVIEW

Submitting GO annotations to the GO Consortium

GO annotations are evidence-based statements that link a Gene Ontology term to a particular gene product. Although most annotations are submitted and maintained by members of the GO Consortium, we also accept bulk annotations from non-GOC members. For an overview of what GO annotations are and of the components of an annotation, see the introduction to GO annotations.

GO annotations are currently disseminated as a 17-column, tab-delimited GAF file with strict formatting requirements. However, GO anticipates a move to the GPAD/GPI format soon and recommends that contributing groups preparing a GAF also prepare, or be ready to prepare, a GPAD.

  • GAFs are recognizable by the *.gaf suffix and by an internal header line denoting the format and version: !gaf-version: 2.1. The following information is intended to help new users create a GAF file. Although you may choose to construct a GAF file using this documentation alone, we strongly recommend that you contact GO for step-by-step assistance, because omitting some steps will result in a completely unusable file.
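As a rough illustration of the two structural requirements mentioned above (the version header and the 17 tab-delimited columns), the following Python sketch checks GAF-formatted text. The function name `check_gaf_text` is hypothetical; this is not a GOC tool and it does not replace the official checks or the step-by-step assistance recommended above.

```python
GAF_VERSION_HEADER = "!gaf-version: 2.1"
GAF_COLUMN_COUNT = 17


def check_gaf_text(text):
    """Return a list of human-readable problems found in GAF-formatted text."""
    problems = []
    lines = text.splitlines()
    # Assumption: the format/version declaration is the first line of the file.
    if not lines or lines[0].strip() != GAF_VERSION_HEADER:
        problems.append("missing header line: " + GAF_VERSION_HEADER)
    for number, line in enumerate(lines, start=1):
        # Lines starting with "!" are header/comment lines; blank lines are skipped.
        if not line or line.startswith("!"):
            continue
        columns = line.split("\t")
        if len(columns) != GAF_COLUMN_COUNT:
            problems.append(
                f"line {number}: expected {GAF_COLUMN_COUNT} columns, "
                f"found {len(columns)}"
            )
    return problems
```

An empty list means only that these two checks passed; it says nothing about whether the IDs, evidence codes, or references in each column are valid.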


Minimum information needed to make a GAF file:

  • Stable IDs for the gene products or objects that are being annotated. If the gene product IDs are not in UniProt or NCBI, please submit the IDs to one of these databases before proceeding.
  • GO ID that each gene product can be associated with
  • Evidence Code that allows you to make the association
  • Reference (published paper or reference describing the methodology used to make the gene product--GO term association)
  • NCBI taxon_id for the gene products for which the associations are made
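To show how the items listed above map onto a 17-column GAF line, here is a minimal Python sketch. The column labels in `GAF_COLUMNS` are names chosen for this example, ordered according to our reading of the GAF 2.1 layout; `make_gaf_row` is a hypothetical helper, and every value in the sample row is a placeholder rather than a real annotation. Please check the file specification before relying on it.

```python
# Column labels are illustrative names; the order follows the GAF 2.1 layout.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]


def make_gaf_row(fields):
    """Render one annotation (a dict keyed by column label) as a GAF line."""
    unknown = set(fields) - set(GAF_COLUMNS)
    if unknown:
        raise ValueError(f"unknown GAF columns: {sorted(unknown)}")
    # Columns that are not supplied stay empty, but their tab is still emitted.
    return "\t".join(str(fields.get(name, "")) for name in GAF_COLUMNS)


# Placeholder values only; this is not a real annotation.
row = make_gaf_row({
    "db": "ExampleDB",               # hypothetical database abbreviation
    "db_object_id": "EX0000001",     # stable ID of the gene product
    "db_object_symbol": "exg1",      # hypothetical symbol
    "go_id": "GO:0003674",           # GO ID
    "db_reference": "PMID:0000000",  # reference (placeholder PMID)
    "evidence_code": "IDA",          # evidence code
    "aspect": "F",                   # F, P, or C
    "db_object_type": "protein",
    "taxon": "taxon:9606",           # NCBI taxon ID with the "taxon:" prefix
    "date": "20190411",              # YYYYMMDD
    "assigned_by": "ExampleDB",
})
```

Passing the fields as a dictionary keeps optional columns out of the way while guaranteeing that every line has the same number of tabs.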

General information required by the GOC

If you represent a group offering GO annotations, please consider whether you are willing to serve as the species owner, that is, whether you have the resources and support to maintain the annotations for the species. This is strongly encouraged, because annotations must be maintained: they are updated or removed as the ontology and scientific knowledge about the gene product evolve, so that the information stays relevant. For example, MGI is the species owner for mouse. If your group is willing and able to serve as the species owner, you will need to provide us with some information about your group along with the GO annotations:

  • We will need to add your group to the groups.yaml file
  • If you are not using references to a database already in GO, we will need to create an entry in the db-xrefs.yaml for the database/group. To complete these files, we'll need a contact email, the URL of the project, the funding source (grant numbers, if applicable), and the URL of the GAF file (especially if the submission will be recurring/ongoing).
  • We'll also need a gp2protein file, another tab-delimited file that provides a mapping between database object IDs and protein sequence IDs
  • Data to complete a stanza for the GO.xrf_abbs file (http://current.geneontology.org/metadata/GO.xrf_abbs). GOC will be happy to add this if you supply us with the details.
Example xrefs stanza:
 abbreviation: SGD
 database: Saccharomyces Genome Database
 object: Identifier for SGD Loci
 synonym: SGDID
 example_id: SGD:S000006169
 local_id_syntax: ^S[0-9]{9}$
 generic_url: http://www.yeastgenome.org/
 url_syntax: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
 url_example: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169
  • If your contributing group is the sole species owner, then GO will need to add a taxon filter to the filtering script (implying that you will provide all the annotations for that taxon ID), and you should be able to absorb/integrate all external annotations into your GAF. This aspect should be brought up within the group.
  • If your group is unable to be the owner of the species, then you should submit your annotations to the GOA group. You can designate what the assigned_by column should say.
  • Also, if your group is submitting IEAs, it is important to consider how these IEAs compare with the IEAs made by GOA for the same proteins (how much value are they adding to the annotation set for that taxon ID?)
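Before sending the details for an xrefs stanza, it can help to confirm that its fields agree with each other. The Python sketch below does this for the SGD example stanza shown above. It assumes, based only on that example, that the `[example_id]` placeholder in `url_syntax` stands for the local part of the ID (the text after the `SGD:` prefix); `check_stanza` is a hypothetical helper, not part of any GOC tooling.

```python
import re

# Fields copied from the example SGD stanza.
stanza = {
    "abbreviation": "SGD",
    "example_id": "SGD:S000006169",
    "local_id_syntax": r"^S[0-9]{9}$",
    "url_syntax": "http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]",
    "url_example": "http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169",
}


def check_stanza(stanza):
    """Return a list of inconsistencies between the fields of one stanza."""
    problems = []
    prefix, _, local_id = stanza["example_id"].partition(":")
    if prefix != stanza["abbreviation"]:
        problems.append("example_id prefix does not match abbreviation")
    if not re.match(stanza["local_id_syntax"], local_id):
        problems.append("example_id does not match local_id_syntax")
    # Assumption: the placeholder is filled with the local ID, without prefix.
    expanded = stanza["url_syntax"].replace("[example_id]", local_id)
    if expanded != stanza["url_example"]:
        problems.append("url_example does not match expanded url_syntax")
    return problems
```

For the SGD stanza, `check_stanza(stanza)` returns an empty list, since the ID, the pattern, and the two URLs are mutually consistent.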

Steps to create a GAF

Steps to create GPAD

Resources

Evidence Codes:

File Specifications

References:

  • Published literature should be referred to by PMID if at all possible; see PubMed or Europe PMC, depending on your institution's location.
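A small Python sketch of how a PMID reference might be checked before it goes into an annotation. It assumes the prefixed form `PMID:` followed by digits, as used for literature references in GO annotation files; `is_pmid_reference` is a hypothetical helper, and it checks only the shape of the string, not whether the article exists.

```python
import re

# Assumed shape: the literal prefix "PMID:" followed by one or more digits.
PMID_PATTERN = re.compile(r"^PMID:[0-9]+$")


def is_pmid_reference(reference):
    """Return True if the string looks like a prefixed PubMed identifier."""
    return bool(PMID_PATTERN.match(reference))
```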

Internal notes (for GOC to set up new GAF groups)

  • db-xrefs.yaml needs an entry for anything that links out of AmiGO. This file is also used (indirectly) for some QC/QA.
    • Add the stanza to the db-xrefs.yaml file and the webpage automatically picks it up for display on GO.xrf_abbs.
  • groups.yaml should also contain up-to-date information about what assigning groups exist in the world. This file is used by Noctua and some QC/QA tools.
  • users.yaml is only needed to access Noctua


  The next part is moved from [Tips_to_Produce_High_Quality_Annotations] - we need to see whether any of it is useful. 


Annotating new organisms

If you work on a previously unannotated organism, or your research group has a specific research expertise that could be used to produce GO annotations:

  • [Contact the GOC](http://help.geneontology.org/) to discuss the best approach for your annotations and to ensure you are the only group working on your organism. If you would be interested in taking ownership for an organism with outdated annotations, we can help you find the right people to contact as well.
  • Training of new curators will be arranged, if needed, with an existing GOC mentor.
  • A representative of your group will need to [join GitHub](/docs/how-to-submit-requests/) in order to maintain your group's annotations. Once a representative is designated, the GOC will also generate internal files needed to submit your annotations to GO.

Not enough annotations to justify joining GO?

  • Submit one or just a few manual annotations by adding a new issue on the [GOC GitHub Annotation Tracker](https://github.com/geneontology/go-annotation/issues). Each of your annotations should include at least one key literature reference (PMID) in support of your assertions. Please state whether or not regular updates will be submitted about this annotation.

Automated annotations

If your group is interested in generating a large number of automated/electronic annotations, please be aware that InterPro2GO is the only source of [IEAs, Inferred from Electronic Annotation](http://wiki.geneontology.org/index.php/Inferred_from_Electronic_Annotation_(IEA)) recognized by the GOC. Submit your transcripts or other data to UniProt, and they will automatically generate IEAs from your data. Once your organism is in UniProt, [contact the GOC](http://help.geneontology.org/) and we will gladly assist in curator training so your group can add manual annotations as well.

Reviewing GO annotations associated with a scientific article

Literature annotation involves capturing published information about the exact function of a gene product as GO annotations. This curation process is time-consuming but produces very high-quality, species-specific annotation; the accuracy and uniform format of the annotations allow the information to be used in high-throughput experiments. GO curation may be best carried out by people who know the function of the gene product and the associated biology in great detail, for example, experimental scientists who are familiar with the published literature. If you are an expert in a gene product or a particular field, then you may like to [suggest modifications to the ontology structure](/docs/contributing-to-go-terms/) as well.

Below is a schematic diagram giving an introduction to the steps involved in literature-based GO annotation. http://geneontology.org/sites/default/files/public/diag-literature-annot.png

To begin, check whether there are existing annotations to the paper: open a Gene Ontology browser (e.g. [AmiGO](http://amigo.geneontology.org/amigo), [QuickGO](https://www.ebi.ac.uk/QuickGO/)) and enter a PubMed identifier (PMID) for the paper of interest in the 'Search' field.

If GO annotations are listed in the results:

  1. Check whether the paper has been annotated by GO curators.
  2. Click on the PMID and browse annotations associated with the paper.
    • If you agree that the annotations accurately represent the data, you are done!
    • If you think the annotations could be improved: Write a new issue on the 'GOC GitHub Annotation Tracker', indicating that these annotations should be reviewed. Include:
      • a PMID
      • the name of the species investigated in the experiment that led to this publication
      • whether or not regular updates will be submitted about this annotation

If no results are listed using this PMID:

This means the paper has not been annotated by GO curators.

  • Write a new issue on the 'GOC GitHub Annotation Tracker', indicating that this is a new annotation. Include:
    • a PMID
    • the name of the species investigated in the experiment that led to this publication
    • whether or not regular updates will be submitted about this annotation

Reviewing GO annotations for a gene or protein:

To start, check whether there are existing annotations to the gene or protein of interest: open a Gene Ontology browser (e.g. AmiGO, QuickGO) and search for the gene or protein record of interest by entering it in the 'Search' field, then browse the associated annotations and follow the links to see the full list of annotations: