Submit GO annotations

From GO Wiki

UNDER REVIEW

Submitting GO annotations to the GO Consortium

GO annotations are evidence-based statements that link a Gene Ontology term to a particular gene product. Although most annotations are submitted and maintained by members of the GO Consortium, we also accept bulk annotations from non-GOC members. For an overview of what GO annotations are and which components an annotation contains, see the introduction to GO annotations.

GO annotations are currently disseminated as 17-column, tab-delimited GAF files with strict formatting requirements. However, GO anticipates a move to the GPAD/GPI format soon, so we recommend that contributing groups who prepare a GAF also prepare a GPAD, or be ready to do so.

  • GAFs are recognizable by the *.gaf suffix and by an internal header line denoting the format and version (!gaf-version: 2.1). The following information is intended to help new users create a GAF file. Although you may choose to construct a GAF file using this documentation alone, we strongly recommend contacting GO for step-by-step assistance, because skipping certain steps will make the file completely unusable.
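As a rough illustration of the two formatting rules just mentioned (the version header and the fixed column count), the following Python sketch checks GAF text before submission. It assumes that the version line is the first line of the file, that other lines beginning with ! are comments, and that every data row has exactly 17 tab-separated columns. check_gaf is a hypothetical helper, not a GO tool, and it does not replace GO's own validation.

```python
GAF_COLUMNS = 17  # GAF 2.x data rows have 17 tab-separated columns


def check_gaf(text: str, expected_version: str = "2.1") -> list[str]:
    """Return a list of formatting problems found in GAF text (empty if none)."""
    problems = []
    lines = text.splitlines()
    header = f"!gaf-version: {expected_version}"
    # Assumption: the version header is the very first line of the file.
    if not lines or lines[0].strip() != header:
        problems.append(f"line 1: expected header '{header}'")
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and '!' header/comment lines
        # Do not strip data lines: trailing tabs encode empty optional columns.
        fields = line.split("\t")
        if len(fields) != GAF_COLUMNS:
            problems.append(
                f"line {number}: expected {GAF_COLUMNS} columns, found {len(fields)}"
            )
    return problems
```

Data lines are deliberately not stripped: a row whose last optional columns are empty ends in tabs, and stripping them would make a valid row look short.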


Minimum information needed to make a GAF file:

  • Stable IDs for the gene products or other objects being annotated. If the gene product IDs are not in UniProt or NCBI, please submit them to one of these databases before proceeding.
  • The GO ID(s) to associate with each gene product
  • The evidence code supporting each association
  • A reference (a published paper, or a reference describing the methodology used to make the association between the gene product and the GO term)
  • The NCBI taxon ID of the organism whose gene products are being annotated
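The sketch below shows, under our reading of the GAF 2.1 column layout, where the minimum information above lands in a 17-column row. The class and function names are hypothetical. The extra fields (symbol, aspect, object type, date, assigned_by) are other GAF columns that a complete row also needs; optional columns such as Qualifier and With/From are simply left empty here, which may not be acceptable for every evidence code.

```python
from dataclasses import dataclass
from datetime import date
import re


@dataclass
class MinimalAnnotation:
    db: str             # column 1: source database of the ID, e.g. "UniProtKB"
    object_id: str      # column 2: stable gene product ID
    symbol: str         # column 3: gene product symbol
    go_id: str          # column 5: GO ID, e.g. "GO:0005634"
    reference: str      # column 6: e.g. "PMID:<digits>"; several joined by "|"
    evidence_code: str  # column 7: e.g. "IDA"
    aspect: str         # column 9: "P", "F" or "C"
    object_type: str    # column 12: e.g. "protein"
    taxon_id: int       # column 13: NCBI taxon ID, written as "taxon:<id>"
    assigned_by: str    # column 15: group that made the annotation


def to_gaf_row(a: MinimalAnnotation, annotated_on: date) -> str:
    """Place the fields into a 17-column, tab-delimited GAF row (1-based columns above)."""
    if not re.fullmatch(r"GO:[0-9]{7}", a.go_id):
        raise ValueError(f"malformed GO ID: {a.go_id!r}")
    if a.aspect not in {"P", "F", "C"}:
        raise ValueError(f"aspect must be P, F or C, not {a.aspect!r}")
    columns = [""] * 17  # unused optional columns stay empty
    columns[0] = a.db
    columns[1] = a.object_id
    columns[2] = a.symbol
    columns[4] = a.go_id
    columns[5] = a.reference
    columns[6] = a.evidence_code
    columns[8] = a.aspect
    columns[11] = a.object_type
    columns[12] = f"taxon:{a.taxon_id}"
    columns[13] = annotated_on.strftime("%Y%m%d")  # column 14: date as YYYYMMDD
    columns[14] = a.assigned_by
    return "\t".join(columns)
```

Keeping the column positions in one function makes it easy to compare against the GAF specification when a validator reports a misplaced field.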

General information required by the GOC

If you represent a group offering GO annotations, consider whether you are willing and able to serve as the species owner, i.e., whether you have the resources and support to maintain the annotations for the species. This is strongly encouraged, because annotations must be maintained: they are updated or removed as the ontology and scientific knowledge about the gene product evolve, so that the information stays relevant. For example, MGI is the species owner for mouse. If your group is willing and able to serve as the species owner, you will need to provide us with some information about your group along with the GO annotations:

  • We will need to add your group to the groups.yaml file
  • If your annotations reference a database that is not already in GO, we will need to create an entry for that database/group in the db-xrefs.yaml file. To complete these files, we'll need a contact email, the URL of the project, the funding source (grant numbers, if applicable), and the URL of the GAF file (especially if the submission will be recurring/ongoing).
  • We'll also need a gp2protein file, another tab-delimited file that provides a mapping between database object IDs and protein sequence IDs
  • Data to complete a stanza for the GO.xrf_abbs file (http://current.geneontology.org/metadata/GO.xrf_abbs). GOC will be happy to add the stanza if you supply the details.
Example xrefs stanza:
 abbreviation: SGD
 database: Saccharomyces Genome Database
 object: Identifier for SGD Loci
 synonym: SGDID
 example_id: SGD:S000006169
 local_id_syntax: ^S[0-9]{9}$
 generic_url: http://www.yeastgenome.org/
 url_syntax: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
 url_example: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169
  • If your contributing group is the sole species owner, GO will need to add a taxon filter to the filtering script (meaning that you will provide all the annotations for that taxon ID), and you should be able to absorb and integrate all external annotations into your GAF. Discuss this within your group before committing.
  • If your group is unable to be the species owner, you should submit your annotations to the GOA group. You can specify what the assigned_by column should say.
  • Also, if your group is submitting IEAs, consider how these IEAs compare with the IEAs made by GOA for the same proteins: how much value do they add to the annotation set for that taxon ID?
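The local_id_syntax and url_syntax fields in the example SGD xrefs stanza are what allow a prefixed ID such as SGD:S000006169 to be checked and turned into a link. The following Python sketch illustrates that idea using the stanza's own values; the "key: value" parsing and the helper names parse_stanza and id_to_url are assumptions for illustration, not the code GO or AmiGO actually runs.

```python
import re

# The example SGD stanza, as "key: value" lines.
STANZA = """\
abbreviation: SGD
database: Saccharomyces Genome Database
object: Identifier for SGD Loci
synonym: SGDID
example_id: SGD:S000006169
local_id_syntax: ^S[0-9]{9}$
generic_url: http://www.yeastgenome.org/
url_syntax: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
url_example: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169
"""


def parse_stanza(text: str) -> dict[str, str]:
    """Split each line on the first ': ' (URLs contain ':' but not ': ')."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(": ")
        fields[key] = value
    return fields


def id_to_url(curie: str, stanza: dict[str, str]) -> str:
    """Validate a prefixed ID against the stanza and expand url_syntax."""
    prefix, _, local_id = curie.partition(":")
    if prefix != stanza["abbreviation"]:
        raise ValueError(f"{curie!r} does not use the {stanza['abbreviation']} prefix")
    if not re.fullmatch(stanza["local_id_syntax"], local_id):
        raise ValueError(f"{local_id!r} does not match {stanza['local_id_syntax']}")
    # As in url_example, [example_id] is replaced by the local part of the ID.
    return stanza["url_syntax"].replace("[example_id]", local_id)
```

A quick self-check for a new stanza is that id_to_url applied to its example_id reproduces its url_example exactly.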

Steps to create a GAF

Steps to create a GPAD

Resources

Evidence Codes:

File Specifications

References:

  • Refer to published literature by PMID whenever possible; look up PMIDs in PubMed or Europe PMC, depending on your institution's location.
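Each GAF row carries its references in a single column, where (on our reading of the GAF format) several references are separated by a pipe (|). A quick check like the one below can show which rows cite a PMID and which rely on other reference types. The function name is hypothetical, and the PMID:<digits> pattern is an assumption about how PMIDs are written in that column.

```python
import re

PMID = re.compile(r"PMID:[0-9]+")  # assumed form of a PMID reference


def split_references(field: str) -> tuple[list[str], list[str]]:
    """Split a GAF reference column into (PMID references, other references)."""
    refs = [ref.strip() for ref in field.split("|") if ref.strip()]
    pmids = [ref for ref in refs if PMID.fullmatch(ref)]
    others = [ref for ref in refs if not PMID.fullmatch(ref)]
    return pmids, others
```

Rows whose reference column yields no PMID may be worth a second look, since PMIDs are preferred whenever one exists.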

Internal notes (for GOC to set up new GAF groups)

  • db-xrefs.yaml: needs an entry for anything that links out of AmiGO. This file is also used (indirectly) for some QC/QA.
    • Add the stanza to the db-xrefs.yaml file and the webpage automatically picks it up for display on GO.xrf_abbs.
  • groups.yaml should also contain up-to-date information about all existing assigning groups. This file is used by Noctua and some QC/QA tools.
  • users.yaml is needed only for Noctua access.