Submit GO annotations: Difference between revisions

From GO Wiki
mNo edit summary
 
(16 intermediate revisions by 2 users not shown)
Line 4: Line 4:
==Submitting GO annotations to the GO Consortium==
==Submitting GO annotations to the GO Consortium==


GO annotations are evidence-based statements that link a Gene Ontology term to a particular gene product.  Although most annotations are submitted and maintained by members of the GO Consortium, we accept annotations from non-GOC submitters as well. For an overview on what GO annotations are and some of the components of an annotation, see the [http://geneontology.org/docs/go-annotations/ introduction to GO annotations].
GO annotations are evidence-based statements that link a Gene Ontology term to a particular gene product.  Although most annotations are submitted and maintained by members of the GO Consortium, we accept bulk annotations from non-GOC members as well. For an overview on what GO annotations are and some of the components of an annotation, see the [http://geneontology.org/docs/go-annotations/ introduction to GO annotations].


GO annotations are disseminated in a 17 column tab-delimited [http://geneontology.org/docs/go-annotation-file-gaf-format-21/ GAF format] file with strict formatting requirements.  GAFs are recognizable by the <code>*.gaf</code> suffix as well as an internal header line denoting the format and version: <code> !gaf-version: 2.1 </code>. The following information is intended to help new users create a GAF file; although you may choose to construct a GAF file solely using this documentation, it is highly recommended you [http://help.geneontology.org/ contact GO] for step-by-step assistance as failure to perform some steps will result in a completely unusable product.
GO annotations are currently disseminated in a 17 column tab-delimited [http://geneontology.org/docs/go-annotation-file-gaf-format-21/ GAF format] file with strict formatting requirements; however, GO anticipates a move to the [http://geneontology.org/docs/gene-product-association-data-gpad-format/ GPAD]/[http://geneontology.org/docs/gene-product-information-gpi-format/ GPI] format soon and recommends while contributing groups can prepare a GAF, they also prepare or be ready to prepare a GPAD.   
*GAFs are recognizable by the <code>*.gaf</code> suffix as well as an internal header line denoting the format and version: <code> !gaf-version: 2.1 </code>. The following information is intended to help new users create a GAF file; although you may choose to construct a GAF file solely using this documentation, it is highly recommended you [http://help.geneontology.org/ contact GO] for step-by-step assistance as failure to perform some steps will result in a completely unusable product.
<br>
<br>
Minimum information needed to make a GAF file:<br>
===Minimum information needed to make a GAF file:===
* Stable IDs for the gene products or objects that are being annotated.  
* Stable IDs for the gene products or objects that are being annotated.  
:'''If the gene product IDs are not in UniProt or NCBI, ''please submit the IDs to one of these databases before proceeding'''''
:'''If the gene product IDs are not in UniProt or NCBI, ''please submit the IDs to one of these databases before proceeding'''''
* [http://amigo.geneontology.org/amigo/dd_browse GO ID] that each gene product can be associated with
* [http://amigo.geneontology.org/amigo/dd_browse GO ID] that each gene product can be associated with
* [http://geneontology.org/docs/guide-go-evidence-codes/ Evidence Code] that allows you to make the association
* [http://geneontology.org/docs/guide-go-evidence-codes/ Evidence Code] that allows you to make the association
* Reference (published paper or reference describing the methodology used to make the geneproductToGOterm association)
* Reference (published paper or reference describing the methodology used to make the gene product--GO term association)
* [https://www.ncbi.nlm.nih.gov/taxonomy NCBI taxon_id] for the gene products for which the associations are made
* [https://www.ncbi.nlm.nih.gov/taxonomy NCBI taxon_id] for the gene products for which the associations are made


===General information required by the GOC===
===General information required by the GOC===
If you represent a group offering GO annotations, are you willing to serve as the species owner: do you have resources, support to maintain the annotations for the species, etc.  This is strongly encouraged, as annotations must be maintained- that is, annotations are updated or removed as the ontology and/or scientific knowledge about the gene product evolves in order to keep the information relevant. For example, [http://www.informatics.jax.org/ MGI] is the species owner for mouse. If your group ''is'' willing and able to serve as the species owner, then you will need to provide us with some information about your group along with the GO annotations:
If you represent a group offering GO annotations, are you willing to serve as the species owner: do you have resources, support to maintain the annotations for the species, etc.  This is strongly encouraged, as annotations must be maintained- that is, annotations are updated or removed as the ontology and/or scientific knowledge about the gene product evolves in order to keep the information relevant. For example, [http://www.informatics.jax.org/ MGI] is the species owner for mouse. If your group ''is'' willing and able to serve as the species owner, then you will need to provide us with some information about your group along with the GO annotations:
* We will need to add your group to the groups.yaml file, create an entry in & add a contact email/user to users.yaml, and create a yaml for the database/group.  To make these files, we'll need a contact email, the URL of the project, funding source (grant numbers, if applicable), and the URL of the GAF file.
* We will need to add your group to the <code>groups.yaml</code> file  
:''note for GO staff: Make sure the contact email address in the config file is the same as in the GAF file''
* If you are not using references to a database already in GO, we will need to create an entry in the <code>db-xrefs.yaml</code> for the database/group.  To complete these files, we'll need a contact email, the URL of the project, funding source (grant numbers, if applicable), and the URL of the GAF file (especially if the submission will be recurring/ongoing).
* We'll also need a [[Gp2protein_file | gp2protein file]], another tab-delimited file that provides a mapping between database object IDs and protein sequence IDs
* We'll also need a [[Gp2protein_file | gp2protein file]], another tab-delimited file that provides a mapping between database object IDs and protein sequence IDs
* A stanza for the GO.xrefs file (http://current.geneontology.org/metadata/GO.xrf_abbs). Add the stanza to the GO.xrf_abbs file and the webpage automatically picks it up for display. GOC will be happy to add this if you just supply us the details.  
* Data to complete a stanza for the GO.xrefs file (http://current.geneontology.org/metadata/GO.xrf_abbs). GOC will be happy to add this if you just supply us the details.  
:Example xrefs stanza:
:Example xrefs stanza:
  <pre> abbreviation: SGD
  <pre> abbreviation: SGD
Line 32: Line 33:
  url_syntax: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
  url_syntax: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
  url_example: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169
  url_example: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169
</pre>
:Example Config file:
<pre>
project_name=Saccharomyces Genome Database (SGD)
contact_email=sgd-helpdesk@lists.stanford.edu
project_url=http://www.yeastgenome.org/
funding_source=NHGRI of US National Institutes of Health, HG001315
email_report=sgd-go-curator@genome.stanford.edu
</pre>
</pre>


Line 47: Line 40:


* Also, if your group is submitting IEAs, it is important to consider how these IEAs compare with the IEAs made by GOA for the same proteins (how much value are they adding to the annotation set for that taxon ID?)
* Also, if your group is submitting IEAs, it is important to consider how these IEAs compare with the IEAs made by GOA for the same proteins (how much value are they adding to the annotation set for that taxon ID?)
==Steps to create a GAF==
==Steps to create GPAD==


==Resources==
==Resources==
Line 52: Line 50:
*For an overview, see our [http://geneontology.org/docs/guide-go-evidence-codes/ Evidence Code summary page].  
*For an overview, see our [http://geneontology.org/docs/guide-go-evidence-codes/ Evidence Code summary page].  
*For full details, see the [http://www.evidenceontology.org/ ECO website]
*For full details, see the [http://www.evidenceontology.org/ ECO website]
File Specifications
*See the full specs for [http://geneontology.org/docs/go-annotation-file-gaf-format-21/ GAF 2.1] or older [http://geneontology.org/docs/go-annotation-file-gaf-format-20/ GAF 2.0]
References:
*Published literature should be referred to by PMID if at all possible, see [https://www.ncbi.nlm.nih.gov/pubmed/ PubMed] or [https://europepmc.org/ Europe PMC] depending on your institution's location
==Internal notes (for GOC to set up new GAF groups)==
*db-xrefs.yaml- needs an entry for anything that links out of AmiGO. This file is also used (indirectly) for some QC/QA.
**Add the stanza to the db-xrefs.yaml file and the webpage automatically picks it up for display on GO.xrf_abbs.
*groups.yaml should also contain up-to-date information about what assigning groups exist in the world. This file is used by Noctua and some QC/QA tools.
*users.yaml is only needed to access Noctua

Latest revision as of 02:22, 11 April 2019

UNDER REVIEW

Submitting GO annotations to the GO Consortium

GO annotations are evidence-based statements that link a Gene Ontology term to a particular gene product. Although most annotations are submitted and maintained by members of the GO Consortium, we accept bulk annotations from non-GOC members as well. For an overview of what GO annotations are and some of the components of an annotation, see the introduction to GO annotations.

GO annotations are currently disseminated in a 17-column, tab-delimited GAF format file with strict formatting requirements. However, GO anticipates a move to the GPAD/GPI format soon and recommends that contributing groups who prepare a GAF also prepare, or be ready to prepare, a GPAD.

  • GAFs are recognizable by the *.gaf suffix as well as an internal header line denoting the format and version: !gaf-version: 2.1. The following information is intended to help new users create a GAF file. Although you may choose to construct a GAF file solely from this documentation, we highly recommend that you contact GO for step-by-step assistance, as failure to perform some steps will result in a completely unusable product.
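The two properties described above (a version header line and 17 tab-delimited columns per annotation row) can be checked mechanically before a file is sent to GO. The following is a minimal sketch only, not an official GO validator: the function name check_gaf is hypothetical, and it assumes that lines beginning with "!" are header or comment lines. It does not check the content of any column.

```python
GAF_COLUMNS = 17  # each annotation row has 17 tab-separated columns


def check_gaf(lines):
    """Return a list of problems found in an iterable of GAF text lines.

    Hypothetical helper: it only checks for the version header and the
    column count, not the contents of the columns.
    """
    problems = []
    saw_version = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("!"):
            # Assumed: "!" marks header/comment lines, including the version line.
            if line.replace(" ", "").startswith("!gaf-version:"):
                saw_version = True
            continue
        columns = line.split("\t")
        if len(columns) != GAF_COLUMNS:
            problems.append(
                f"line {number}: expected {GAF_COLUMNS} columns, found {len(columns)}"
            )
    if not saw_version:
        problems.append("missing '!gaf-version:' header line")
    return problems
```

An open file handle can be passed directly, for example check_gaf(open("my_annotations.gaf")), where the file name is a placeholder. An empty list means only that these two structural checks passed.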


Minimum information needed to make a GAF file:

  • Stable IDs for the gene products or objects that are being annotated.
If the gene product IDs are not in UniProt or NCBI, please submit the IDs to one of these databases before proceeding
  • GO ID that each gene product can be associated with
  • Evidence Code that allows you to make the association
  • Reference (published paper or reference describing the methodology used to make the gene product--GO term association)
  • NCBI taxon_id for the gene products for which the associations are made
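As an illustration of how the items above map onto one annotation row, here is a minimal sketch that assembles a single 17-column line. The column order follows the GAF 2.1 specification; note that the specification also requires a few fields not listed above (object symbol, aspect, object type, date, and assigned_by). The function name make_gaf_row and every value in the usage example (ExampleDB, EX0000001, exa1, and the placeholder PMID) are hypothetical and are not real annotation data.

```python
from datetime import date


def make_gaf_row(db, object_id, symbol, go_id, reference, evidence,
                 aspect, object_type, taxon_id, assigned_by, when):
    """Assemble one tab-delimited GAF row; optional columns are left empty."""
    columns = [
        db,                       # 1  DB
        object_id,                # 2  DB Object ID (stable gene product ID)
        symbol,                   # 3  DB Object Symbol
        "",                       # 4  Qualifier (optional)
        go_id,                    # 5  GO ID
        reference,                # 6  DB:Reference, e.g. a PMID
        evidence,                 # 7  Evidence Code
        "",                       # 8  With/From (optional here)
        aspect,                   # 9  Aspect: P, F, or C
        "",                       # 10 DB Object Name (optional)
        "",                       # 11 DB Object Synonym (optional)
        object_type,              # 12 DB Object Type
        f"taxon:{taxon_id}",      # 13 Taxon
        when.strftime("%Y%m%d"),  # 14 Date as YYYYMMDD
        assigned_by,              # 15 Assigned By
        "",                       # 16 Annotation Extension (optional)
        "",                       # 17 Gene Product Form ID (optional)
    ]
    return "\t".join(columns)


# Placeholder values for illustration only.
row = make_gaf_row(
    db="ExampleDB", object_id="EX0000001", symbol="exa1",
    go_id="GO:0008150", reference="PMID:0000000", evidence="IDA",
    aspect="P", object_type="protein", taxon_id=4932,
    assigned_by="ExampleDB", when=date(2019, 4, 11),
)
```

A row built this way still has to satisfy the content rules of the specification (for example, some evidence codes require the With/From column), so treat it as a starting point and contact GO before submitting.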

General information required by the GOC

If you represent a group offering GO annotations, are you willing to serve as the species owner? That is, do you have the resources and support to maintain the annotations for the species? This is strongly encouraged, as annotations must be maintained: they are updated or removed as the ontology and/or scientific knowledge about the gene product evolves, in order to keep the information relevant. For example, MGI is the species owner for mouse. If your group is willing and able to serve as the species owner, then you will need to provide us with some information about your group along with the GO annotations:

  • We will need to add your group to the groups.yaml file
  • If you are not using references to a database already in GO, we will need to create an entry in the db-xrefs.yaml for the database/group. To complete these files, we'll need a contact email, the URL of the project, funding source (grant numbers, if applicable), and the URL of the GAF file (especially if the submission will be recurring/ongoing).
  • We'll also need a gp2protein file, another tab-delimited file that provides a mapping between database object IDs and protein sequence IDs
  • Data to complete a stanza for the GO.xrefs file (http://current.geneontology.org/metadata/GO.xrf_abbs). GOC will be happy to add this if you just supply us the details.
Example xrefs stanza:
 abbreviation: SGD
 database: Saccharomyces Genome Database
 object: Identifier for SGD Loci
 synonym: SGDID
 example_id: SGD:S000006169
 local_id_syntax: ^S[0-9]{9}$
 generic_url: http://www.yeastgenome.org/
 url_syntax: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]
 url_example: http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169
  • If your contributing group is the sole species owner, then GO will need to add a taxon filter to the filtering script (implying you will provide all the annotations for that taxonID) and you should be able to absorb/integrate all external annotations into your GAF. This aspect should be brought up within the group.
  • If your group is unable to be the owner of the species, then you should submit your annotations to the GOA group. You can designate what the assigned_by column should say.
  • Also, if your group is submitting IEAs, it is important to consider how these IEAs compare with the IEAs made by GOA for the same proteins (how much value are they adding to the annotation set for that taxon ID?)
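The fields of the example xrefs stanza above relate to one another in a way that is easy to check before sending the details to GOC: the local part of example_id should match local_id_syntax, and substituting it into url_syntax should reproduce url_example. The sketch below assumes that "[example_id]" in url_syntax is a literal placeholder for the local ID, which is how the SGD example reads. The helper names are hypothetical.

```python
import re

# Field values copied from the example SGD stanza.
stanza = {
    "example_id": "SGD:S000006169",
    "local_id_syntax": r"^S[0-9]{9}$",
    "url_syntax": "http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=[example_id]",
    "url_example": "http://db.yeastgenome.org/cgi-bin/locus.pl?dbid=S000006169",
}


def local_part(prefixed_id):
    """Strip the database prefix: 'SGD:S000006169' -> 'S000006169'."""
    return prefixed_id.split(":", 1)[1]


def matches_local_syntax(stanza, local_id):
    """True if local_id matches the stanza's local_id_syntax pattern."""
    return re.match(stanza["local_id_syntax"], local_id) is not None


def build_url(stanza, local_id):
    """Substitute local_id for the '[example_id]' placeholder (assumed literal)."""
    return stanza["url_syntax"].replace("[example_id]", local_id)


local_id = local_part(stanza["example_id"])
stanza_is_consistent = (
    matches_local_syntax(stanza, local_id)
    and build_url(stanza, local_id) == stanza["url_example"]
)
```

For the SGD stanza, stanza_is_consistent is True. Running the same check on a new group's stanza is a quick way to catch a mismatched pattern or URL template.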

Steps to create a GAF

Steps to create GPAD

Resources

Evidence Codes:

  • For an overview, see our Evidence Code summary page.
  • For full details, see the ECO website

File Specifications

  • See the full specs for GAF 2.1 or older GAF 2.0

References:

  • Published literature should be referred to by PMID if at all possible, see PubMed or Europe PMC depending on your institution's location

Internal notes (for GOC to set up new GAF groups)

  • db-xrefs.yaml needs an entry for anything that links out of AmiGO. This file is also used (indirectly) for some QC/QA.
    • Add the stanza to the db-xrefs.yaml file and the webpage automatically picks it up for display on GO.xrf_abbs.
  • groups.yaml should also contain up-to-date information about what assigning groups exist in the world. This file is used by Noctua and some QC/QA tools.
  • users.yaml is only needed to access Noctua