Long-term maintenance of annotation datasets
Groups contributing annotations to the central GO repository are responsible for keeping their files up to date.
Avoiding annotation redundancy
Where two or more databases submit data on the same species, we encourage a model in which one database group collects all annotation data for that species, removes the redundant (duplicate) annotations, and then submits the combined dataset to the central repository. This ensures that no redundant annotations appear in the master dataset. Please see the list of species and relevant database groups for more details.
We understand that annotating groups will also wish to make their full datasets available to the public. For this purpose, the GO Consortium makes all of the individual datasets available from the GO website, via the GO web CVS interface, or from the directory go/gene-associations/ in the GO CVS repository. All of the individual datasets are also listed in the annotation downloads table, and each group is clearly credited for the work it has done. The non-redundant set is used only as the master copy that appears in AmiGO and similar tools.
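The collect, deduplicate, and submit model described above can be sketched in a few lines of Python. This is only an illustration, not the procedure any database group actually uses. It assumes the inputs are GAF 2.x files (tab-separated, 17 columns, header lines starting with `!`), and it assumes that two annotations are "redundant" when they agree on DB, DB Object ID, Qualifier, GO ID, Reference, Evidence Code, With/From, and Taxon. That key is an assumption made for this sketch; a real pipeline may define duplicates differently. The names `merge_nonredundant`, `make_row`, `GroupA`, `GroupB`, and all identifiers in the sample rows are hypothetical.

```python
# Assumed duplicate key: 0-based indexes into the 17 GAF 2.x columns.
# DB, DB Object ID, Qualifier, GO ID, Reference, Evidence Code, With/From, Taxon
KEY_COLUMNS = (0, 1, 3, 4, 5, 6, 7, 12)


def merge_nonredundant(*sources):
    """Merge iterables of GAF lines, keeping the first copy of each annotation."""
    seen = set()
    merged = []
    for source in sources:
        for line in source:
            line = line.rstrip("\n")
            if not line or line.startswith("!"):
                continue  # skip blank lines and GAF header/comment lines
            fields = line.split("\t")
            # Pad short rows with empty strings so the key is always complete.
            key = tuple(fields[i] if i < len(fields) else "" for i in KEY_COLUMNS)
            if key not in seen:
                seen.add(key)
                merged.append(line)
    return merged


def make_row(db, obj_id, go_id, ref, evidence, assigned_by):
    """Hypothetical helper: build a 17-column GAF row with invented values."""
    cols = [db, obj_id, "sym", "", go_id, ref, evidence, "", "F", "", "",
            "protein", "taxon:9606", "20190411", assigned_by, "", ""]
    return "\t".join(cols)


# Two invented submissions for the same species; one annotation overlaps.
shared = ("EXDB", "EX:0001", "GO:0005515", "PMID:0000001", "IPI")
group_a = [
    "!gaf-version: 2.1",
    make_row(*shared, "GroupA"),
    make_row("EXDB", "EX:0002", "GO:0003674", "PMID:0000002", "IDA", "GroupA"),
]
group_b = [
    make_row(*shared, "GroupB"),
    make_row("EXDB", "EX:0003", "GO:0003674", "PMID:0000003", "IDA", "GroupB"),
]

merged = merge_nonredundant(group_a, group_b)
print(len(merged))  # 3: the overlapping annotation is kept once
```

Because the assumed key leaves out the Assigned By and Date columns, the same annotation submitted by two groups collapses to the first copy seen. The individual input files are not modified, which matches the idea that each group's full dataset stays available separately.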
Procedure to deal with unmaintained annotations
- GOC becomes responsible for maintaining annotations for groups that are no longer able to do so. This means that:
  - GOC/EBI runs QC checks
  - GOC becomes an editor of the annotations
  - Annotations are rolled into the main GAF file(s)
Groups whose annotations are not actively maintained
- DFLAT, JCVI(*), MENGO(*), MTBBASE, PAMGO(*), PINC, TIGR(*)
- Annotations are maintained in Protein2GO
- (*) These groups still have people who can be contacted with questions about changes in annotations; see GO Group Contacts on GitHub.
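As a rough illustration of how the group list above could be used, the following Python sketch counts the annotations in a GAF file that are attributed to one of these groups. It assumes GAF 2.x input, where column 15 is Assigned By, and it assumes that the Assigned By value matches the group names exactly as written in the list. Real files may use different or more specific source names, so treat the set below as a starting point. The function name `count_unmaintained` and the sample rows are hypothetical.

```python
from collections import Counter

# Group names copied from the list above; exact matching against the
# Assigned By column is an assumption of this sketch.
UNMAINTAINED_GROUPS = {"DFLAT", "JCVI", "MENGO", "MTBBASE", "PAMGO", "PINC", "TIGR"}
ASSIGNED_BY = 14  # 0-based index of GAF column 15 (Assigned By)


def count_unmaintained(lines):
    """Count annotations per unmaintained group in an iterable of GAF lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= ASSIGNED_BY:
            continue  # malformed row: too few columns to read Assigned By
        group = fields[ASSIGNED_BY]
        if group in UNMAINTAINED_GROUPS:
            counts[group] += 1
    return counts


def sample_row(obj_id, assigned_by):
    """Hypothetical helper: a 17-column row with only the relevant fields set."""
    return "\t".join(["EXDB", obj_id] + [""] * 12 + [assigned_by, "", ""])


sample = [
    "!gaf-version: 2.1",
    sample_row("EX:0001", "TIGR"),
    sample_row("EX:0002", "TIGR"),
    sample_row("EX:0003", "PINC"),
    sample_row("EX:0004", "UniProt"),
]

counts = count_unmaintained(sample)
for group, n in sorted(counts.items()):
    print(group, n)
```

A count like this only identifies which rows fall under GOC maintenance; it does not replace the QC checks or the editing done in Protein2GO.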
Last reviewed: April 11, 2019