Planned changes to the IEA GO_REFs cited by UniProtKB

From GO Wiki

Scheduled change to the GO_REFs cited in UniProtKB Keyword2GO and UniProtKB Subcellular Location2GO annotations

We would like to alert users to an intended change in the references cited in annotations from two UniProtKB automatic annotation pipelines.

Annotations created from mappings between GO and the UniProtKB Keywords and UniProtKB Subcellular Location controlled vocabularies currently cite the GO references GO_REF:0000004 and GO_REF:0000023, respectively.

However, terms from these UniProtKB controlled vocabularies are applied differently to UniProtKB/Swiss-Prot and UniProtKB/TrEMBL entries:

  • Terms are assigned manually to UniProtKB/Swiss-Prot entries.
  • Terms are assigned to UniProtKB/TrEMBL entries from data supplied by the underlying nucleic acid databases and/or by the UniProt automatic annotation program.

We intend to change the references cited in the supplied GO annotations to reflect these differences, and to use the new references in future UniProt-GOA releases.

Therefore, from the February 2012 release onwards, the UniProt-GOA annotation set will use:

  • GO_REF:0000037 or GO_REF:0000038 instead of GO_REF:0000004 for UniProtKB keyword annotations
  • GO_REF:0000039 or GO_REF:0000040 instead of GO_REF:0000023 for UniProtKB Subcellular Location annotations

Further descriptions of all these references are available at: http://www.geneontology.org/cgi-bin/references.cgi

However, GO_REF:0000004 and GO_REF:0000023 have been updated and will remain available for groups that carry out their own UniProtKB Keyword2GO or UniProtKB Subcellular Location2GO mapping.
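As a minimal sketch of how a pipeline might pick the reference to cite, the Python function below maps a vocabulary and a UniProtKB section to a GO_REF. It assumes that the first reference of each new pair (GO_REF:0000037, GO_REF:0000039) applies to Swiss-Prot entries and the second (GO_REF:0000038, GO_REF:0000040) to TrEMBL entries; the text above only says "or", so check the GO_REF descriptions before relying on this. The function name `select_go_ref` and the vocabulary labels are hypothetical.

```python
# ASSUMPTION: first reference of each new pair = Swiss-Prot, second = TrEMBL.
NEW_REFS = {
    ("keyword", "Swiss-Prot"): "GO_REF:0000037",
    ("keyword", "TrEMBL"): "GO_REF:0000038",
    ("subcellular_location", "Swiss-Prot"): "GO_REF:0000039",
    ("subcellular_location", "TrEMBL"): "GO_REF:0000040",
}

# Original references, kept for groups that run their own mappings.
LEGACY_REFS = {
    "keyword": "GO_REF:0000004",
    "subcellular_location": "GO_REF:0000023",
}


def select_go_ref(vocabulary: str, section: str, made_by_uniprot_goa: bool = True) -> str:
    """Return the GO_REF to cite for an IEA annotation from a UniProtKB vocabulary mapping.

    Hypothetical helper: `vocabulary` is "keyword" or "subcellular_location",
    `section` is "Swiss-Prot" or "TrEMBL".
    """
    if vocabulary not in LEGACY_REFS:
        raise ValueError(f"unknown vocabulary: {vocabulary!r}")
    if not made_by_uniprot_goa:
        # Third-party mappings keep citing the updated original references.
        return LEGACY_REFS[vocabulary]
    try:
        return NEW_REFS[(vocabulary, section)]
    except KeyError:
        raise ValueError(f"unknown UniProtKB section: {section!r}") from None
```

For example, `select_go_ref("keyword", "TrEMBL")` would return `GO_REF:0000038` under the stated assumption, while a group doing its own keyword mapping would still get `GO_REF:0000004`.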

Intended change to the spkw2go and spsl2go mapping file names

There are currently two external2go mapping files for UniProt vocabularies: spkw2go and spsl2go.

However, these vocabularies are applied to both reviewed (Swiss-Prot) and unreviewed (TrEMBL) UniProt entries, so they are now named UniProtKB Keywords and UniProtKB Subcellular Location.
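Both files follow the general external2go layout, in which each non-comment line maps one external term to one GO term. The parser below is a sketch that assumes lines of the form `PREFIX:ID name > GO:term name ; GO:ID` with `!`-prefixed header comments; the sample line, the `MappingLine` type and the function name `parse_external2go` are illustrative and not taken from the files themselves.

```python
from typing import Iterator, NamedTuple


class MappingLine(NamedTuple):
    external_id: str    # e.g. "UniProtKB-KW:KW-0001" (illustrative)
    external_name: str  # e.g. "2Fe-2S"
    go_name: str        # GO term name, without the leading "GO:" label
    go_id: str          # e.g. "GO:0051537"


def parse_external2go(text: str) -> Iterator[MappingLine]:
    """Yield one MappingLine per mapping; malformed lines raise ValueError."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and '!' header comments
        left, right = line.split(" > ", 1)
        # The external ID ends at the first space; the rest is the term name.
        external_id, _, external_name = left.partition(" ")
        go_name, go_id = right.rsplit(" ; ", 1)
        if go_name.startswith("GO:"):
            go_name = go_name[len("GO:"):]
        yield MappingLine(external_id, external_name, go_name, go_id)
```

Splitting the GO ID off with `rsplit` keeps term names that themselves contain commas or spaces intact.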

We would like to rename these files so that they match the updated vocabulary names used in other documentation. We therefore intend to change:

FROM:

http://www.geneontology.org/external2go/spkw2go

http://www.geneontology.org/external2go/spsl2go


TO:

http://www.geneontology.org/external2go/uniprotkb_kw2go

http://www.geneontology.org/external2go/uniprotkb_sl2go

  • If useful, we can create symbolic links on the GO/GOA FTP sites from the old mapping file names to the new ones, so that annotation groups' scripts do not break.
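A group that prefers not to depend on symbolic links could update its own scripts with a small rewrite step. The hypothetical helper below swaps an old file name at the end of a URL or path for the proposed new one and leaves every other mapping file untouched; the names are the ones listed above, and whether those URLs are still served is not assumed here.

```python
# Old external2go file names and their proposed replacements.
RENAMED_MAPPINGS = {
    "spkw2go": "uniprotkb_kw2go",
    "spsl2go": "uniprotkb_sl2go",
}


def updated_mapping_url(url: str) -> str:
    """Hypothetical helper: rewrite an old mapping file name; other URLs pass through."""
    base, sep, name = url.rpartition("/")
    new_name = RENAMED_MAPPINGS.get(name)
    return f"{base}{sep}{new_name}" if new_name else url
```

Because only the last path component is compared, bare file names such as `spsl2go` are rewritten too, and files like `interpro2go` are returned unchanged.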

New GO_REFs to be created and used when UniProt-GOA has changed an IEA annotation from the InterPro2GO, HAMAP2GO, UniProtKB Keyword2GO or UniProtKB Subcellular Location2GO mappings

UniProt-GOA intends to make conservative deletions or transformations of the automatic (IEA) GO annotations that the Ensembl, InterPro and UniProt groups supply to the UniProtKB GO annotation set.

Affected annotations will either be individually filtered out of the UniProt-GOA annotation set or conservatively changed to use an equivalent, correct GO term.

Changes to a particular annotation subset will only be made when:

  • the annotation originally supplied by the automatic annotation pipeline is incorrect for a UniProtKB protein;
  • UniProtKB is the group primarily responsible for supplying the data to the GO Consortium; and
  • the contributing group cannot easily fix the annotation without an unnecessarily high loss of correct annotations.

In many cases UniProt-GOA is best placed to improve UniProt IEA annotations, because it can check the taxonomic correctness of the annotated GO terms (e.g. for InterPro2GO annotations) and specific GO ontology requirements.

All changes to IEA annotation sets will need to be agreed by the affected automatic annotation groups.

GO_REF changes for UniProt-GOA IEA altered annotations

All automatic annotations transformed by UniProt-GOA processing will cite a GO_REF that tells users that UniProt-GOA has changed the annotation's GO term, and that points them to a UniProt-GOA page fully describing the rules for changing annotation sets.

It is important that users know why annotations in UniProt-GOA differ from those that would be created by applying an external2go mapping file directly to UniProtKB.

These GO_REFs have not yet been created.

Example:

An example of a desirable annotation transformation would be a response to the GO taxon rule stating that a gene product can only be annotated to the term 'nucleus' if it comes from a eukaryote.

If a viral gene product has been annotated to 'nucleus' by a taxon-independent annotation method, such as InterPro2GO, we would like to transform the GO term used to 'GO:0042025 host cell nucleus', rather than lose the InterPro2GO annotation.
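The transformation can be sketched as a rule applied to each IEA annotation. In the Python below, 'nucleus' is GO:0005634 and 'host cell nucleus' is GO:0042025, and a protein is treated as viral if the NCBI Taxonomy ID for Viruses (10239) appears in its lineage. The `Annotation` type, the lineage representation and the `GO_REF:PENDING` placeholder are hypothetical, since the GO_REF for altered annotations has not been created yet.

```python
from dataclasses import dataclass, replace

NUCLEUS = "GO:0005634"
HOST_CELL_NUCLEUS = "GO:0042025"
VIRUSES_TAXON = 10239  # NCBI Taxonomy ID for Viruses
# HYPOTHETICAL placeholder: the real GO_REF for UniProt-GOA-altered IEAs does not exist yet.
ALTERED_BY_GOA_REF = "GO_REF:PENDING"


@dataclass(frozen=True)
class Annotation:
    protein: str                 # hypothetical protein identifier
    go_id: str
    reference: str               # e.g. "GO_REF:0000002" for InterPro2GO
    lineage: tuple[int, ...]     # NCBI taxon IDs in the organism's lineage


def apply_viral_nucleus_rule(ann: Annotation) -> Annotation:
    """Swap 'nucleus' for 'host cell nucleus' on viral proteins; keep everything else."""
    if ann.go_id == NUCLEUS and VIRUSES_TAXON in ann.lineage:
        # Record that UniProt-GOA changed the term, so users can see why it
        # differs from a direct application of the mapping file.
        return replace(ann, go_id=HOST_CELL_NUCLEUS, reference=ALTERED_BY_GOA_REF)
    return ann
```

This sketch only matches the exact 'nucleus' term; a real rule would also need to cover its descendant terms, and could filter out rather than transform annotations for other non-eukaryotic taxa.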

Schedule

Prudence is currently finalizing the rules for correcting viral gene product annotations. Once they are finalized, each of the affected IEA-supplying groups will be consulted about the changes we would make to their annotations, and the GO_REFs and web pages will be published.

As few GOC groups integrate viral GO annotations, this first phase will not affect many annotation groups; however, further IEA annotation corrections in future may affect MOD species IEA annotation sets.