Mock-ups for GO website

From GO Wiki
Revision as of 11:34, 22 August 2007

Electronic annotation

This diagram illustrates some of the main ways of making electronic annotations. It should be read from the top down. The diagram shows sequences from UniProt being assigned electronic GO annotation by several computational methods, all of which rely on mapping files. For more information on mappings see http://www.geneontology.org/GO.indices.shtml.

InterPro Mapping

With the InterPro mapping it is possible to assign electronic GO annotation to your sequences based on InterPro domains and a number of other criteria. For example, if your sequence has a DNA binding domain then it makes sense to electronically annotate it to the DNA binding function term. For more information on InterPro mapping please see http://gocwiki.geneontology.org/index.php/InterProScan.

Keyword Mapping

This part of the diagram illustrates how sequences already categorised using Swiss-Prot keywords can have GO annotation applied automatically via the keyword mapping file.

HAMAP

HAMAP is a system that categorises sequences based on family or subfamily characteristics and is applied to bacterial, archaeal and plastid-encoded proteins. GO annotation can be automatically applied to such sequences using the mapping file between HAMAP and GO.

Enzyme Commission

The Enzyme Commission database categorises enzymes by the reactions they catalyse. If your sequences are already categorised by EC number then you can transfer GO annotations using the mapping file of EC to GO categories.
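
The transfer step described above can be sketched in a few lines of Python. This is only an illustration, not the Consortium's own tooling. It assumes the mapping file uses one line per external identifier in the form `EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022`, with comment lines starting with `!`; check the file you download before relying on that layout. The function names `parse_mapping` and `transfer_annotations` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed line layout (verify against the real mapping file):
#   EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022
# Lines starting with "!" are treated as comments.
GO_ID = re.compile(r";\s*(GO:\d{7})")


def parse_mapping(lines):
    """Return a dict: external identifier -> set of GO identifiers."""
    mapping = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        if not sep or not left.split():
            continue  # not a mapping line in the assumed layout
        external_id = left.split()[0]
        for go_id in GO_ID.findall(right):
            mapping[external_id].add(go_id)
    return mapping


def transfer_annotations(sequence_to_external, mapping):
    """Return a dict: sequence identifier -> sorted list of GO identifiers.

    sequence_to_external maps each sequence identifier to the external
    identifiers (for example EC numbers) already assigned to it.
    """
    result = {}
    for sequence_id, external_ids in sequence_to_external.items():
        go_ids = set()
        for external_id in external_ids:
            go_ids.update(mapping.get(external_id, ()))
        result[sequence_id] = sorted(go_ids)
    return result
```

The same approach should carry over to the other mapping files mentioned here, provided they share this layout; only the prefix of the external identifier changes. Sequences whose external identifiers have no entry in the mapping simply receive no GO annotation.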

Other mappings

These are just a few examples of mapping files that can be used to transfer annotations to your sequence objects. Many other mappings are available, and if there is not a mapping file between GO and your current annotation system then we can assist you in making one.

Literature Annotation


Sending annotations to the consortium

If you are sending annotations to the Consortium, please bear these general rules in mind.


Updating the annotations

The Gene Ontology structure changes over time, so it is essential that annotations are maintained in the long term to accommodate these changes. If you are submitting annotations to the Consortium then you should ensure either that your group has funding to maintain the annotations, or that you have made an agreement with another group to carry out the maintenance.

General principles for sequence ids

  • You must have stable identifiers for your objects.
  • You must provide information on what the object is. For example, is it a protein or a nucleotide sequence? It doesn't matter whether a nucleotide sequence is a gene, a genome, or an EST, as long as you know whether it is a nucleotide sequence or a protein.
  • If a sequence identifier has become obsolete then you should be able to track down what has replaced it. What is the mechanism for that?
  • Your database must have an internal rule that object identifiers are never reused.
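
As a rough illustration of how these principles might be enforced, the sketch below keeps a small in-memory registry. It is an assumption-laden example rather than a description of any existing database: the class name `IdentifierRegistry` and its methods are hypothetical, and a real system would store this information persistently.

```python
class IdentifierRegistry:
    """Toy registry: typed identifiers, never reused, obsoletions tracked."""

    KINDS = ("protein", "nucleotide")

    def __init__(self):
        self._kind = {}         # identifier -> "protein" or "nucleotide"
        self._replaced_by = {}  # obsolete identifier -> list of replacements

    def register(self, identifier, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown object type: {kind!r}")
        # Obsolete identifiers stay in _kind, so they can never be reused.
        if identifier in self._kind:
            raise ValueError(f"identifier already used: {identifier!r}")
        self._kind[identifier] = kind

    def kind(self, identifier):
        return self._kind[identifier]

    def obsolete(self, identifier, replacements=()):
        """Mark an identifier obsolete and record what replaced it."""
        if identifier not in self._kind:
            raise KeyError(identifier)
        if identifier in self._replaced_by:
            raise ValueError(f"already obsolete: {identifier!r}")
        for replacement in replacements:
            if replacement not in self._kind:
                raise KeyError(replacement)
            # Replacements must be current, which also rules out cycles.
            if replacement == identifier or replacement in self._replaced_by:
                raise ValueError(f"replacement is not current: {replacement!r}")
        self._replaced_by[identifier] = list(replacements)

    def resolve(self, identifier):
        """Return the current identifiers that stand in for `identifier`."""
        if identifier not in self._kind:
            raise KeyError(identifier)
        if identifier not in self._replaced_by:
            return [identifier]
        current = []
        for replacement in self._replaced_by[identifier]:
            for resolved in self.resolve(replacement):
                if resolved not in current:
                    current.append(resolved)
        return current
```

An identifier that was obsoleted without a replacement resolves to an empty list, which makes the absence of a successor explicit rather than silent.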