Mock-ups for GO website
- 1 Tell us about your requirements
- 2 Electronic annotation
- 3 Literature Annotation
- 4 Sending annotations to the consortium
Tell us about your requirements
I represent a small lab working on a specific biological area
In this case, perhaps you have a list of your favourite genes and you wish to annotate them. You have a range of choices depending on what you are trying to achieve.
Please see the range of options below and choose the one that suits you best.
I have a set of ESTs and I would like to attach annotations.
If you ultimately want to send the annotations to the consortium for distribution, it is crucial that your EST clusters keep the same identifiers through each round of reclustering. One way to do this is to identify each cluster by a single EST chosen to represent it. There may be other good ways that we have not heard of.
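The representative-EST approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_stable_ids`, the `EST_CLUSTER_` prefix, and the EST accessions are all hypothetical, and a cluster formed by merging two older clusters simply keeps the identifier of the alphabetically first representative it contains.

```python
from typing import Dict, List, Set


def assign_stable_ids(
    clusters: List[Set[str]],
    representatives: Dict[str, str],
    prefix: str = "EST_CLUSTER_",  # hypothetical identifier prefix
) -> Dict[str, Set[str]]:
    """Name each (non-empty) cluster so that identifiers survive reclustering.

    `representatives` maps a representative EST to the cluster identifier it
    anchored in earlier rounds. It is updated in place and must be stored
    between rounds; entries are never removed, so numbers are never reused.
    """
    named: Dict[str, Set[str]] = {}
    for members in clusters:
        anchors = sorted(est for est in members if est in representatives)
        if anchors:
            # A known representative is present: reuse its identifier.
            cluster_id = representatives[anchors[0]]
        else:
            # Brand-new cluster: choose a representative and mint a new id.
            representative = min(members)
            cluster_id = f"{prefix}{len(representatives) + 1:06d}"
            representatives[representative] = cluster_id
        named[cluster_id] = set(members)
    return named


# Two rounds of clustering over hypothetical EST accessions.
reps: Dict[str, str] = {}
round_one = assign_stable_ids([{"EST1", "EST2"}, {"EST3"}], reps)
round_two = assign_stable_ids(
    [{"EST1", "EST2", "EST4"}, {"EST3", "EST5"}, {"EST6"}], reps
)
```

In the second round the first two clusters keep their original identifiers even though their membership has changed, and only the cluster containing EST6 receives a new one.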
Once you have your clusters and stable identifiers, follow the directions for making electronic (IEA) annotation.
You could also run BLASTX, or run gene prediction programmes and then BLASTP. Running InterPro on the sequences will find the longest open reading frame.
I have a genome sequence
You will already have assembled the genome sequence and made gene calls. Once you have the CDS sequences or predicted protein sequences, you can follow the instructions on IEA annotation.
I have a microarray data set
The action you can take depends somewhat on your sequences.
- Are they cDNAs or oligos?
- Do they have identifiers? Which kind?
- How do they relate to the genes? If you know which sequence relates to which characterised gene then it will be easy to transfer annotations over.
- Do the genes have GO annotations? If they do not have full GO annotation from literature then you may like to apply for funding to annotate the genes yourself, or write to your Model Organism Database to ask them to do so.
- Can you get more up-to-date annotations than those provided with your tool? You may be seeing only the annotations that come from your proprietary microarray software provider. It is a good idea to ask how often they update their annotations and ontology structure, as these change from day to day, and there may be many more annotations available than you are seeing.
It is most likely that you will want to use mainly electronic annotations, supplemented with some literature annotation for those sequences that are not yet fully annotated.
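When you know which array sequence corresponds to which characterised gene, transferring annotations is a simple lookup. The sketch below assumes two hypothetical tables, one from probe identifier to gene identifier and one from gene identifier to GO terms. It also lists the probes that end up with no annotation, since those are the candidates for literature annotation. All probe and gene identifiers are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def annotate_probes(
    probe_to_gene: Dict[str, str],
    gene_to_go: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Copy each gene's GO terms onto the probes that represent it."""
    annotated: Dict[str, Set[str]] = {}
    unannotated: List[str] = []
    for probe, gene in probe_to_gene.items():
        terms = gene_to_go.get(gene, set())
        if terms:
            annotated[probe] = set(terms)
        else:
            # No annotation yet for this gene: flag the probe for follow-up.
            unannotated.append(probe)
    return annotated, sorted(unannotated)


# Hypothetical probe and gene identifiers.
probe_to_gene = {"probe_001": "geneA", "probe_002": "geneA", "probe_003": "geneB"}
gene_to_go = {"geneA": {"GO:0003677", "GO:0006355"}}

annotated, needs_annotation = annotate_probes(probe_to_gene, gene_to_go)
```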
I have a peptide sequence.
- Do you know which gene it is?
- Can you map it to known genes with identifiers?
If so, then you can retrieve annotations or make your own by any of the illustrated methods.
Electronic annotation
Electronic annotation produces large amounts of annotation very quickly. Electronic annotations are rarely wrong, but tend to be less detailed than literature annotations. For example, electronic annotation is likely to tell you which of your genes are transcription factors but unlikely to tell you in great detail what process the gene controls. You may like to use this method if you have a new genome sequence to annotate, or a microarray with many thousands of sequences.
This diagram illustrates some of the main ways of making electronic annotation. It should be read from the top down. The diagram shows sequences from UniProt having electronic GO annotation assigned by several computational methods. All of these methods involve use of mapping files. For more information on mappings see http://www.geneontology.org/GO.indices.shtml.
In the case of the InterPro mapping it is possible to assign electronic GO annotation to your sequences based on InterPro domains and a number of other criteria. For example, if your sequence has a DNA binding domain then it makes sense to electronically annotate it to the DNA binding function term. For more information on InterPro mapping please see http://gocwiki.geneontology.org/index.php/InterProScan.
This part of the diagram illustrates how sequences already categorised using the SwissProt keyword mapping can have GO annotation automatically applied by transferring via the keyword mapping file.
HAMAP is a system that categorizes sequences based on family or subfamily characteristics and is applied to bacterial, archaeal and plastid-encoded proteins. GO annotation can be automatically applied to such sequences using the mapping file between HAMAP and GO.
The Enzyme Commission database categorises enzymes by the reactions they catalyse. If your sequences are already categorised by EC number then you can transfer GO annotations using the mapping file of EC to GO categories.
These are just a few examples of mapping files that can be used to transfer annotations to your sequence objects. Many other mappings are available (http://www.geneontology.org/GO.indices.shtml), and if there is not a mapping file between GO and your current annotation system then we can assist you in making one.
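As a rough sketch of how a mapping file can be used, the Python below parses lines in the layout commonly used by the GO mapping files (an external identifier, an optional description, then `> GO:term name ; GO:id`, with comment lines starting with `!`) and transfers GO identifiers to sequences that already carry external classifications. The layout is an assumption to check against the file you download, the function names are hypothetical, and the sample lines and sequence names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def parse_mapping(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Read mapping lines into {external identifier: {GO identifiers}}."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and header comments
        if " > " not in line or " ; " not in line:
            continue  # skip anything that does not look like a mapping
        external, go_part = line.split(" > ", 1)
        external_id = external.split()[0]
        go_id = go_part.rsplit(" ; ", 1)[1].strip()
        mapping[external_id].add(go_id)
    return dict(mapping)


def transfer(
    classifications: Dict[str, Set[str]],
    mapping: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Give each sequence the GO terms mapped from its classifications."""
    result: Dict[str, Set[str]] = {}
    for sequence, external_ids in classifications.items():
        terms: Set[str] = set()
        for external_id in external_ids:
            terms |= mapping.get(external_id, set())
        if terms:
            result[sequence] = terms
    return result


# Illustrative lines only; always use the current mapping files.
sample = [
    "! example mapping file",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
    "EC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022",
]
mapping = parse_mapping(sample)
go_terms = transfer({"seq1": {"InterPro:IPR000003"}, "seq2": {"EC:1.1.1.1"}}, mapping)
```

The same two functions work for any mapping file that follows this layout, so only the input file changes between the InterPro, keyword, HAMAP and EC routes.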
You can also make electronic annotations by blasting your sequence against manually annotated sequences and transferring the GO annotations across to your sequence. The threshold of similarity in this process is up to you, and depends on your requirements.
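A minimal sketch of this similarity-based transfer is shown below. It assumes BLAST was run with the standard 12-column tabular output (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, e-value, bit score) and that you hold a table of manually assigned GO terms for the subject sequences. The function name, the default thresholds and all sequence identifiers are hypothetical; choose thresholds that suit your own requirements.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def transfer_from_blast(
    blast_lines: Iterable[str],
    subject_go: Dict[str, Set[str]],
    min_identity: float = 60.0,   # assumed threshold, percent identity
    max_evalue: float = 1e-20,    # assumed threshold, e-value
) -> Dict[str, Set[str]]:
    """Transfer GO terms from annotated subjects to sufficiently similar queries."""
    transferred: Dict[str, Set[str]] = defaultdict(set)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            transferred[query].update(subject_go.get(subject, set()))
    # Drop queries whose only hits had no manual annotation.
    return {query: terms for query, terms in transferred.items() if terms}


# Hypothetical hits: only the first passes both thresholds.
hits = [
    "query1\tsubjA\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-80\t350",
    "query1\tsubjB\t35.0\t150\t97\t2\t1\t150\t5\t154\t1e-5\t60",
    "query2\tsubjB\t32.0\t120\t80\t3\t1\t120\t9\t128\t0.001\t45",
]
subject_go = {"subjA": {"GO:0003677"}, "subjB": {"GO:0004022"}}
annotations = transfer_from_blast(hits, subject_go)
```

Stricter thresholds give fewer but safer transfers; looser ones increase coverage at the cost of more questionable annotations.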
No similar sequences manually annotated?
If your sequence is similar to other sequences that have been well characterised but not yet annotated from the literature, then one option is to carry out the literature annotation yourself and then transfer by electronic methods.
Literature Annotation
Literature annotation involves capturing published information about the exact function of a gene product as GO annotations. To do this you must read the publications about the gene and record all the relevant information. This annotation is time-consuming but produces very high quality, species-specific annotation, and brings the information about the gene product into a format in which it can be used in high-throughput experiments. This is an extremely worthwhile process in the long term. It may be best carried out by people who know the function of the gene product, and the associated biology, in great detail; for example, experimental scientists who are familiar with the published literature. If you are doing this, then you may like to write and suggest modifications to the ontology structure as well.
Below is a schematic diagram giving an introduction to the steps involved in literature-based GO annotation. If you are interested in carrying out literature-based annotation you can receive full training in the process by attending a GO annotation camp or by working with an individual GO Consortium annotation mentor.
Sending annotations to the consortium
If you are sending annotations to the consortium then please bear these general rules in mind.
Updating the annotations
The Gene Ontology structure changes over time, so it is essential that annotations are maintained in the long term to accommodate these changes. If you are submitting annotations to the Consortium then you should either ensure that your group has funding to maintain the annotations, or make an agreement with another group that they will carry out maintenance.
General principles for sequence ids
- You must have stable identifiers for your objects.
- You must provide information on what the object is. For example, is it a protein or a nucleotide sequence? It doesn't matter if a nucleotide sequence is a gene, a genome, or an EST, as long as you know whether it is a nucleotide sequence or a protein.
- If a sequence identifier has become obsolete then you should be able to track down what has replaced it. What is the mechanism for that?
- Your database must have an internal rule that object identifiers are never reused.
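The principles above can be illustrated with a small in-memory registry. This is only a sketch: the class `IdentifierRegistry`, its method names and the `LAB` prefix are hypothetical, and a real database would store this information persistently. Identifiers come from a counter that only increases, every object records whether it is a protein or a nucleotide sequence, and an obsolete identifier keeps a pointer to whatever replaced it.

```python
from typing import Dict, Iterable, List


class IdentifierRegistry:
    """Hypothetical registry enforcing stable, never-reused identifiers."""

    VALID_TYPES = ("protein", "nucleotide")

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._counter = 0  # only ever increases, so numbers are never reused
        self._types: Dict[str, str] = {}
        self._replaced_by: Dict[str, List[str]] = {}

    def new(self, object_type: str) -> str:
        if object_type not in self.VALID_TYPES:
            raise ValueError(f"unknown object type: {object_type!r}")
        self._counter += 1
        identifier = f"{self.prefix}:{self._counter:07d}"
        self._types[identifier] = object_type
        return identifier

    def object_type(self, identifier: str) -> str:
        return self._types[identifier]

    def is_obsolete(self, identifier: str) -> bool:
        return identifier in self._replaced_by

    def obsolete(self, identifier: str, replaced_by: Iterable[str] = ()) -> None:
        """Retire an identifier, recording what (if anything) replaces it."""
        replacements = list(replaced_by)
        if identifier not in self._types or self.is_obsolete(identifier):
            raise ValueError(f"cannot obsolete {identifier!r}")
        for replacement in replacements:
            if (
                replacement == identifier
                or replacement not in self._types
                or self.is_obsolete(replacement)
            ):
                raise ValueError(f"invalid replacement {replacement!r}")
        self._replaced_by[identifier] = replacements

    def resolve(self, identifier: str) -> List[str]:
        """Follow replacement pointers to the current identifier(s)."""
        if identifier not in self._types:
            raise KeyError(identifier)
        if not self.is_obsolete(identifier):
            return [identifier]
        current: List[str] = []
        for replacement in self._replaced_by[identifier]:
            for resolved in self.resolve(replacement):
                if resolved not in current:
                    current.append(resolved)
        return current


registry = IdentifierRegistry("LAB")
old = registry.new("protein")
new = registry.new("protein")
registry.obsolete(old, replaced_by=[new])
current = registry.resolve(old)  # list containing only the replacement
```

An identifier retired without a replacement resolves to an empty list, which makes it explicit that the object was withdrawn rather than merged or renamed.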