Difference between revisions of "Mock-ups for GO website"


Revision as of 08:11, 22 August 2007

Tell us about your requirements

I represent a small lab working on a biological area

In this case, perhaps you have a list of your favourite genes and you wish to annotate them. You have a range of choices depending on what you are trying to achieve.

If you have protein sequences in a text file, perhaps collected from papers you have read, you can:

  • run BlastP against annotated sequences and obtain electronic annotations;
  • run them through InterProScan and obtain GO annotations for the domains via the interpro2go mapping;
  • annotate them manually.

I have a set of ESTs and I would like to attach annotations.

If you would ultimately like to send the annotations to the consortium for distribution, then it is crucial that your EST clusters maintain the same identifiers over each round of reclustering. One way to do this is to identify each cluster by one EST that is chosen to represent it. There may be other good ways that we have not heard of.
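The representative-EST idea can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `CLUSTER:` prefix, and the rule of picking the alphabetically first EST as a new representative are all hypothetical choices, not a consortium requirement.

```python
def assign_stable_ids(new_clusters, previous_ids):
    """Name each cluster after one representative EST, reusing earlier choices.

    new_clusters: list of sets of EST accessions from the latest clustering.
    previous_ids: dict mapping representative EST -> cluster identifier from
                  earlier rounds (a hypothetical bookkeeping structure).
    Returns (cluster identifier -> set of ESTs, updated representative dict).
    """
    result = {}
    updated = dict(previous_ids)
    for members in new_clusters:
        # Prefer a representative that was already chosen in an earlier round,
        # so the cluster keeps its identifier even when new ESTs join it.
        known = sorted(est for est in members if est in previous_ids)
        if known:
            representative = known[0]
        else:
            # Assumption: a brand-new cluster is named after its first EST.
            representative = min(members)
            updated[representative] = f"CLUSTER:{representative}"
        result[updated[representative]] = set(members)
    return result, updated


# Two rounds of clustering with made-up accessions.
round1, ids = assign_stable_ids([{"EST002", "EST001"}, {"EST010"}], {})
round2, ids = assign_stable_ids(
    [{"EST000", "EST001", "EST002"}, {"EST010", "EST011"}], ids
)
```

In this sketch, if two old clusters merge, the merged cluster keeps one of the old identifiers and the other is simply no longer used. A real pipeline would also need to record what replaced the retired identifier.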

Once you have your clusters and stable identifiers, follow the IEA directions for making electronic annotation.

You could also run BlastX, or run gene prediction programmes and then BlastP. Running InterProScan on the sequences will attempt to find the longest open reading frame.

I have a genome sequence

You will already have assembled the genome sequence and made gene calls. Once you have the CDS sequences or predicted protein sequences, you can follow the instructions on IEA annotation.

I have a microarray data set

The action you can take depends somewhat on your sequences.

  • Are they cDNAs or oligos?
  • Do they have identifiers? Which kind?
  • How do they relate to the genes? If you know which sequence relates to which characterised gene then it will be easy to transfer annotations over.
  • Do the genes have GO annotations? If they do not have full GO annotation from literature then you may like to apply for funding to annotate the genes yourself, or write to your Model Organism Database to ask them to do so.
  • Can you get more up-to-date annotations than those provided with your tool? It may be that you are seeing only the annotations that come from your proprietary microarray software provider. It is a good idea to ask how often they update their annotations and ontology structure, as these change from day to day, and there may be many more annotations available than you are seeing.
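If you do know which array sequence corresponds to which characterised gene, transferring annotations is essentially a lookup. The sketch below assumes two hypothetical dictionaries that you would build from your own files: one from probe identifier to gene identifier, and one from gene identifier to GO term identifiers. All identifiers except the GO term IDs are made up.

```python
def transfer_annotations(probe_to_gene, gene_to_go):
    """Copy GO annotations from genes to the array probes that map to them.

    Returns (probe -> set of GO ids, sorted list of probes left unannotated).
    """
    annotated = {}
    unannotated = []
    for probe, gene in probe_to_gene.items():
        terms = gene_to_go.get(gene)
        if terms:
            annotated[probe] = set(terms)
        else:
            # Either the gene is unknown or it has no GO annotation yet.
            unannotated.append(probe)
    return annotated, sorted(unannotated)


probe_to_gene = {"probe_1": "geneA", "probe_2": "geneB", "probe_3": "geneC"}
gene_to_go = {"geneA": {"GO:0003677"}, "geneB": {"GO:0004022"}}
annotated, unannotated = transfer_annotations(probe_to_gene, gene_to_go)
```

The list of unannotated probes is a useful by-product: it shows which genes might be worth annotating yourself or raising with your Model Organism Database.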

I have a peptide sequence.

  • What gene is it from?
  • Can you map it to known genes with identifiers?
  • Can you retrieve existing annotations for those genes, or make annotations yourself?


Electronic annotation

IEAoverview.jpg

This diagram illustrates some of the main ways of making electronic annotation. It should be read from the top down. The diagram shows sequences from UniProt having electronic GO annotation assigned by several computational methods. All of these methods involve use of mapping files. For more information on mappings see http://www.geneontology.org/GO.indices.shtml.

InterPro Mapping

In the case of the InterPro mapping, it is possible to assign electronic GO annotation to your sequences based on InterPro domains and a number of other criteria. For example, if your sequence has a DNA binding domain then it makes sense to electronically annotate it to the DNA binding function term. For more information on InterPro mapping please see http://gocwiki.geneontology.org/index.php/InterProScan.
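As a rough illustration of how a mapping file is applied, the sketch below parses lines of the form `InterPro:IPRxxxxxx description > GO:term name ; GO:nnnnnnn` and transfers the GO identifiers to sequences with matching domains. The line layout is an assumption about the mapping file format, so check it against the file you download. The function names and the InterPro accession in the example are hypothetical.

```python
def parse_mapping(lines):
    """Read mapping lines into a dict: external id -> set of GO ids.

    Assumed layout: '<external id> <description> > GO:<name> ; GO:<id>',
    with comment lines starting with '!'.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, sep, right = line.partition(" > ")
        if not sep:
            continue  # not a mapping line
        external_id = left.split()[0]
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping.setdefault(external_id, set()).add(go_id)
    return mapping


def annotate(sequence_domains, mapping):
    """Give each sequence the GO ids mapped from its domains."""
    return {
        seq: set().union(*(mapping.get(d, set()) for d in domains))
        for seq, domains in sequence_domains.items()
    }


# Made-up InterPro accession; GO:0003677 is the DNA binding term.
example_lines = [
    "! illustrative entry only",
    "InterPro:IPR900001 Example DNA-binding domain > GO:DNA binding ; GO:0003677",
]
mapping = parse_mapping(example_lines)
result = annotate({"seqA": ["InterPro:IPR900001"], "seqB": []}, mapping)
```

The same parser would work for any other mapping file that shares this layout, which is why it keys on the whole external identifier rather than on InterPro specifically.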

Keyword Mapping

This part of the diagram illustrates how sequences already categorised with Swiss-Prot keywords can have GO annotation applied automatically through the keyword mapping file.

HAMAP

HAMAP is a system that categorises sequences based on family or subfamily characteristics and is applied to bacterial, archaeal and plastid-encoded proteins. GO annotation can be automatically applied to such sequences using the mapping file between HAMAP and GO.

Enzyme Commission

The Enzyme Commission database categorises enzymes by the reactions they catalyse. If your sequences are already categorised by EC number then you can transfer GO annotations using the mapping file of EC to GO categories.

Other mappings

These are just a few examples of mapping files that can be used to transfer annotations to your sequence objects. Many other mappings are available, and if there is not a mapping file between GO and your current annotation system then we can assist you in making one.

Blast

You can also make electronic annotations by blasting your sequence against manually annotated sequences and transferring the GO annotations across to your sequence. The threshold of similarity in this process is up to you, and depends on your requirements.
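A minimal sketch of this transfer is shown below, assuming the BLAST results are in the 12-column tab-separated tabular format (query, subject, percent identity, ..., e-value, bit score) and that you have a dictionary of GO annotations for the manually annotated subject sequences. The thresholds are arbitrary placeholders, not recommendations, and the function name is hypothetical.

```python
def transfer_from_blast(blast_lines, subject_go, min_identity=60.0, max_evalue=1e-10):
    """Transfer GO ids from annotated subjects to queries for hits passing thresholds.

    blast_lines: tab-separated hits; column 1 = query, 2 = subject,
                 3 = percent identity, 11 = e-value (assumed layout).
    subject_go:  dict mapping subject id -> set of GO ids.
    """
    transferred = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            terms = subject_go.get(subject, set())
            if terms:
                transferred.setdefault(query, set()).update(terms)
    return transferred


# Made-up hits: one strong, one weak.
hits = [
    "\t".join(["q1", "s1", "85.0", "200", "30", "0", "1", "200", "1", "200", "1e-50", "300"]),
    "\t".join(["q1", "s2", "35.0", "80", "52", "0", "1", "80", "1", "80", "1e-3", "40"]),
]
subject_go = {"s1": {"GO:0003677"}, "s2": {"GO:0004022"}}
result = transfer_from_blast(hits, subject_go)
```

Keeping the thresholds as parameters reflects the point above: how strict to be is your decision and depends on what the annotations will be used for.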

No similar sequences manually annotated?

If your sequence is similar to other sequences that have been well characterised but not yet annotated from the literature, then one option is to carry out the literature annotation yourself and then transfer by electronic methods.




Literature Annotation


Below is a schematic diagram giving an introduction to the steps involved in literature-based GO annotation. If you are interested in carrying out literature-based annotation you can receive full tuition in the process by attending a GO annotation camp or by working with an individual GO Consortium annotation mentor.

Literature original-4thMay.png


Sending annotations to the consortium

If you are sending annotations to the consortium then please bear these general rules in mind.


Updating the annotations

The Gene Ontology structure changes over time, so it is essential that annotations are maintained in the long term to accommodate these changes. If you are submitting annotations to the Consortium then you should either ensure that your group has funding to maintain the annotations, or that you have made an agreement with another group that they will carry out maintenance.
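One small part of such maintenance can be sketched as a check of your annotations against the set of term identifiers that are valid in the current ontology release. How you obtain that set is left open here. The function name and the identifiers in the example are hypothetical, and `GO:9999999` is a made-up placeholder for a term that is no longer current.

```python
def find_stale_annotations(annotations, current_terms):
    """Return (object id, GO id) pairs whose GO id is not in the current ontology.

    annotations:   iterable of (object id, GO id) pairs.
    current_terms: set of GO ids valid in the ontology release being checked.
    """
    return sorted((obj, go) for obj, go in annotations if go not in current_terms)


annotations = [("geneA", "GO:0003677"), ("geneB", "GO:9999999")]
current_terms = {"GO:0003677", "GO:0004022"}
stale = find_stale_annotations(annotations, current_terms)
```

Annotations flagged this way still need a curator to decide which current term, if any, should replace the old one.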

General principles for sequence ids

  • You must have stable identifiers for your objects.
  • You must provide information on what the object is. For example, is it a protein or a nucleotide sequence? It doesn't matter whether a nucleotide sequence is a gene, a genome, or an EST, as long as you know whether it is a nucleotide sequence or a protein.
  • If a sequence identifier has become obsolete then you should be able to track down what has replaced it. What is the mechanism for that?
  • Your database must have an internal rule that object identifiers are never reused.
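These principles can be illustrated with a small in-memory registry. This is a sketch only: the class name, the `OBJ:` prefix and the numbering scheme are assumptions made for the example, and a real database would enforce the same rules with its own persistent storage.

```python
class IdentifierRegistry:
    """Issues identifiers that are never reused and tracks replacements."""

    def __init__(self, prefix="OBJ"):
        self.prefix = prefix
        self._counter = 0   # only ever increases, so ids are never reused
        self._records = {}  # id -> {"kind", "obsolete", "replaced_by"}

    def new_id(self, kind):
        """Create an identifier for a 'protein' or 'nucleotide' object."""
        if kind not in ("protein", "nucleotide"):
            raise ValueError("kind must be 'protein' or 'nucleotide'")
        self._counter += 1
        ident = f"{self.prefix}:{self._counter:07d}"
        self._records[ident] = {"kind": kind, "obsolete": False, "replaced_by": None}
        return ident

    def obsolete(self, ident, replaced_by=None):
        """Mark an identifier obsolete, optionally recording its replacement."""
        if replaced_by is not None and replaced_by not in self._records:
            raise KeyError(replaced_by)
        record = self._records[ident]
        record["obsolete"] = True
        record["replaced_by"] = replaced_by

    def resolve(self, ident):
        """Follow replacements to the current identifier, or None if there is none."""
        seen = set()
        while ident is not None and ident not in seen:
            record = self._records[ident]
            if not record["obsolete"]:
                return ident
            seen.add(ident)
            ident = record["replaced_by"]
        return None


registry = IdentifierRegistry()
old = registry.new_id("protein")
new = registry.new_id("protein")
registry.obsolete(old, replaced_by=new)
current = registry.resolve(old)  # follows the replacement to `new`
```

Obsolete records are kept rather than deleted, which is what makes it possible to answer the question of what replaced an old identifier.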