Guidelines for electronic annotation methods: Difference between revisions

From GO Wiki
Latest revision as of 12:04, 13 April 2019

Electronic GO annotation

Electronic annotation is the process of assigning GO terms to gene products using automated methods. It can produce millions of annotations in a very short time. The GO terms used for electronic annotation tend to be higher-level and therefore less detailed than the terms used for manual annotation. Certain electronic annotation methods are useful if you have a new genome sequence to annotate, or a microarray with many thousands of sequences; see Using electronic annotation methods to map GO annotation to datasets, below.

Annotations made using electronic annotation methods carry the Inferred from Electronic Annotation (IEA) evidence code.
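In annotation files the evidence code has its own column. The sketch below is a minimal Python illustration that assumes the 17-column GAF 2.1 layout and shows where IEA appears. Every identifier in it is a hypothetical placeholder, including the `GO_REF:EXAMPLE` reference and the assigning database.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class IEAAnnotation:
    db: str               # e.g. "UniProtKB"
    db_object_id: str     # gene product accession
    symbol: str
    go_id: str
    reference: str        # a GO_REF describing the electronic method
    aspect: str           # "F" (function), "P" (process) or "C" (component)
    db_object_type: str   # e.g. "protein"
    taxon: str            # e.g. "taxon:9606"
    assigned_by: str
    qualifier: str = ""
    with_from: str = ""   # e.g. the mapped InterPro entry or EC number


def to_gaf_row(a: IEAAnnotation, when: date) -> str:
    """Render one electronic annotation as a 17-column, tab-separated GAF 2.1 row."""
    columns = [
        a.db, a.db_object_id, a.symbol, a.qualifier, a.go_id,
        a.reference,
        "IEA",          # evidence code column: IEA for electronic annotation
        a.with_from, a.aspect,
        "", "",         # DB Object Name and Synonym (optional)
        a.db_object_type, a.taxon,
        when.strftime("%Y%m%d"),
        a.assigned_by,
        "", "",         # Annotation Extension and Gene Product Form ID (optional)
    ]
    return "\t".join(columns)


# Hypothetical values for illustration only.
row = to_gaf_row(
    IEAAnnotation("UniProtKB", "EXAMPLE1", "exampleA", "GO:0003677",
                  "GO_REF:EXAMPLE", "F", "protein", "taxon:9606", "EXAMPLE_DB",
                  with_from="InterPro:IPR_EXAMPLE1"),
    date(2019, 4, 13),
)
print(row.split("\t")[6])  # prints IEA
```

The reference column normally points to a GO_REF that describes the electronic method.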

Types of electronic annotation used by the GO Consortium

Mappings to controlled vocabularies

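One of the primary methods of generating electronic annotations is to map GO terms manually to corresponding concepts in the controlled vocabularies used by the UniProt Knowledgebase or other external resources, and then apply these mappings automatically. The mappings that are currently used by the GO Consortium to produce annotations are:

* Swiss-Prot keywords (SPKW2GO)
* Subcellular locations (SPSL2GO)
* Enzyme Commission numbers (EC2GO)
* InterPro domains (InterPro2GO)
* HAMAP (HAMAP2GO)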

Projection of annotations between orthologous gene products

This method uses orthology data from Ensembl Compara to project GO annotations from a source species onto one or more target species.

* Ensembl Compara orthology electronic annotation method
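The sketch below is a minimal Python illustration of ortholog-based projection. It is not the Ensembl pipeline. The input structures are hypothetical, and two filters are assumptions made for illustration: only pairs labelled `ortholog_one2one` (an Ensembl Compara homology type) are used, and annotations that are already IEA are not projected again.

```python
from collections import defaultdict


def project_annotations(source_annotations, homologies,
                        excluded_evidence=frozenset({"IEA"})):
    """Project GO terms from source-species genes onto their target-species orthologs.

    source_annotations: dict source_gene -> list of (go_id, evidence_code)
    homologies: iterable of (source_gene, target_gene, homology_type) tuples,
        with homology_type using Ensembl Compara labels such as "ortholog_one2one".
    """
    projected = defaultdict(set)
    for source_gene, target_gene, homology_type in homologies:
        # Assumption: only one-to-one orthologs are used for projection.
        if homology_type != "ortholog_one2one":
            continue
        for go_id, evidence in source_annotations.get(source_gene, ()):
            # Assumption: annotations that are themselves electronic are not re-projected.
            if evidence in excluded_evidence:
                continue
            projected[target_gene].add(go_id)
    return dict(projected)


# Hypothetical genes and annotations.
source = {
    "srcA": [("GO:0003677", "IDA"), ("GO:0005634", "IEA")],
    "srcB": [("GO:0003924", "IMP")],
}
pairs = [("srcA", "tgtA", "ortholog_one2one"), ("srcB", "tgtB", "ortholog_one2many")]
print(project_annotations(source, pairs))  # {'tgtA': {'GO:0003677'}}
```

The IDA term on `srcA` reaches its one-to-one ortholog. The IEA term and the one-to-many pair are filtered out.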

Using electronic annotation methods to map GO annotation to datasets

A mapping file links entities from an external database to identical, similar or related GO terms. Various mapping files have been created that can be used to assign GO annotation to novel sequences, ESTs, genome sequences, microarray datasets or peptide sequences.

List of GO mappings to external vocabularies.
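The sketch below is a minimal Python illustration of how a mapping file is applied. It assumes the line layout of the GO `external2go` mapping files: an external identifier, optionally followed by its name, then ` > GO:term name ; GO:id`, with `!` marking comment lines. The sample lines and identifiers are hypothetical, and `parse_external2go` and `annotate` are illustrative helpers, not part of any GO tool.

```python
from collections import defaultdict


def parse_external2go(lines):
    """Build a dict: external identifier -> set of GO ids, from external2go-style lines."""
    mapping = defaultdict(set)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # blank line or comment header
        left, sep, right = line.partition(" > ")
        if not sep or not left.split():
            continue  # not a mapping line
        external_id = left.split()[0]  # drop the optional name after the identifier
        _term_name, sep, go_id = right.rpartition(" ; ")
        if sep:
            mapping[external_id].add(go_id.strip())
    return dict(mapping)


def annotate(gene_features, mapping):
    """Give each gene the GO terms mapped from its external identifiers."""
    annotations = {}
    for gene, features in gene_features.items():
        terms = set()
        for feature in features:
            terms |= mapping.get(feature, set())
        if terms:
            annotations[gene] = terms
    return annotations


# Hypothetical lines in the assumed layout.
sample = [
    "! illustrative mapping lines",
    "InterPro:IPR_EXAMPLE1 Example DNA-binding domain > GO:DNA binding ; GO:0003677",
    "UniProtKB-KW:KW-EXAMPLE Example keyword > GO:nucleus ; GO:0005634",
]
mapping = parse_external2go(sample)
print(annotate({"seq1": ["InterPro:IPR_EXAMPLE1"], "seq2": ["EC:unmapped"]}, mapping))
# {'seq1': {'GO:0003677'}}
```

Sequences whose identifiers have no mapping line get no annotation. Every term assigned this way would carry the IEA evidence code.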

One of the most common mappings used to apply electronic GO annotation to user sequences is InterPro2GO.

InterPro

You can assign electronic GO annotation to your sequences based on the InterPro domains they contain. For example, if your sequence has a DNA-binding domain, it would be appropriate to annotate it electronically to the molecular function term DNA binding (GO:0003677). For more information on InterPro mapping, please see the InterProScan information.

InterProScan tool
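The InterPro route can be sketched in a similar way. This minimal Python sketch assumes the tab-separated output of InterProScan 5, where column 12 holds the InterPro accession and is absent or `-` when a signature is not integrated into an InterPro entry. The rows, the accession `IPR_EXAMPLE1` and the small `interpro2go` dictionary are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def interpro_hits(tsv_text):
    """Collect InterPro accessions per protein from InterProScan-style TSV output."""
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Column 12 (index 11) is assumed to hold the InterPro accession.
        if len(row) < 12 or row[11] in ("", "-"):
            continue  # signature not integrated into an InterPro entry
        hits[row[0]].add(row[11])
    return dict(hits)


def assign_go(hits, interpro2go):
    """Union the GO terms mapped from every InterPro entry found in each protein."""
    return {
        protein: set().union(*(interpro2go.get(ipr, set()) for ipr in iprs))
        for protein, iprs in hits.items()
    }


# Hypothetical rows: the first is integrated into an InterPro entry, the second is not.
rows = [
    ["seq1", "md5", "250", "Pfam", "PF_EXAMPLE", "Example", "10", "70",
     "1.0E-20", "T", "13-04-2019", "IPR_EXAMPLE1", "Example DNA-binding domain"],
    ["seq2", "md5", "180", "Pfam", "PF_OTHER", "Other", "5", "60",
     "1.0E-8", "T", "13-04-2019"],
]
tsv = "\n".join("\t".join(r) for r in rows)
interpro2go = {"IPR_EXAMPLE1": {"GO:0003677"}}  # hypothetical excerpt
print(assign_go(interpro_hits(tsv), interpro2go))  # {'seq1': {'GO:0003677'}}
```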

Using sequence similarity to provide GO annotation to datasets

You can also annotate your sequences by BLASTing them against manually annotated sequences and transferring the GO annotations of the matching sequences to yours. Because the source GO annotations are experimentally derived, the transferred annotations could carry the Inferred from Sequence or Structural Similarity (ISS) evidence code.

Two tools that can assist with this are:

AmiGO BLAST

Blast2GO

Both tools let you choose the BLAST parameters; suitable values depend on your individual requirements.
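BLAST-based transfer can be sketched as a filter over tabular BLAST results. This minimal Python sketch assumes BLAST+ output in `-outfmt 6`, whose twelve default columns put percent identity in column 3 and the E-value in column 11. The thresholds are hypothetical defaults, since the right values depend on your requirements, and the sequence identifiers are made up.

```python
import csv
import io
from collections import defaultdict


def transfer_by_blast(blast_tabular, source_annotations,
                      max_evalue=1e-10, min_identity=40.0):
    """Transfer GO terms from annotated BLAST subjects to each query sequence.

    blast_tabular: BLAST+ `-outfmt 6` text (qseqid sseqid pident length mismatch
        gapopen qstart qend sstart send evalue bitscore).
    source_annotations: dict subject id -> set of GO ids from manual annotation.
    The thresholds are illustrative defaults, not recommendations.
    """
    transferred = defaultdict(set)
    for row in csv.reader(io.StringIO(blast_tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # blank or comment line
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        if evalue <= max_evalue and identity >= min_identity:
            transferred[query] |= source_annotations.get(subject, set())
    # Which evidence code to record for these terms is left to the caller.
    return {query: terms for query, terms in transferred.items() if terms}


# Hypothetical BLAST hits and source annotations.
hits = "\n".join([
    "myseq1\tref1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t550",
    "myseq2\tref2\t30.0\t120\t84\t2\t1\t120\t5\t124\t0.5\t40",
])
manual = {"ref1": {"GO:0003924"}, "ref2": {"GO:0005634"}}
print(transfer_by_blast(hits, manual))  # {'myseq1': {'GO:0003924'}}
```

Loosening `max_evalue` and `min_identity` transfers more terms at the cost of more false positives.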


COMMENTS

Perhaps the name of this page should change from 'electronic' to 'automatic' annotation methods? Then you could include the f2p (function-to-process) inferences. These won't have the IEA evidence tag (which the 'electronic' wording of the page title seems to imply), but they are applied automatically from the f2p links in the GO file. It would be nice to have an explanation of the process here, especially as no GO_REF will be associated with the annotation set. I've drafted an explanation of this annotation method:

GOC Inferred Annotations

These annotations are generated automatically from the Molecular Function -> Biological Process inter-ontology relationships present in the GO OBO v1.2 format file. Because many GO users do not currently reason over these relationships, a set of inferred annotations is being generated. An inferred annotation is produced when a gene product has been annotated (manually or electronically) to a Molecular Function term that, directly or via one of its parent terms, has a relationship to a Biological Process term, and that Process term (or one of its children) has not already been used in the annotation set for the same gene product identifier. Each inferred annotation uses the same gene product identifier, reference and evidence code as the asserted function annotation. The inferred set is generated from all sources of GO annotations; only 'NOT'-qualified annotations are excluded.

For example, an inter-ontology relationship indicates that 'GTPase activity' is part_of 'GTP catabolic process', as shown here: http://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0003924#term=ancchart .
Edimmer
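The sketch below is a minimal Python illustration of the inference rule in the drafted explanation. It is not the GO Consortium's implementation. Ontology edges are given as plain dictionaries (hypothetical inputs, not a parsed OBO file), and all term identifiers other than GO:0003924 are placeholders. The sketch also assumes that "already used" means any existing annotation for the gene product, whatever its qualifier.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    gene: str
    go_id: str
    reference: str
    evidence: str
    qualifier: str = ""


def closure(term, edges):
    """Return term plus every term reachable through edges (e.g. parents or children)."""
    seen, stack = {term}, [term]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def infer_f2p(annotations, mf_parents, f2p_links, bp_children):
    """Infer Biological Process annotations from function-to-process links."""
    used = {}
    for a in annotations:
        used.setdefault(a.gene, set()).add(a.go_id)
    inferred, emitted = [], set()
    for a in annotations:
        if "NOT" in a.qualifier.split("|"):
            continue  # NOT-qualified annotations are excluded
        # The function term itself or any of its parents may carry the link.
        for mf in closure(a.go_id, mf_parents):
            for bp in f2p_links.get(mf, ()):
                # Skip if the process term or any of its children is already used.
                if closure(bp, bp_children) & used[a.gene]:
                    continue
                # Reuse gene product, reference and evidence code of the asserted annotation.
                new = Annotation(a.gene, bp, a.reference, a.evidence)
                if new not in emitted:
                    emitted.add(new)
                    inferred.append(new)
    return inferred


# Hypothetical ontology fragment and annotations.
mf_parents = {"GO:0003924": {"MF:parent_example"}}
f2p_links = {"GO:0003924": {"BP:GTP_catabolic_process"}}
bp_children = {"BP:GTP_catabolic_process": {"BP:child_example"}}
annotations = [
    Annotation("geneA", "GO:0003924", "REF:1", "IDA"),
    Annotation("geneB", "GO:0003924", "REF:2", "IEA"),
    Annotation("geneB", "BP:child_example", "REF:3", "IMP"),
    Annotation("geneC", "GO:0003924", "REF:4", "IDA", "NOT"),
]
print(infer_f2p(annotations, mf_parents, f2p_links, bp_children))
# [Annotation(gene='geneA', go_id='BP:GTP_catabolic_process', reference='REF:1', evidence='IDA', qualifier='')]
```

Only `geneA` gains a process annotation. `geneB` already has a child of the process term, and `geneC`'s annotation is NOT-qualified.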