AmiGO Manual: Slimmer

From GO Wiki
=Overview=
The Slimmer tool maps the granular annotations of a query set of genes up to one or more high-level, broader parent terms, referred to as GO slim terms. This is possible because GO records parent:child relationships between granular terms and more general parent (i.e., GO slim) terms. Slimmer can be useful for reporting the GO annotations of a genome, or for analyzing microarray expression data or a cDNA collection, using a high-level view of the three ontologies.


The AmiGO version is based on the Perl script [http://search.cpan.org/~cmungall/go-perl/scripts/map2slim map2slim], whose documentation describes its inner workings in detail. For more information about GO subsets (AKA slims) in general, please see the [http://www.geneontology.org/GO.slims.shtml documentation].
'''Caution:''' By default, this tool uses annotation datasets that include [http://www.geneontology.org/GO.evidence.shtml#iea electronically inferred] (IEA) data. For organisms where part of the annotation coverage is IEA-based, the results will therefore not reflect curator-made annotations alone. For more information about what data AmiGO uses, please see the [[AmiGO_Manual:_Overview#What_data_does_AmiGO_use.3F | overview]] page.
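As a rough illustration of the mapping idea described above, the following Python sketch rolls one granular term up to the slim terms found among its ancestors. It is a simplification, not the map2slim implementation: it treats every parent link alike and reports every slim ancestor, whereas map2slim's actual rules (which relationship types it follows, which slim terms it reports) are described in its documentation. The toy ontology uses made-up placeholder IDs.

```python
from collections import deque


def ancestors(term, parents):
    """Return the term itself plus every term reachable through parent links."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def map_to_slim(term, parents, slim):
    """Return the slim terms that a granular term rolls up to (simplified)."""
    return ancestors(term, parents) & slim


# Made-up toy ontology (child -> direct parents). GO:0008150 is the real
# biological_process root; the GO:90000xx IDs are placeholders only.
PARENTS = {
    "GO:9000001": ["GO:9000002", "GO:9000004"],
    "GO:9000002": ["GO:9000003"],
    "GO:9000003": ["GO:0008150"],
    "GO:9000004": ["GO:0008150"],
}
SLIM = {"GO:9000003", "GO:9000004"}

print(sorted(map_to_slim("GO:9000001", PARENTS, SLIM)))  # ['GO:9000003', 'GO:9000004']
```

A term that is itself in the slim maps to itself, because the traversal includes the starting term.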
=Usage=
==Gene Product List==
The user may enter a whitespace-separated list of gene product identifiers. These may be a mix of gene product symbols, synonyms, or accessions. If the list is too large for manual input, the user may instead upload either a file containing such identifiers or a [http://www.geneontology.org/GO.format.annotation.shtml gene association file]. If AmiGO finds any gene product identifiers that are ambiguous or not found, the user will be informed before the process completes.
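A minimal sketch of how such a list might be split on whitespace and sorted into resolved, ambiguous, and not-found identifiers. The lookup table (`INDEX`) and the function name `resolve_identifiers` are hypothetical; this is not AmiGO's own code or data.

```python
def resolve_identifiers(text, index):
    """Classify each whitespace-separated token against a lookup table.

    index: hypothetical map from a symbol, synonym or accession to the set
    of gene product IDs it could refer to.
    """
    resolved, ambiguous, missing = {}, [], []
    for token in text.split():  # split() with no argument splits on any whitespace
        matches = index.get(token, set())
        if len(matches) == 1:
            resolved[token] = next(iter(matches))
        elif len(matches) > 1:
            ambiguous.append(token)  # e.g. a synonym shared by several products
        else:
            missing.append(token)
    return resolved, ambiguous, missing


# Hypothetical identifier index, not real database content.
INDEX = {
    "geneA": {"DB:0001"},
    "DB:0001": {"DB:0001"},
    "aliasX": {"DB:0002", "DB:0003"},
}

resolved, ambiguous, missing = resolve_identifiers("geneA\naliasX  unknown1\tDB:0001", INDEX)
print(resolved, ambiguous, missing)
```

Reporting the ambiguous and missing lists back to the user before any counting happens corresponds to the warning step described above.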
===Filtering===
If the user selects a database filter, the input gene product list (or gene association file) will be filtered so that only gene products found in that database are used in the calculations. This can remove much of the potential ambiguity in the input set.
The user may also filter by [http://www.geneontology.org/GO.evidence.shtml evidence code]: only gene products annotated to a term with the selected evidence code are kept, and all others are removed.
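Both filters can be pictured as row filters over gene association file lines. The sketch below assumes the standard GAF column layout (column 1 source database, column 2 object ID, column 5 GO ID, column 7 evidence code, with `!` marking header comments); the function name `filter_gaf`, the `DBX`/`DBY` database names and the rows are made-up placeholders, and this is not Slimmer's implementation.

```python
def filter_gaf(lines, db=None, evidence=None):
    """Yield (gene product, GO ID) pairs from GAF lines that pass the filters.

    db: keep only rows whose column 1 (source database) equals this value.
    evidence: keep only rows whose column 7 (evidence code) is in this set.
    """
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and '!' header comments
        cols = line.rstrip("\n").split("\t")
        if db is not None and cols[0] != db:
            continue
        if evidence is not None and cols[6] not in evidence:
            continue
        yield f"{cols[0]}:{cols[1]}", cols[4]  # column 5 holds the GO ID


# Made-up rows, truncated to the first nine GAF columns for brevity.
ROWS = [
    "!gaf-version: 2.2",
    "\t".join(["DBX", "P1", "geneA", "", "GO:9000001", "REF:1", "IDA", "", "P"]),
    "\t".join(["DBX", "P2", "geneB", "", "GO:9000002", "REF:2", "IEA", "", "P"]),
    "\t".join(["DBY", "P3", "geneC", "", "GO:9000004", "REF:3", "IMP", "", "P"]),
]

print(list(filter_gaf(ROWS, db="DBX", evidence={"IDA", "IMP"})))  # [('DBX:P1', 'GO:9000001')]
```

A gene product whose annotations are all filtered out simply drops out of the input. Choosing an evidence set without IEA is one way to leave out the electronically inferred annotations mentioned in the caution above.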
==Slim Terms==
Here the user defines the subset (AKA slim) of interest, using exactly one of the following three methods.
First, the user may enter the subset terms manually. GO IDs should be separated by whitespace and entered in the form "GO:nnnnnnn", where each 'n' is a digit.
Second, the user may select one of the pre-defined sets. If you have questions about the contents of a set, view the [http://www.geneontology.org/GO.slims.shtml subset documentation ] or query [http://amigo.geneontology.org/goose GOOSE] to get a detailed list.
Finally, the user may upload either a text file containing GO IDs (formatted as in the first option) or a file in the [http://www.geneontology.org/GO.format.shtml#oboformat OBO format]. Note that an OBO file must have the ".obo" extension to be identified correctly.
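The manual-entry and upload rules can be sketched in Python as follows: a regular expression checks the "GO:" plus seven digits form, a small reader collects the `id:` of each `[Term]` stanza from OBO text, and the parser is chosen by the ".obo" extension. The function names are hypothetical, and the OBO reader is deliberately simplified (it ignores obsolete flags, trailing comments, and everything except `[Term]` IDs).

```python
import re
from pathlib import Path

# "GO:" followed by exactly seven digits, as described for manual entry.
GO_ID = re.compile(r"GO:[0-9]{7}")


def split_go_ids(text):
    """Split whitespace-separated input into well-formed GO IDs and rejects."""
    valid, rejected = [], []
    for token in text.split():
        if GO_ID.fullmatch(token):
            valid.append(token)
        else:
            rejected.append(token)
    return valid, rejected


def obo_term_ids(text):
    """Collect the id: value of every [Term] stanza in OBO-formatted text."""
    ids, in_term = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            in_term = line == "[Term]"  # [Typedef] and other stanzas are skipped
        elif in_term and line.startswith("id:"):
            ids.append(line[len("id:"):].strip())
    return ids


def read_slim_file(path):
    """Choose a parser by file extension, mirroring the '.obo' rule above."""
    path = Path(path)
    text = path.read_text()
    if path.suffix == ".obo":
        return obo_term_ids(text), []
    return split_go_ids(text)


print(split_go_ids("GO:0008150 GO:0005575\nGO:123 go:0003674"))
```

This also shows why the extension matters: the same file content would be read very differently by the two parsers.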
==Results==
The default results displayed have the following three columns:
* '''GO Slim Term''': A slim term that was entered, or a term from the selected or uploaded set.
* '''Total # GPs''': The total number of discovered gene products in the query that mapped to this term.
* '''Ontology''': A letter indicating which of the three broadest parts of the Gene Ontology the term belongs to: biological process (P), cellular component (C), or molecular function (F).
Other formats are available in the Advanced Options section below.
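The '''Total # GPs''' column can be read as the number of distinct gene products whose annotations map to each slim term. A minimal Python sketch of that counting, assuming the annotation-to-slim mapping has already been computed; the function name and all IDs and data are made-up placeholders, and this is not AmiGO's own code.

```python
from collections import defaultdict


def count_gene_products(annotations, term_to_slim, aspect):
    """Return sorted (slim term, total # GPs, ontology letter) rows.

    annotations:  iterable of (gene product, GO ID) pairs
    term_to_slim: GO ID -> set of slim terms that GO ID maps to
    aspect:       slim term -> 'P', 'C' or 'F'
    """
    gps_per_slim = defaultdict(set)
    for gp, term in annotations:
        for slim in term_to_slim.get(term, ()):
            gps_per_slim[slim].add(gp)  # a set counts each gene product once
    return sorted((slim, len(gps), aspect[slim]) for slim, gps in gps_per_slim.items())


# Made-up placeholder data.
ANNOTATIONS = [("DBX:P1", "GO:9000001"), ("DBX:P1", "GO:9000002"), ("DBX:P2", "GO:9000002")]
TERM_TO_SLIM = {"GO:9000001": {"GO:9000003", "GO:9000004"}, "GO:9000002": {"GO:9000003"}}
ASPECT = {"GO:9000003": "P", "GO:9000004": "P"}

for row in count_gene_products(ANNOTATIONS, TERM_TO_SLIM, ASPECT):
    print(row)
```

Here `DBX:P1` has two annotations that both reach `GO:9000003`, but it is counted only once for that slim term.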
==Advanced Options==
Clicking on '''Display advanced result options''' gives advanced users access to additional settings.
===Result Types===
The '''gene product counts''' option is the default result type and the one most users will find useful.
The '''gene association file''' option presents the results in [http://www.geneontology.org/GO.format.annotation.shtml gene association file] format.
The '''mapping file''' option generates a mapping file of the results as described in the [http://search.cpan.org/~cmungall/go-perl/scripts/map2slim map2slim documentation].
The '''mapping file for every term''' option is the same as above, except that it covers the whole ontology rather than just the input subset. Be aware that this option generates a large file and is very resource-intensive; please use it with caution.
===Result Formats===
In addition to the standard '''html page''' results, the user may instead select a '''tab-delimited file''' or an '''xml file'''. Please be warned that the XML file uses an unstable internal format and should only be used by those who prefer parsing XML over other formats.
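Because the tab-delimited file is easier to consume than the unstable XML, here is a minimal Python sketch of writing and re-reading count rows as tab-delimited text with the standard `csv` module. The header names reuse the three default HTML columns and are an assumption; the actual layout of Slimmer's tab-delimited output may differ, and the counts are made-up.

```python
import csv
import io


def to_tab_delimited(rows):
    """Render (slim term, total # GPs, ontology) rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["GO Slim Term", "Total # GPs", "Ontology"])  # assumed header
    writer.writerows(rows)
    return buffer.getvalue()


def from_tab_delimited(text):
    """Parse such text back into tuples, converting the count column to int."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    return [(term, int(count), aspect) for term, count, aspect in reader]


ROWS = [("GO:0008150", 12, "P"), ("GO:0005575", 7, "C")]  # made-up counts
text = to_tab_delimited(ROWS)
print(text, end="")
print(from_tab_delimited(text))
```

Using `csv` rather than splitting on tabs by hand keeps the reader robust if a field ever needs quoting.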
===Bucket Terms===
This feature is currently in development and is available only on the experimental site. For more information about bucket terms, please see the original map2slim [http://search.cpan.org/~cmungall/go-perl/scripts/map2slim#BUCKET_TERMS documentation]. Please stay tuned for more details.
= Limitations =
{{Software:Database_Limitations}}
[[Category:AmiGO_Manual]]
[[Category:AmiGO]]

Latest revision as of 15:18, 28 January 2015


Limitations

Unfortunately, at this time, the term enrichment tool and Slimmer both suffer from timeout and load issues. They are limited in the amount of work they can accomplish before a timeout occurs on either the client or the server. Because of these limitations, the current tool is not designed to work on sets beyond a certain size. Unfortunately, this size is hard to pinpoint: depending on input type, size, and database warm-up, the results may vary considerably. If you get a timeout error, or see a phrase like "Query execution was interrupted", you have probably reached the time limit.

In the fairly near future we'll be moving to a Galaxy-based workflow system that will not have these same limitations in size and time. Until then, you may wish to take a look at other available third-party tools:

Or look at a tool like Ontologizer: