AmiGO Manual: Term Enrichment

From GO Wiki

Revision as of 13:21, 10 March 2009

Overview

The Term Enrichment tool can be used to discover what a set of genes may have in common by examining their annotations and finding significant shared GO terms. The algorithm attempts to determine whether the observed level of annotation for a group of genes is significant in the context of the annotation for all genes in the genome; examples of studies that have used this algorithm are PMID:15492223 and PMID:14561723. AmiGO's Term Enrichment tool is based on the GO-TermFinder Perl module by Gavin Sherlock and Shuai Weng at Stanford University. It allows users to specify a list of genes, define a background set against which significance will be calculated, and set the p-value (significance indicator) cut-off.

Term enrichment is a useful method for analyzing data from large-scale experiments, such as gene clusters from microarray expression data. For a more detailed discussion of the algorithm, please see the published material on GO::TermFinder (PMID:15297299).
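
To make the idea of "significance within the context of all genes" concrete, here is a minimal sketch of a one-sided hypergeometric test, a common choice for term enrichment. It is an illustration only: the function and parameter names are hypothetical, and it does not reproduce AmiGO's or GO::TermFinder's actual code or any multiple-testing correction they may apply.

```python
from math import comb


def enrichment_p_value(population, annotated_in_population,
                       sample, annotated_in_sample):
    """Probability of seeing at least `annotated_in_sample` annotated genes
    when drawing `sample` genes at random from a background of `population`
    genes, of which `annotated_in_population` carry the GO term.

    All names here are illustrative assumptions, not AmiGO parameters.
    """
    # The count of annotated genes in the sample cannot exceed either bound.
    upper = min(sample, annotated_in_population)
    total = comb(population, sample)
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population - annotated_in_population, sample - k)
        for k in range(annotated_in_sample, upper + 1)
    )
    return tail / total
```

A small p-value suggests that the term is shared by more genes in the list than chance alone would explain, given the chosen background set.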

Usage

Gene Product List

The user may enter a whitespace-separated list of gene product identifiers. These may be a mix of gene product symbols, synonyms or accessions.

If the list is too large for manual input, the user may instead upload either a file containing identifiers (as listed above) or a gene association file.

If AmiGO finds any gene product identifiers that are ambiguous or not found, the user will be informed before the end of the process.
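
The handling of ambiguous and missing identifiers can be pictured with the following sketch. It assumes a hypothetical `lookup` table that maps each symbol, synonym or accession to the set of gene products it could refer to; AmiGO's real resolution logic is not documented here, and the identifiers in the usage example are made up.

```python
def classify_identifiers(raw_text, lookup):
    """Split whitespace-separated input and sort each label into
    found / ambiguous / missing.

    `lookup` is a hypothetical dict: label -> set of gene product IDs.
    """
    found, ambiguous, missing = {}, {}, []
    # dict.fromkeys removes repeated labels while keeping input order.
    for label in dict.fromkeys(raw_text.split()):
        matches = lookup.get(label, set())
        if len(matches) == 1:
            found[label] = next(iter(matches))
        elif matches:
            ambiguous[label] = sorted(matches)
        else:
            missing.append(label)
    return found, ambiguous, missing


# Usage with invented labels and IDs:
example_lookup = {"geneA": {"DB:0001"}, "kin1": {"DB:0002", "DB:0003"}}
found, ambiguous, missing = classify_identifiers(
    "geneA kin1\nunknownX geneA", example_lookup
)
```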

Background Set

The background set may be input in much the same way as the gene product list above. The only difference is an optional database filter: the user must enter or upload a background set, select a database filter, or do both.

Filtering

If the user enters a background set and selects a database, the submitted background set is filtered so that only gene products found in that database are used. This can remove much of the ambiguity in the submitted set.

If the user selects a database but does not enter a background set, the selected database is used as the background set.
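
The combinations described above can be summarized in a short sketch. The function name and the representation of a database as a collection of gene product identifiers are assumptions made for illustration, not part of AmiGO.

```python
def resolve_background(user_background=None, database_members=None):
    """Return the effective background set.

    - both given: keep only user entries that are found in the database
    - database only: the whole database is the background
    - user set only: use it as entered
    - neither: not allowed, since one of the two must be supplied
    """
    if not user_background and not database_members:
        raise ValueError("enter a background set, select a database, or both")
    if not user_background:
        return set(database_members)
    if not database_members:
        return set(user_background)
    return set(user_background) & set(database_members)
```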

Thresholds

The AmiGO interface lets the user change the maximum p-value and the minimum number of gene products that are used when running the algorithm. Please see the published material (PMID:15297299) for a more detailed discussion of what these values mean and how to use them meaningfully.

Advanced Options

Clicking on Display advanced result options gives advanced users access to additional settings.

Result Types

In addition to the standard results that are returned from this page, the user may also select all results, which returns every result without any threshold filtering, ignoring any threshold inputs specified above.
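
The relationship between the two thresholds and the two result types can be sketched as follows. The row layout (`p_value`, `gene_products`), the `result_type` values and the inclusive comparisons are all assumptions made for this example; they are not AmiGO's internal names.

```python
def select_results(rows, max_p_value, min_gene_products,
                   result_type="standard"):
    """Apply the thresholds for standard results; bypass them for all results.

    Each row is a hypothetical dict with "p_value" and "gene_products" keys.
    """
    if result_type == "all":
        # Thresholds are ignored entirely.
        return list(rows)
    return [
        row for row in rows
        if row["p_value"] <= max_p_value
        and row["gene_products"] >= min_gene_products
    ]


# Usage with invented rows:
rows = [
    {"term": "term_a", "p_value": 0.001, "gene_products": 5},
    {"term": "term_b", "p_value": 0.2, "gene_products": 8},
    {"term": "term_c", "p_value": 0.004, "gene_products": 1},
]
standard = select_results(rows, max_p_value=0.01, min_gene_products=2)
everything = select_results(rows, 0.01, 2, result_type="all")
```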

Results Formats

In addition to the standard HTML page results, the user may instead select a tab-delimited file or an XML file. Please be warned that the XML file is in an unstable internal format and should only be used by people who prefer parsing XML over other formats.