TermEnrichment: Gold Standard Data Sets

From GO Wiki
Revision as of 19:12, 8 November 2012 by Rama

Goal

The goal is to collect a range of GO-annotated gene IDs with which to evaluate GO enrichment analysis. These will serve as control sets, so that enrichment can be run routinely (for example, monthly) to see how enrichment results change as a consequence of annotation and ontology changes. This will let users know what to expect when they run a GO enrichment analysis. For example, we should be able to see how enrichment is affected when, say, 10% of the annotations are deleted, or when major changes are made to the ontology.
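To make the monthly comparison concrete, here is a minimal sketch of one run. It assumes a one-sided hypergeometric test (a common choice for over-representation; this page does not say which test the enrichment tools use), annotations that are already propagated up the ontology, and hypothetical function names (`enrich`, `delete_fraction`, `top_terms`). Deleting propagated (term, gene) pairs independently is a simplification; a real experiment would delete direct annotations and re-propagate. No multiple-testing correction is applied.

```python
import math
import random


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric: N background genes, K annotated to the term, n in the study set."""
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / math.comb(N, n)


def enrich(study, background, annotations):
    """Rank GO terms by one-sided hypergeometric p-value (smallest first).

    `annotations` maps a GO term ID to the set of gene IDs annotated to it,
    assumed to be already propagated to ancestor terms. No multiple-testing
    correction is applied in this sketch.
    """
    background = set(background)
    study = set(study) & background
    N, n = len(background), len(study)
    results = []
    for term, genes in annotations.items():
        term_genes = set(genes) & background
        k = len(term_genes & study)
        if k == 0:
            continue  # no study-set genes on this term, so it cannot be over-represented
        results.append((term, hypergeom_sf(k, N, len(term_genes), n)))
    results.sort(key=lambda tp: (tp[1], tp[0]))  # ties broken by term ID
    return results


def delete_fraction(annotations, fraction, seed=0):
    """Return a copy of `annotations` with `fraction` of all (term, gene) pairs removed at random."""
    rng = random.Random(seed)  # fixed seed so repeated runs are reproducible
    pairs = [(t, g) for t, genes in annotations.items() for g in sorted(genes)]
    dropped = set(rng.sample(pairs, round(len(pairs) * fraction)))
    reduced = {}
    for t, g in pairs:
        if (t, g) not in dropped:
            reduced.setdefault(t, set()).add(g)
    return reduced


def top_terms(results, n=5):
    """The n best-ranked term IDs from `enrich`."""
    return [term for term, _ in results[:n]]


if __name__ == "__main__":
    # Toy data with placeholder term IDs, not real GO terms.
    background = {f"g{i}" for i in range(100)}
    annotations = {
        "GO:A": {f"g{i}" for i in range(10)},
        "GO:B": {f"g{i}" for i in range(50)},
        "GO:C": {f"g{i}" for i in range(90, 100)},
    }
    study = {f"g{i}" for i in range(8)}
    before = top_terms(enrich(study, background, annotations))
    after = top_terms(enrich(study, background, delete_fraction(annotations, 0.10)))
    print("before:", before, "after 10% deletion:", after)
```

Comparing the top terms before and after a perturbation (annotation deletion, or an ontology release) is one simple way to quantify how stable the results for a control set are from month to month.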

  • We need to define what to expect for any given set of genes: what is the "truth"?
  • SGD, mouse, fly, ZFIN and worm will put together some gene sets for this exercise.

What goes in the gold standard gene sets

  • We need separate sets for each of the three ontologies, although most people run enrichment only on Biological Process (BP).
  • For each set, provide:
    • the top 5 enriched terms you expect for your set
    • the background set you tested it against
    • Taxon ID
    • size of the gene set
    • email address of the submitter
    • year submitted
    • description
  • There can be multiple sets per ontology, and they can be of different sizes (100 genes, 500 genes and so on).
  • For example, there could be one set of genes all related to metabolism and another set in which these genes are mixed with genes annotated to other processes.
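One way to keep submissions uniform is a small record type mirroring the fields listed above. The sketch below is hypothetical: the class and function names are illustrative, and treating "Taxon ID" as an NCBI Taxonomy ID is an assumption. It also shows one possible answer to "what is the truth?", scoring a run by the fraction of the submitter's expected top terms that appear in the observed top hits, and one way to build a mixed set by diluting a focused set with genes from other processes.

```python
import random
from dataclasses import dataclass

ONTOLOGIES = {"BP", "MF", "CC"}  # one set per GO aspect


@dataclass
class GoldStandardSet:
    """Hypothetical record for one submitted gene set; fields follow the list above."""
    name: str
    ontology: str                  # "BP", "MF" or "CC"
    taxon_id: int                  # assumed to be an NCBI Taxonomy ID
    genes: frozenset               # gene IDs in the set
    expected_top_terms: tuple      # submitter's expected top enriched GO term IDs
    background: str                # description of the background set used
    submitter_email: str
    year_submitted: int
    description: str = ""

    def __post_init__(self):
        if self.ontology not in ONTOLOGIES:
            raise ValueError(f"unknown ontology {self.ontology!r}")

    @property
    def size(self) -> int:
        """Size of the gene set, derived from the genes rather than entered by hand."""
        return len(self.genes)


def recovery(gold, observed_terms, n=5):
    """Fraction of the expected terms found among the top n observed terms."""
    expected = set(gold.expected_top_terms)
    if not expected:
        return 0.0
    return len(expected & set(observed_terms[:n])) / len(expected)


def mixed_set(gold, other_genes, n_extra, seed=0):
    """Dilute a focused set with n_extra genes drawn from `other_genes` (e.g. genes annotated to other processes)."""
    rng = random.Random(seed)  # fixed seed so the mixed set is reproducible
    pool = sorted(set(other_genes) - gold.genes)
    return gold.genes | frozenset(rng.sample(pool, n_extra))
```

Scoring recovery on progressively more diluted sets gives one way to see how quickly the expected terms drop out of the top hits as unrelated genes are added.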