Metrics: breadth and depth of annotations

Rex, Judy, Ruth

For a recent summary of some of these issues, see the file HowToCaptureMetrics4.doc.

For breadth assessment, we need:

  • Number of genes (protein-coding genes and functional RNAs)
  • Number of genes with some functional annotation
  • Number of genes with functional annotation based on experiments using that organism
  • Number of genes with function inferred by sequence similarity
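The breadth counts above could be derived from each group's annotation file. The following is a minimal sketch, not an agreed method. It assumes annotations are exported in GAF (tab-delimited) format. It also assumes the file holds only that organism's annotations, so experimental evidence codes stand in for "experiments using that organism". The evidence-code groupings, the function names and the gene-list input are illustrative assumptions.

```python
"""Breadth metrics from GAF rows (sketch; evidence-code groupings are assumptions)."""

# Assumed groupings; adjust to whatever the reference genome group agrees on.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}
SEQUENCE_SIMILARITY = {"ISS", "ISO", "ISA", "ISM"}


def parse_gaf(lines):
    """Yield (gene_id, evidence_code) for each positive annotation row."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        # 0-based GAF columns: 1 = DB Object ID, 3 = Qualifier, 6 = Evidence Code
        if "NOT" in cols[3].split("|"):
            continue  # a NOT annotation says the gene lacks the function
        yield cols[1], cols[6]


def breadth_metrics(all_genes, gaf_lines):
    """Count genes overall and by the kind of support for their annotations."""
    annotated, experimental, similarity = set(), set(), set()
    for gene, evidence in parse_gaf(gaf_lines):
        annotated.add(gene)
        if evidence in EXPERIMENTAL:
            experimental.add(gene)
        if evidence in SEQUENCE_SIMILARITY:
            similarity.add(gene)
    all_genes = set(all_genes)
    return {
        "genes": len(all_genes),
        "with_any_annotation": len(annotated & all_genes),
        "with_experimental_annotation": len(experimental & all_genes),
        "with_similarity_annotation": len(similarity & all_genes),
    }


if __name__ == "__main__":
    # Abbreviated rows (real GAF lines have 17 columns).
    rows = [
        ["DB", "G1", "g1", "", "GO:0005515", "PMID:1", "IPI"],
        ["DB", "G2", "g2", "", "GO:0003824", "PMID:2", "ISS"],
        ["DB", "G3", "g3", "NOT", "GO:0008150", "PMID:3", "IMP"],
    ]
    lines = ["!gaf-version: 2.2"] + ["\t".join(r) for r in rows]
    print(breadth_metrics({"G1", "G2", "G3", "G4"}, lines))
```

For these toy rows the result reports four genes. Two have some annotation, because the NOT row for G3 is skipped. One has experimental support and one has similarity-based support.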

For depth assessment, we need:

  • Number of papers linked to a gene
  • Number of papers used to produce functional annotation
  • Number of papers read that produced no new annotations
  • Ratio of the depth of the deepest annotation to the depth of the leaf node, to measure granularity and use of the ontology (Suzi)
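A similar sketch for the depth measures follows. It makes several assumptions:

  • Papers cited in GAF column 6 (DB:Reference) count as "used to produce functional annotation".
  • Only PMID references count as papers.
  • The papers linked to each gene come from the group's own literature curation. This is the hypothetical `papers_linked` input.
  • "Linked but not cited" stands in for papers read without new annotations.

For Suzi's ratio it takes one possible reading of the metric. The ratio is the depth of an annotated term (longest is_a path from the root) divided by the depth of the deepest leaf at or below it, so 1.0 means a gene is annotated to a leaf term. All function names are illustrative.

```python
"""Depth metrics: papers per gene and annotation granularity (sketch)."""
from collections import defaultdict


def papers_used_per_gene(gaf_lines):
    """Map gene ID -> set of PMIDs cited in its GAF annotation rows."""
    used = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # 0-based GAF columns: 1 = DB Object ID, 5 = DB:Reference (pipe-separated)
        for ref in cols[5].split("|"):
            if ref.startswith("PMID:"):  # GO_REF and other non-paper refs are skipped
                used[cols[1]].add(ref)
    return used


def paper_metrics(papers_linked, papers_used):
    """Per-gene paper counts; papers_linked (gene -> set of PMIDs) is an assumed input."""
    return {
        gene: {
            "linked": len(linked),
            "used_for_annotation": len(papers_used.get(gene, set())),
            "linked_but_unused": len(linked - papers_used.get(gene, set())),
        }
        for gene, linked in papers_linked.items()
    }


def longest_depth(term, parents, memo):
    """Longest is_a path from `term` up to a root; roots have depth 0."""
    if term not in memo:
        ps = parents.get(term, [])
        memo[term] = 1 + max(longest_depth(p, parents, memo) for p in ps) if ps else 0
    return memo[term]


def granularity_ratio(term, parents):
    """Depth of `term` over depth of the deepest leaf at or below it (1.0 = leaf)."""
    children = defaultdict(list)
    for child, ps in parents.items():
        for p in ps:
            children[p].append(child)
    below, stack = set(), [term]
    while stack:  # collect `term` and all of its descendants
        t = stack.pop()
        if t not in below:
            below.add(t)
            stack.extend(children.get(t, []))
    memo = {}
    leaves = [t for t in below if not children.get(t)]
    deepest = max(longest_depth(t, parents, memo) for t in leaves)
    return longest_depth(term, parents, memo) / deepest if deepest else 1.0


if __name__ == "__main__":
    # Toy is_a graph: A is the root; D and E are leaves.
    parents = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["B"]}
    print(granularity_ratio("D", parents))  # 1.0
    print(granularity_ratio("B", parents))  # 1/3
```

Depth in a DAG is ambiguous because a term can have several paths to the root. The sketch uses the longest path. Using the shortest path would give different ratios and may suit the metric better. The group would need to agree on the definition.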


  • Number of genes that have been “completely” annotated in all reference genomes.
  • List of human disease genes
  • List of orthologs?


  • Are there other metrics that we need?

I think it would be good for each group to record how many manual annotations a gene had before it was added to the list, and how many manual annotations it has when the "complete/comprehensive" date is added. Could this also be done for electronic, TAS/NAS and ISS annotations? We could then say that after each year of the project an additional xx annotations were made to xx genes, and hopefully show a change in the proportion of manual to electronic annotations. (Ruth)

  • What does it take for each reference genome to produce this set?
  • Should we have a standard file that each reference genome submits to GO for display and monitoring of progress of the reference genome effort?
  • Should this be displayed on a reference genome web page?
  • Is there software development that would assist the reference genome project?

