Reference Genome Meeting Minutes April 2008

From GO Wiki

April 20, 2008

Annotation Progress (Mike Cherry)

  • Number of annotated genes per organism by evidence type (overall)
    • Compare graphs for Sept 2007 and Apr 2008: overall size is about the same, but the number of IEA annotations is decreasing

Discussion: What is the effort per person? The x-axis shows the absolute number of genes, which doesn't reflect differences in genome size.
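
One way to address the genome-size concern is to plot the fraction of each organism's genes that are annotated rather than the absolute count. The following is only a minimal sketch: the function name `annotated_fraction`, the organism labels, and all numbers are hypothetical placeholders and are not taken from the graphs presented at the meeting.

```python
def annotated_fraction(annotated_genes: int, total_genes: int) -> float:
    """Return the share of an organism's genes that carry an annotation."""
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    if not 0 <= annotated_genes <= total_genes:
        raise ValueError("annotated_genes must be between 0 and total_genes")
    return annotated_genes / total_genes


# Placeholder numbers for illustration only (annotated genes, total genes).
example_counts = {
    "organism_a": (4000, 6000),
    "organism_b": (10000, 25000),
}

# Normalized values are comparable across genomes of different sizes.
fractions = {
    name: annotated_fraction(annotated, total)
    for name, (annotated, total) in example_counts.items()
}
```

With these placeholder numbers, the organism with more annotated genes in absolute terms ends up with the lower normalized value, which is the effect the discussion point is about.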

  • Number of annotated genes per organism by evidence code for Reference Genome project
    • majority of genes have experimental evidence codes
  • Discussion:
    • Graph needs an outline that indicates "no ortholog". This allows a comparison of the genes present or absent in each of the reference genomes. It will also show which organisms are lagging behind.
    • Number of annotations as a metric?
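
The per-evidence-code gene counts behind such a graph could be tallied directly from gene association file lines. The sketch below assumes the tab-separated gene association layout with the database in column 1, the object ID in column 2 and the evidence code in column 7, and it skips comment lines starting with `!`. The function name and the sample rows are hypothetical, and this is not necessarily how the graphs shown at the meeting were produced.

```python
from collections import defaultdict

# Experimental evidence codes (EXP and its more specific codes).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def genes_per_evidence_code(gaf_lines):
    """Count distinct annotated genes for each evidence code."""
    genes_by_code = defaultdict(set)
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # too short to hold an evidence code
        database, object_id, evidence_code = cols[0], cols[1], cols[6]
        genes_by_code[evidence_code].add((database, object_id))
    return {code: len(genes) for code, genes in genes_by_code.items()}


# Made-up rows for illustration; identifiers are not real database entries.
sample_lines = [
    "!comment line\n",
    "DB1\tG0001\tsym1\t\tGO:0000001\tREF:1\tIDA\n",
    "DB1\tG0001\tsym1\t\tGO:0000002\tREF:2\tIDA\n",
    "DB1\tG0002\tsym2\t\tGO:0000001\tREF:3\tIEA\n",
    "DB1\tG0003\tsym3\t\tGO:0000003\tREF:4\tIMP\n",
]

counts = genes_per_evidence_code(sample_lines)
experimental_total = sum(
    n for code, n in counts.items() if code in EXPERIMENTAL_CODES
)
```

Counting distinct genes rather than rows means a gene with several annotations under one code is counted once for that code. Note that `experimental_total` sums per-code counts, so a gene annotated with two different experimental codes would be counted twice there.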

Annotation Progress (Chris Mungall)

Review Annotation Pipeline proposal (Suzi Lewis)

Step 1: Generation of protein sets (excluding functional RNAs)

    • How to define a coherent set
      • For experimental annotations, we want to annotate to isoforms. But for tree building we want the longest protein produced from a gene. So for ortholog sets we want a unique protein/gene ID for the "canonical" gene/protein.
      • Heterogeneity in column 2 (gene association file). One suggestion is to add another column. Multiple isoform IDs on one line.
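
The "longest protein per gene" rule for tree building can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual implementation: the function name `pick_canonical_proteins`, the input shape (gene ID, protein ID, sequence) and the tie-break on protein ID are all hypothetical choices.

```python
def pick_canonical_proteins(proteins):
    """Map each gene ID to the ID of its longest protein.

    `proteins` is an iterable of (gene_id, protein_id, sequence) tuples.
    Ties on length are broken by the lexicographically smaller protein ID
    so that the result does not depend on input order.
    """
    best = {}  # gene_id -> (sequence length, protein_id)
    for gene_id, protein_id, sequence in proteins:
        length = len(sequence)
        current = best.get(gene_id)
        if (
            current is None
            or length > current[0]
            or (length == current[0] and protein_id < current[1])
        ):
            best[gene_id] = (length, protein_id)
    return {gene_id: protein_id for gene_id, (_, protein_id) in best.items()}


# Made-up records for illustration only.
example_proteins = [
    ("geneA", "geneA-iso1", "MKT"),
    ("geneA", "geneA-iso2", "MKTLLV"),
    ("geneB", "geneB-iso1", "MAAG"),
]

canonical = pick_canonical_proteins(example_proteins)
```

Experimental annotations could still point at the individual isoform IDs, while the ortholog sets would use only the one ID per gene returned here.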



    • Gene Association files (see Annotation of alternate spliceforms)

GO annotations refer to gene products. How does UniProt deal with alternate splice forms? Most of the time, it uses the canonical identifier followed by -1, -2, etc.
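
An identifier in this "canonical identifier plus -1, -2" form can be split back into its two parts, for example to group isoform-level annotations under one gene product. The sketch below assumes only the pattern described above (an alphanumeric accession, a hyphen, and a positive integer); the function name is hypothetical and the accession in the usage lines is a placeholder.

```python
import re

# Alphanumeric accession, a hyphen, then a positive isoform number.
_ISOFORM_RE = re.compile(r"^(?P<accession>[A-Za-z0-9]+)-(?P<isoform>[1-9][0-9]*)$")


def split_isoform_id(identifier):
    """Split 'ACCESSION-N' into (accession, N).

    Identifiers without an isoform suffix are returned as (identifier, None).
    """
    match = _ISOFORM_RE.match(identifier)
    if match is None:
        return identifier, None
    return match.group("accession"), int(match.group("isoform"))


# Placeholder accession used purely for illustration.
with_suffix = split_isoform_id("P12345-2")
without_suffix = split_isoform_id("P12345")
```

Returning `None` for the isoform number keeps "no isoform specified" distinct from an explicit `-1`.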

  • Step 2: Experimental Annotation
  • Step 3: Inferential Annotation
  • Step 4: Quality Checks