Reference Genome Meeting Minutes April 2008

From GO Wiki

Revision as of 12:29, 20 April 2008

April 20, 2008

Annotation Progress (Mike Cherry)

  • Number of annotated genes per organism by evidence type (overall)
    • Compared graphs for Sept 2007 and Apr 2008: overall size is the same, but IEA (Inferred from Electronic Annotation) annotations are decreasing

Discussion: What is the effort per person? The x-axis shows the absolute number of genes, which doesn't reflect differences in genome size.

  • Number of annotated genes per organism by evidence code for the Reference Genome project
    • The majority of genes have experimental evidence codes
  • Discussion:
    • The graph needs an outline indicating "no ortholog". This allows a comparison of the genes present or absent in the reference genomes. It will also show which organisms are lagging behind.
    • Number of annotations as a metric?
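The discussion raised two questions: normalizing for genome size, and whether the number of annotations is a better metric than the number of annotated genes. A minimal Python sketch of both metrics follows. It assumes annotations are available as hypothetical (organism, gene, evidence code) tuples and that total gene counts per organism are supplied separately; the organism names and numbers in the usage example are made up.

```python
from collections import defaultdict

# GO experimental evidence codes; IEA (electronic) and other codes count
# toward the totals but not toward the experimental subset.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def summarize(annotations, genome_sizes):
    """Summarize annotation progress per organism.

    annotations:  iterable of (organism, gene_id, evidence_code) tuples
                  (hypothetical input format, e.g. parsed from a GAF file).
    genome_sizes: organism -> total number of genes (supplied by the caller).
    """
    genes = defaultdict(set)      # distinct annotated genes
    exp_genes = defaultdict(set)  # genes with at least one experimental code
    counts = defaultdict(int)     # raw number of annotations
    for organism, gene, code in annotations:
        genes[organism].add(gene)
        counts[organism] += 1
        if code in EXPERIMENTAL:
            exp_genes[organism].add(gene)

    summary = {}
    for organism, annotated in genes.items():
        summary[organism] = {
            "annotated_genes": len(annotated),
            "experimental_genes": len(exp_genes[organism]),
            "annotations": counts[organism],
            # Dividing by genome size lets small and large genomes be compared.
            "fraction_annotated": len(annotated) / genome_sizes[organism],
        }
    return summary


# Toy data (organism names and numbers are made up).
data = [
    ("yeast", "G1", "IDA"),
    ("yeast", "G1", "IEA"),
    ("yeast", "G2", "IEA"),
    ("worm", "W1", "IMP"),
]
print(summarize(data, {"yeast": 4, "worm": 10}))
```

In the toy data, "yeast" has 2 annotated genes but 3 annotations, which is why the two candidate metrics can rank organisms differently.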

Annotation Progress (Chris Mungall)

Review Annotation Pipeline proposal (Suzi Lewis)

Step 1: Generation of protein sets (excluding functional RNAs)

    • How to define a coherent set
      • For experimental annotations, we want to annotate to isoforms, but for tree building we want the longest protein produced from a gene. So for ortholog sets we want a unique protein/gene ID for the "canonical" gene/protein.
      • Heterogeneity in column 2 of the gene association file. One suggestion is to add another column. Multiple isoform IDs on one line.
    • Gene Association files (see Annotation of alternate spliceforms)

GO annotations refer to gene products. How does UniProt deal with alternate splice forms? Most of the time, it uses the canonical identifier followed by -1, -2, etc.
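The isoform question can be sketched in Python: parse UniProt-style identifiers (an accession plus an optional -1, -2 suffix), group isoforms by accession, and keep the longest one per gene for tree building, while the full isoform IDs stay available for experimental annotation. This is a minimal sketch under stated assumptions: the input is a hypothetical list of (identifier, sequence length) pairs, one accession stands for one gene, and the accessions shown are placeholders. The longest isoform chosen here need not be the one UniProt itself designates as canonical.

```python
import re

# Loose pattern for UniProt-style identifiers: an accession optionally
# followed by "-N" for an isoform. Real accessions follow a stricter format.
ISOFORM_ID = re.compile(r"^(?P<acc>[A-Z0-9]+)(?:-(?P<isoform>\d+))?$")


def canonical_accession(identifier):
    """Strip the isoform suffix: 'P12345-2' -> 'P12345'."""
    match = ISOFORM_ID.match(identifier)
    if match is None:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    return match.group("acc")


def longest_isoform_per_gene(isoforms):
    """Pick one representative protein per gene for tree building.

    isoforms: iterable of (identifier, sequence_length) pairs.
    Returns accession -> identifier of its longest isoform (first seen wins ties).
    """
    best = {}
    for identifier, length in isoforms:
        acc = canonical_accession(identifier)
        if acc not in best or length > best[acc][1]:
            best[acc] = (identifier, length)
    return {acc: identifier for acc, (identifier, _) in best.items()}


# Hypothetical accessions and lengths, for illustration only.
example = [("P12345-1", 300), ("P12345-2", 450), ("Q99999", 120)]
print(longest_isoform_per_gene(example))
# {'P12345': 'P12345-2', 'Q99999': 'Q99999'}
```

Keying the result on the bare accession mirrors the proposal of a single unique ID per gene for ortholog sets, while the values keep the isoform-level ID.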

  • Step 2: Experimental Annotation
  • Step 3: Inferential Annotation
  • Step 4: Quality Checks