Reference Genome Meeting Minutes April 2008
Revision as of 12:36, 20 April 2008
April 20, 2008
Annotation Progress (Mike Cherry)
- Number of annotated genes per organism by evidence type (overall)
- Compare graphs for Sept 2007 and Apr 2008: overall size about the same, but IEA decreasing
Discussion: What is the effort per person? The x-axis is the absolute number of genes, which doesn't reflect differences in genome size.
- Number of annotated genes per organism by evidence code for Reference Genome project
- majority of genes have experimental evidence codes
- Discussion:
- Graph needs an outline that indicates "no ortholog". This allows a comparison of the genes present or absent across the reference genomes. It will also show which organisms are lagging behind.
- Number of annotations as a metric?
Annotation Progress (Chris Mungall)
Review Annotation Pipeline proposal (Suzi Lewis)
Step 1: Generation of protein sets (excluding functional RNAs)
- How to define a coherent set
- For experimental annotations, we want to annotate to isoforms. But for tree building, we want the longest protein produced from a gene. So for ortholog sets, we want a unique protein/gene ID for the "canonical" gene/protein.
- Heterogeneity in column 2 of the gene association file. One suggestion is to add another column. Multiple isoform IDs on one line.
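The "longest protein per gene" rule mentioned for tree building could be sketched roughly as follows. This is only an illustration, not the pipeline's actual code: the function name `longest_protein_per_gene`, the `(gene_id, protein_id, sequence)` record layout, and the tie-breaking rule (first protein seen wins) are all assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple


def longest_protein_per_gene(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, str]:
    """Pick one representative protein ID per gene.

    Each record is a hypothetical (gene_id, protein_id, sequence) tuple.
    The representative is the protein with the longest sequence; on a
    tie, the protein seen first is kept so the result is deterministic.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for gene_id, protein_id, sequence in records:
        length = len(sequence)
        if gene_id not in best or length > best[gene_id][1]:
            best[gene_id] = (protein_id, length)
    # Drop the lengths and return gene -> representative protein ID.
    return {gene: protein for gene, (protein, _) in best.items()}


# Made-up identifiers and sequences, for illustration only.
example_records = [
    ("geneA", "protA-1", "MKT"),
    ("geneA", "protA-2", "MKTAYIAK"),
    ("geneB", "protB-1", "MS"),
]
representatives = longest_protein_per_gene(example_records)
```

Experimental annotations would still be attached to the individual isoform IDs; a mapping like this only serves the ortholog-set and tree-building step.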
- Gene Association files (see Annotation of alternate spliceforms): what is current practice?
- How does UniProt deal with alternate splice forms? Most of the time, there is a 1:1 correspondence between the canonical protein ID and the gene. UniProt uses the canonical identifier followed by -1, -2, etc. to indicate isoforms. But sometimes isoforms are so different that they are given separate accessions. In that case, what connects them? One has to link out to a genomic database.
- WormBase uses a mixture of gene and protein IDs. (Which is used depends upon how the experiments were done.)
- MGI
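As a rough illustration of the UniProt convention noted above (canonical identifier followed by -1, -2, etc.), an isoform ID can be split back into its canonical part and isoform number. The function name `split_isoform_id` and the simplified pattern are assumptions for this sketch; it does not validate real UniProt accession syntax, and it cannot connect isoforms that were given separate accessions.

```python
import re
from typing import Optional, Tuple

# Simplified, assumed pattern: an uppercase alphanumeric accession,
# optionally followed by "-<isoform number>". Not the full UniProt rule.
_ISOFORM_RE = re.compile(r"^(?P<acc>[A-Z0-9]+)(?:-(?P<iso>[0-9]+))?$")


def split_isoform_id(identifier: str) -> Tuple[str, Optional[int]]:
    """Return (canonical accession, isoform number or None)."""
    match = _ISOFORM_RE.match(identifier)
    if match is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    iso = match.group("iso")
    return match.group("acc"), (int(iso) if iso is not None else None)


# "P12345" is a placeholder accession used only for illustration.
with_isoform = split_isoform_id("P12345-2")
without_isoform = split_isoform_id("P12345")
```

Grouping annotation lines by the first element of the returned pair would collapse isoform-level annotations onto the canonical protein, which is the 1:1 gene correspondence the notes describe for the common case.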