Talk:2010 GO camp Annotation propagation rules
Conference call April 28 2010: Ensembl Gene Trees & Annotation propagation
Javier Herrero & Glenn Proctor
Gene Tree building pipeline:
- Load genes and the longest translation of each gene from all Ensembl species
- Run WU-BLASTP + Smith-Waterman on the longest translation of every gene against every other gene
- Build clusters and multiple sequence alignments (MSAs) with M-Coffee
- Build a reconciled gene tree with internal duplication nodes, taking the species tree into account (TreeBeST)
- Infer orthologs and paralogs (OrthoTree)
- Gene trees that predict too many gene losses and gains are corrected to give a simpler model of evolution, and the predicted ancestor is replaced by an 'ambiguous' ancestor.
- When a gene tree disagrees with the species tree (or predicts complicated gene losses and gains), the cause is often a split gene, which usually results from incorrect sequence or assembly.
- These cases are fed back to the gene builders.
Gene Trees: species coverage
Ensembl is human-centric; the gene trees cover vertebrates and use Drosophila, C. elegans and yeast as outgroups. The software does not try to ensure that the trees are correct for the outgroups. Pan-compara is a separate effort aimed at extending the trees to other species.
Annotation Propagation by Ensembl
- Gene names:
- From human and mouse to all vertebrates: Use the 1:1 orthologs
- From human to fish: Use the 1:many orthologs; Names become NAME (1 of 3), NAME (2 of 3), etc.
- Rules:
- If the source gene has an HGNC name, and the target gene has no name or only a RefSeq predicted name, add the name to the target gene.
- Change status to “KNOWN_BY_PROJECTION”
- GO annotations
- Use the 1:1 orthologs only
- From human and mouse to all vertebrates
- From rat to human and mouse
- From zebrafish to other fish
- From human to zebrafish
- Rules:
- Add GO terms from the source gene to the target gene, avoiding duplicates. Only project source GO terms with evidence codes IDA, IEP, IGI, IMP, IPI.
- Projected GO terms on target gene are given evidence code IEA.
- All terms from all 3 ontologies are used; the eventual plan is to use GO's taxon rules.
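
The gene name and GO projection rules above can be sketched in a few lines of Python. This is only an illustration of the rules as noted in the call, not the Ensembl projection code: the `Gene` record, the `name_source` labels ("HGNC", "RefSeq_predicted", "projection"), the default status "NOVEL", and the function names are all hypothetical. Storing one evidence code per GO term is a simplification, and the way 1:many orthologs are numbered when one of them already has a name is an assumption, since the notes do not cover that case.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Evidence codes eligible for projection, as listed in the notes.
PROJECTABLE_CODES = {"IDA", "IEP", "IGI", "IMP", "IPI"}


@dataclass
class Gene:
    """Hypothetical, simplified gene record (not the Ensembl API)."""
    stable_id: str
    name: Optional[str] = None
    name_source: Optional[str] = None  # assumed labels, see lead-in
    status: str = "NOVEL"              # assumed default status
    # Simplification: one evidence code per GO term id.
    go_terms: Dict[str, str] = field(default_factory=dict)


def project_name(source: Gene, orthologs: List[Gene]) -> None:
    """Project the source name onto its orthologs.

    One ortholog models the 1:1 case; several model the 1:many case,
    where names become "NAME (1 of N)", "NAME (2 of N)", and so on.
    """
    # Rule: the source gene must have an HGNC name.
    if source.name is None or source.name_source != "HGNC":
        return
    total = len(orthologs)
    for index, target in enumerate(orthologs, start=1):
        # Rule: only name targets with no name or a RefSeq predicted name.
        if target.name is not None and target.name_source != "RefSeq_predicted":
            continue
        if total == 1:
            target.name = source.name
        else:
            # Assumption: numbering counts every ortholog, named or not.
            target.name = f"{source.name} ({index} of {total})"
        target.name_source = "projection"
        target.status = "KNOWN_BY_PROJECTION"


def project_go(source: Gene, target: Gene) -> None:
    """Project GO terms across a 1:1 ortholog pair."""
    for go_id, evidence in source.go_terms.items():
        # Rule: project only terms backed by the listed evidence codes.
        if evidence not in PROJECTABLE_CODES:
            continue
        # Rule: avoid duplicates already present on the target.
        if go_id in target.go_terms:
            continue
        # Rule: projected terms get evidence code IEA.
        target.go_terms[go_id] = "IEA"
```

Calling `project_name(human_gene, [fish_a, fish_b])` would model the human-to-fish 1:many case, and `project_go(human_gene, mouse_gene)` the 1:1 GO case. Which species pairs are allowed (for example rat to human and mouse) is left to the caller in this sketch.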