List of potentially problematic families for all vs. all BLAST methods of orthology determination (Retired)

From GO Wiki
Revision as of 10:00, 26 March 2008 by Karadolinski (talk | contribs)

Table of erroneous orthologous relationships predicted by sequence-based methods

Columns:

query gene | query organism | false positive (fp) / false negative (fn) | problem organism:problem gene(s) | method | note

List of potentially problematic families

Major Facilitator Superfamily (from Val's email)

SLC22A14 and SLC22A11

From Val: I just spotted 2 genes in the new list which have 12 reported S. cerevisiae orthologs. These are MFS (major facilitator superfamily) proteins and are problematic in terms of assigning orthologs because lots of unrelated MFS family proteins can generate best hits in distantly related genomes. These 'orthologs' are only predicted by KOGs (which is the worst predictor for 'lumping' large families).

I would recommend avoiding MFS family proteins as ref genome candidates where possible (at least until we have a consistent way of confirming orthologs) because:

i) orthology identification is difficult;

ii) the homology transfers which can be made are minimal once you have captured the fact that these are 'membrane transporters' involved in 'membrane transport', because the specificities and processes are often not conserved, even between closely related species (there are lots of duplications and gene losses, which confuse any functional transfer).

Using Treefam, none of the yeast proteins are listed as outgroups for this large orthologous cluster: http://www.treefam.org/cgi-bin/TFinfo.pl?ac=TF314445

This could be added to the false positive list.
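One way to spot candidates like these before manual review is to count how many orthologs a method reports for each query gene in a single target organism. The sketch below is illustrative only: the function name `flag_many_to_one`, the input tuple layout, the cutoff `max_orthologs=3`, and the placeholder target names are all assumptions, not part of any existing pipeline or of the table format above.

```python
from collections import defaultdict


def flag_many_to_one(pairs, max_orthologs=3):
    """Flag queries with suspiciously many predicted orthologs in one organism.

    pairs: iterable of (query_gene, target_organism, target_gene) tuples.
    max_orthologs: hypothetical cutoff; anything above it is flagged.
    Returns {(query_gene, target_organism): sorted list of target genes}.
    """
    targets = defaultdict(set)
    for query_gene, target_organism, target_gene in pairs:
        # A set ignores duplicate predictions of the same pair.
        targets[(query_gene, target_organism)].add(target_gene)
    return {
        key: sorted(genes)
        for key, genes in targets.items()
        if len(genes) > max_orthologs
    }


# Hypothetical predictions; the target names are placeholders, not real genes.
pairs = [("SLC22A14", "S. cerevisiae", f"yeast_gene_{i}") for i in range(12)]
pairs.append(("GENE_X", "S. cerevisiae", "yeast_gene_0"))

flagged = flag_many_to_one(pairs)
for (query, organism), genes in flagged.items():
    print(f"{query}: {len(genes)} predicted orthologs in {organism}")
```

A high count does not prove the predictions are wrong; it only marks a family as worth checking against a tree-based resource before it goes into the false positive list.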

Repeat families (from Val's email)

Watch families with LRR, TPR and TPR-related repeats. WD repeats should be watched too, but these seem to be more conserved between orthologs, so they aren't as problematic.

Low-complexity or coiled-coil regions (from Val's email)

These can often generate false positives: the matches are statistically significant but reflect biased sequence composition rather than orthology.
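As a rough illustration of how such regions might be flagged before trusting a hit, the sketch below scans a protein sequence with a sliding window and reports stretches whose Shannon entropy is low. This is a simplified stand-in, not the filter used by BLAST or by any of the methods discussed here; the function names, the window size of 12, and the 2.2-bit threshold are assumptions chosen for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return merged half-open (start, end) ranges of low-entropy windows.

    window and threshold are illustrative assumptions, not tuned values.
    """
    regions = []
    for start in range(0, len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < threshold:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# Made-up sequence: a poly-Q tract between two compositionally diverse flanks.
flank = "ACDEFGHIKLMNPQRSTVWY"
sequence = flank + "Q" * 20 + flank
for start, end in low_complexity_windows(sequence):
    print(start, end, sequence[start:end])
```

If a reported match lies mostly inside a flagged range, that is a reason to treat the predicted orthology with extra caution rather than evidence against it by itself.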