List of potentially problematic families for all vs. all BLAST methods of orthology determination (Retired)

From GO Wiki
Latest revision as of 11:25, 12 April 2019

Table of erroneous orthologous relationships predicted by sequence-based methods

Columns:

  • query gene
  • query species
  • false positive (fp) / false negative (fn): indicate fp or fn
  • problem species:problem gene(s): separate multiple entries with a hyphen (-)
  • method
  • note
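The problem-gene convention above can be parsed mechanically. Below is a minimal Python sketch. It assumes each entry is written as species:gene and that entries are joined with a hyphen, as in s.cerevisiae:TPM1-s.pombe:cdc8. The function name parse_problem_genes is hypothetical. Gene identifiers that themselves contain a hyphen would be split incorrectly under this convention.

```python
def parse_problem_genes(field: str) -> list[tuple[str, str]]:
    """Split a 'species:gene-species:gene' field into (species, gene) pairs.

    Hypothetical helper for the column convention described above; it assumes
    '-' only ever separates entries and ':' separates species from gene.
    """
    pairs = []
    for entry in field.split("-"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing separators
        species, sep, gene = entry.partition(":")
        if not sep:
            raise ValueError(f"entry without a species prefix: {entry!r}")
        pairs.append((species.strip(), gene.strip()))
    return pairs


print(parse_problem_genes("s.cerevisiae:TPM1-s.pombe:cdc8"))
# [('s.cerevisiae', 'TPM1'), ('s.pombe', 'cdc8')]
```

Rejecting entries without a species prefix keeps malformed rows from silently producing gene names with no organism attached.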
Query gene | Query species | False positive (fp) / false negative (fn) | Problem species:problem gene(s) | Method | Note
TNNT2 | human | fp | S. cerevisiae:S000006026 and S000000376 | YOGY/Inparanoid | See SF geneontology-Ref Genome Ortholog Set Completion-1903284
TPM1 | human | fn | S. cerevisiae:TPM1-S. pombe:cdc8 | YOGY | Tropomyosin orthologs not picked up by any of the YOGY predictors for the yeasts (and possibly other organisms) when the human gene is used as the query, although the gene is conserved from yeast to human; found in TreeFam
MYL3 | human | fn | S. cerevisiae:MLC1-S. pombe:cdc4 | YOGY | Myosin light chain orthologs not picked up by any of the YOGY predictors for the yeasts (and possibly other organisms) when the human gene is used as the query, although the gene is conserved from yeast to human; found in TreeFam
APTX | human | fn | S. cerevisiae:HNT3-S. pombe:SPCC18.09C | YOGY | Aprataxin orthologs not picked up by any of the YOGY predictors for the yeasts (and possibly other organisms) when the human gene is used as the query, although the gene is conserved from yeast to human; found in TreeFam
ATXN2 | human | fn | S. cerevisiae:PBP1-S. pombe:SPBC21B10.03c | YOGY | Ataxin-2 orthologs not picked up by any of the YOGY predictors for the yeasts (and possibly other organisms) when the human gene is used as the query, although the gene is conserved from yeast to human; TreeFam missed S. pombe SPCC18.09c but picks up S. cerevisiae HNT3
RPS21 | human | fn | Dictyostelium:rps21 (DDB0231061) was missed | BLAST, Inparanoid |
IMMT | human | fp? | Dictyostelium:DDB0267127 (DDB0191041) | BLAST gives different results |
C20orf43 | human | fn? | Dictyostelium:DDB0267130 (DDB0187988) | BLAST and Inparanoid clearly pick up that gene |
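The fp/fn labels in the table amount to set differences between a method's predicted orthologs and a curated reference set. Below is a minimal sketch. It assumes both sets are available as (species, gene) pairs. classify_predictions is a hypothetical helper and is not part of YOGY, Inparanoid or TreeFam. The usage lines encode the TPM1 row: YOGY predicted no yeast orthologs, while S. cerevisiae TPM1 and S. pombe cdc8 are conserved.

```python
def classify_predictions(predicted: set, reference: set) -> dict:
    """Compare a method's predicted orthologs with a curated reference set.

    fp: predicted but absent from the reference (false positives)
    fn: in the reference but not predicted (false negatives)
    tp: present in both
    """
    return {
        "fp": predicted - reference,
        "fn": reference - predicted,
        "tp": predicted & reference,
    }


# TPM1 row: no yeast orthologs predicted for human TPM1,
# but S. cerevisiae TPM1 and S. pombe cdc8 are conserved.
tpm1 = classify_predictions(
    predicted=set(),
    reference={("S. cerevisiae", "TPM1"), ("S. pombe", "cdc8")},
)
print(sorted(tpm1["fn"]))
# [('S. cerevisiae', 'TPM1'), ('S. pombe', 'cdc8')]
```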

List of potentially problematic families

Major Facilitator Superfamily (from Val's email)

SLC22A14 and SLC22A11

From Val: I just spotted 2 genes in the new list that have 12 reported S. cerevisiae orthologs. These belong to the MFS (major facilitator superfamily) and are problematic for assigning orthologs because many unrelated MFS proteins can generate best hits in distantly related genomes. These 'orthologs' are predicted only by KOGs (the predictor most prone to 'lumping' large families).
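One cheap screen for the 'lumping' described above is to count how many orthologs a method reports for each query in each target species, then flag the outliers. Below is a minimal sketch. The input format of (query, target species, target gene) triples, the default threshold of 3, and all identifiers in the usage lines are hypothetical. None of them are taken from KOG or YOGY output.

```python
from collections import defaultdict


def flag_one_to_many(predictions, max_orthologs=3):
    """Return {(query, species): sorted target genes} for suspicious pairs.

    predictions: iterable of (query, target_species, target_gene) triples.
    A pair is flagged when it has more than max_orthologs distinct targets.
    """
    targets = defaultdict(set)
    for query, species, gene in predictions:
        targets[(query, species)].add(gene)
    return {
        key: sorted(genes)
        for key, genes in targets.items()
        if len(genes) > max_orthologs
    }


# Hypothetical data: one query with 12 reported yeast 'orthologs', one with 1.
preds = [("mfs_query", "S. cerevisiae", f"yeast_gene_{i:02d}") for i in range(12)]
preds.append(("tpm1_query", "S. cerevisiae", "TPM1"))
flagged = flag_one_to_many(preds)
print(list(flagged))
# [('mfs_query', 'S. cerevisiae')]
```

A flagged pair is not necessarily wrong, because genuine lineage-specific duplications also yield one-to-many orthologs. The flag only marks candidates for manual checking, for example against TreeFam.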

I would recommend avoiding MFS proteins as reference genome candidates where possible (at least until we have a consistent way of confirming orthologs) because:
i) orthology identification is difficult; and
ii) the homology transfers that can be made are minimal. Once you have captured the fact that these are 'membrane transporters' involved in 'membrane transport', there is little more to transfer, because the specificities and processes are often not conserved, even between closely related species (there are many duplications and gene losses that confuse any functional transfer).
In TreeFam, none of the yeast proteins are listed as outgroups for this large orthologous cluster: http://www.treefam.org/cgi-bin/TFinfo.pl?ac=TF314445

This could be added to the false positive list.

Repeat families (from Val's email)

Watch for families containing LRR, TPR, and TPR-related repeats. WD repeats can also cause problems, but they seem to be more conserved between orthologs, so they are less problematic.
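The repeat watchlist can be applied as a simple filter over domain annotations. Below is a minimal sketch. It assumes each protein comes with a list of domain or repeat labels. The labels LRR, TPR, TPR-related and WD are placeholders, because annotation resources such as Pfam or InterPro use their own identifiers. The two risk levels only mirror the guidance above.

```python
# Placeholder labels; map these to real Pfam/InterPro identifiers before use.
HIGH_RISK_REPEATS = {"LRR", "TPR", "TPR-related"}
LOWER_RISK_REPEATS = {"WD"}  # repeats that seem better conserved between orthologs


def repeat_risk(domains: list[str]) -> str:
    """Return 'high', 'lower' or 'none' for a protein's domain labels."""
    labels = set(domains)
    if labels & HIGH_RISK_REPEATS:
        return "high"
    if labels & LOWER_RISK_REPEATS:
        return "lower"
    return "none"


print(repeat_risk(["LRR", "kinase"]))  # high
print(repeat_risk(["WD"]))             # lower
```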

Low-complexity or coiled-coil regions (from Val's email)

Proteins containing low-complexity or coiled-coil regions can often generate false positives: the matches are statistically significant, but the similarity is not due to orthology.
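A rough way to spot low-complexity stretches before trusting a BLAST hit is to scan the sequence with a sliding window and flag windows whose Shannon entropy (in bits) is low. Below is a simplified, entropy-only sketch. It is not SEG or any other published filter. The window length of 12 and the cutoff of 2.2 bits are illustrative assumptions. Coiled-coil detection needs a different approach and is not covered here.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a window."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_spans(seq: str, window: int = 12, cutoff: float = 2.2):
    """Return merged (start, end) spans of windows whose entropy is below cutoff.

    Coordinates are 0-based and end-exclusive, as in Python slicing.
    """
    spans = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < cutoff:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1] = (spans[-1][0], end)  # extend the overlapping span
            else:
                spans.append((start, end))
    return spans


# Hypothetical sequence: a poly-Q stretch between two ordinary segments.
seq = "MKTAYIAKQRQISFVKSHFSRQ" + "Q" * 20 + "MKTAYIAKQRQISFVKSHFSRQ"
print(low_complexity_spans(seq))
```

Residues in the reported spans could be masked before the all-vs-all search, so that hits driven only by composition are less likely to be mistaken for orthologs.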