Taxon-GO Checks and Commentary - Part 6
From Tanya Berardini:
There are only two [inconsistencies flagged], and only one needs correction in the rules department. The other one has annotations that I will remove.
nucleolar ribonuclease P complex only_in NCBITaxon:2157 Archaea
I think this one is wrong because a nucleolus is part of a nucleus and Archaea don't have nuclei. Also, the def refers to eukaryotic cells.
From Stan Lauderkind:
I saw a few problems with the lines flagged for RGD:
1. If "senescence" and "organ senescence" are only plant terms, the definitions for the GO terms should be altered to indicate that. That would help eliminate conflicts before your checking script would find them. Presently, the only clue is that an Arabidopsis example is cited for each at the end of the definition.
organ senescence only_in NCBITaxon:33090 Viridiplantae
senescence only_in NCBITaxon:33090 Viridiplantae
Response: I have removed these rules. The ontology needs some work in this area to deal with the overlap in language between aging in animals and organ shedding in plants, but it will be better to make the ontology changes first and come back later to add the taxon rules.
2. The definition for "negative regulation of photoreceptor cell differentiation" cites an example in Drosophila, so it appears that the following rule is incorrect:
negative regulation of photoreceptor cell differentiation only_in NCBITaxon:33317 Protostomia
Response: I have removed this rule.
From Jim Hu:
I did a grep for EcoCyc in the taxon gaffes list, since I assume that gives the ones where our annotations were flagged, and then sorted to get a nonredundant list of the GO terms involved:
GO:0005739 "mitochondrion" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc ADENYLOSUCCINATE-SYN-MONOMER PurA GO:0005739 PMID:16858726 IDA protein NCBITaxon:511145 20091216 EcoliWiki
OK. In EcoliWiki, this was a NOT. The qualifier must have been lost.
(This has been fixed in the checking script)
GO:0030423 "targeting of mRNA for destruction involved in RNA interference" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc RNA0-241 SgrS GO:0030423 PMID:15522088 IMP P transcript NCBITaxon:511145 20080723 EcoCyc
OK. RNA interference should not be used for prokaryotes due to parentage issues. We added some new GO terms recently to deal with similar processes in bacteria that are not epigenetic.
(5 annotations removed: 4 experimental, 1 ISS)
GO:0019038 "provirus" only_in NCBITaxon:10239 "Viruses" :: EcoCyc EG11783-MONOMER IntA GO:0019038 PMID:7511583 ISS C protein NCBITaxon:511145 20080111 EcoCyc
I need to open a SourceForge item for this one.
(This rule was inherited from virion part and virion, which were both only_in_taxon Viruses. Jane says this whole area needs to be overhauled, so I have deleted both rules in the meantime.)
GO:0000027 "ribosomal large subunit assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc EG10881-MONOMER RplT GO:0000027 PMID:7021848 IDA protein NCBITaxon:511145 20080110 EcoCyc
GO:0000028 "ribosomal small subunit assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc EG10523-MONOMER KsgA GO:0000028 PMID:18990185 IMP protein NCBITaxon:511145 20090311 EcoCyc
GO:0000917 "barrier septum formation" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc EG10342-MONOMER FtsQ GO:0000917 PMID:2007547 IMP protein NCBITaxon:83333 20090518 EcoliWiki
GO:0006461 "protein complex assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc EG11653-MONOMER CyaY GO:0006461 PMID:17650323 IMP protein NCBITaxon:83333 20080731 EcoliWiki
GO:0009296 "flagellum assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc FUM-FE-S FrdB GO:0009296 PMID:18337747 IMP P protein NCBITaxon:511145 20080507 EcoCyc
GO:0017004 "cytochrome complex assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc CCMA-MONOMER CcmA GO:0017004 PMID:7635817 IMP protein NCBITaxon:511145 20071003 UniProtKB
GO:0042255 "ribosome assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc EG11235-MONOMER RhlE GO:0042255 PMID:18083833 IGI UniProtKB:P0A9P6|UniProtKB:P21507 P protein NCBITaxon:511145 20080822 EcoCyc
GO:0042257 "ribosomal subunit assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc G7656-MONOMER ObgE GO:0042257 PMID:16980477 IMP protein NCBITaxon:511145 20090701 EcoCyc
GO:0042963 "phage assembly" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc CPLX0-3934 GroEL-GroES chaperonin complex GO:0042963 PMID:7015340 IMP P protein NCBITaxon:511145 20090730 EcoCyc
GO:0051260 "protein homooligomerization" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc EG12177-MONOMER CutA GO:0051260 PMID:12949080 IDA protein NCBITaxon:511145 20050408 UniProtKB
GO:0051607 "defense response to virus" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc CPLX0-7725 CRISPR-associated complex for antiviral defense GO:0051607 PMID:18703739 IDA P protein NCBITaxon:511145 20080910 EcoCyc
I believe none of these should be restricted to Eukaryota. Some of these even have gosubset_prok notations!
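The grep-and-sort step Jim describes at the start of his message could be sketched in Python roughly as follows. This is only an illustration: the function name `flagged_go_terms` is made up, and the regular expression assumes that every flagged line starts with a GO ID, the quoted term name, and an only_in rule, as in the records above. The last example line is invented to show the filtering and does not come from the real list.

```python
import re

# Assumed line layout, based on the records quoted above:
#   GO:0005739 "mitochondrion" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc ...
RULE_PREFIX = re.compile(r'^(GO:\d{7})\s+"([^"]+)"\s+only_in\s+NCBITaxon:\d+')

def flagged_go_terms(lines, source="EcoCyc"):
    """Return a sorted, nonredundant list of (GO ID, term name) pairs
    taken from flagged lines that mention the given source."""
    seen = set()
    for line in lines:
        if source not in line:              # plays the role of `grep EcoCyc`
            continue
        match = RULE_PREFIX.match(line.strip())
        if match:
            seen.add((match.group(1), match.group(2)))
    return sorted(seen)                     # plays the role of `sort -u`

example_lines = [
    'GO:0005739 "mitochondrion" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc ADENYLOSUCCINATE-SYN-MONOMER PurA GO:0005739 PMID:16858726 IDA protein NCBITaxon:511145 20091216 EcoliWiki',
    'GO:0030423 "targeting of mRNA for destruction involved in RNA interference" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc RNA0-241 SgrS GO:0030423 PMID:15522088 IMP P transcript NCBITaxon:511145 20080723 EcoCyc',
    # Same record repeated, to show that duplicates collapse.
    'GO:0005739 "mitochondrion" only_in NCBITaxon:2759 "Eukaryota" :: EcoCyc ADENYLOSUCCINATE-SYN-MONOMER PurA GO:0005739 PMID:16858726 IDA protein NCBITaxon:511145 20091216 EcoliWiki',
    # Hypothetical line from another source, invented for this sketch.
    'GO:0019038 "provirus" only_in NCBITaxon:10239 "Viruses" :: SomeOtherDB HYPOTHETICAL-ID',
]

for go_id, name in flagged_go_terms(example_lines):
    print(go_id, name)
```

A substring test on the whole line is as coarse as the original grep: it would also keep a line where EcoCyc appears only as the assigning database at the end of the record.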
- I have resolved all the problems above.
- I have checked for terms with only_in_taxon Eukaryota links that don't have only_in_taxon Prokaryote/Bacteria etc. union terms en route. The majority of the problem terms were fixed by deleting a single link: immune system process only_in_taxon Eukaryota.
- I have also removed a bunch of terms from the prokaryote subset.
- I found an interesting case where it seems that an only_in_taxon Eukaryota ancestry may be permissible as long as it is only via regulates relationships. I have written to the ontology editors list for comment (13th January 2010, subject: taxon inheritance). The picture of the graph is below.
I'm not sure about what to do about the half dozen or so cases like this, so I have left them as they are until it can be discussed.
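As a rough illustration of how a taxon rule is inherited down the graph, and of why the regulates cases are a separate question, here is a minimal Python sketch. It is not the actual checking script. The graph is a toy: the edge from defense response to virus to immune system process is an assumption made for the example, and the two "hypothetical" terms are invented. Whether regulates links should propagate a rule is exactly the open question above, so the sketch leaves it as a parameter rather than deciding it.

```python
from collections import deque

# Toy data. EDGES maps a child term to (relationship, parent) pairs.
# The edges and the "hypothetical ..." terms are assumptions for this
# sketch; they are not read from the ontology.
EDGES = {
    "defense response to virus": [("is_a", "immune system process")],
    "regulation of hypothetical process": [("regulates", "hypothetical process")],
}
ONLY_IN = {
    "immune system process": "Eukaryota",
    "hypothetical process": "Eukaryota",
}

def inherited_only_in(term, edges, only_in, follow=("is_a", "part_of")):
    """Collect the taxa of only_in rules attached to the term itself or to
    any ancestor reachable through the relationship types in `follow`."""
    taxa = set()
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in only_in:
            taxa.add(only_in[current])
        for relationship, parent in edges.get(current, []):
            if relationship in follow and parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return taxa

# With the rule on immune system process in place, the child inherits it.
print(inherited_only_in("defense response to virus", EDGES, ONLY_IN))

# After deleting that one rule, the inherited restriction disappears.
rules_after_fix = {t: x for t, x in ONLY_IN.items() if t != "immune system process"}
print(inherited_only_in("defense response to virus", EDGES, rules_after_fix))

# A term linked only via regulates inherits nothing unless regulates is followed.
print(inherited_only_in("regulation of hypothetical process", EDGES, ONLY_IN))
print(inherited_only_in("regulation of hypothetical process", EDGES, ONLY_IN,
                        follow=("is_a", "part_of", "regulates")))
```

In this toy setup, removing one rule high in the graph clears every term below it, which matches the observation that deleting a single link fixed most of the problem terms.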