10 November 2020 PAINT Conference Call
Latest revision as of 14:21, 10 November 2020
Discussion Topics
Default qualifiers
See https://github.com/pantherdb/fullgo_paint_update/issues/45
BP -> involved_in
MF -> enables
CC -> is_active_in
CC -> children of protein-containing complexes -> part_of
-> OK, Dustin has coded this already.
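The default-qualifier rule above can be written out as a small lookup. This is only an illustrative sketch, not the code Dustin wrote: the function name and the `is_complex_child` flag are hypothetical, and deciding whether a CC term is a child of protein-containing complex would need an ontology lookup that is not shown here.

```python
# Default qualifier per GO aspect, as listed in the notes above.
DEFAULT_QUALIFIERS = {
    "BP": "involved_in",
    "MF": "enables",
    "CC": "is_active_in",
}


def default_qualifier(aspect, is_complex_child=False):
    """Return the default qualifier for an annotation.

    `is_complex_child` is a hypothetical flag: True when the CC term is a
    child of 'protein-containing complex'. How that is determined is out
    of scope for this sketch.
    """
    if aspect == "CC" and is_complex_child:
        return "part_of"
    return DEFAULT_QUALIFIERS[aspect]
```

For instance, `default_qualifier("CC")` gives `is_active_in`, while `default_qualifier("CC", is_complex_child=True)` gives `part_of`.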
Following up on previous items
Contradictory NOT annotations
see PTHR47279 - bad example because I removed the primary annotations - but iso-2 was annotated to
- NOT GO:0000981 DNA-binding transcription factor activity, RNA polymerase II-specific
and
- NOT GO:0006357 regulation of transcription by RNA polymerase II
(maybe this is still there - depending on the release cycles of the respective resources)
-> Anushya and Huaiyu say this is already implemented, the node is pink, Anushya will send an example
=> Example: PTHR23103
ACTION ITEM: Pascale to add to documentation
Mismatches Reference proteomes and MOD taxon IDs
Paul generated a spreadsheet with species with > 100 EXP annotations https://docs.google.com/spreadsheets/d/1jN2hrTkGIRNqOb1hSCX4MGSuzzXydS7C-YhUzXb01X4/edit#gid=0
I’ve highlighted in red the ones where the taxon IDs do not match between PANTHER/UniProt and GO (four total), and in yellow the ones where there is a distinct gene/protein ID space used in the GAF that we might want to double check to make sure it’s in the UniProt mapping file so we can ensure that we can match those up. I assume the standard MOD IDs will be in the mapping files, so I didn’t highlight those. I’ve also noted several other cases, where there are additional experimental GO annotations but to a distinct taxon ID. Not sure how we want to handle this.
Known issues:
- For S. pombe, UniProt Reference Proteomes uses taxon:284812, while PomBase uses taxon:4896 (see email exchange 'pombe taxon').
- For Aspergillus fumigatus, the GAF is submitted by AspGD using taxID 746128, and UniProt has mapped those annotations to sequences with taxID 330879 (which NCBI shows as a child strain of 746128), so this one is similar to the S. pombe case in that UniProt uses the strain level rather than the species.
- For Leishmania major, the GAF is submitted by GeneDB using taxID 347515, and UniProt has mapped those annotations to sequences with taxID 5664 (which NCBI shows as a parent species to strain 347515), so this one is opposite to the S. pombe case in that UniProt uses the species level rather than strain.
- Maria Martin also wrote about S. cerevisiae: This is also true for S. cerevisiae, where UniProt uses the reference strain S288c (taxon:559292), and there even fewer of the annotations are actually coming from experiments in that particular strain, yet UniProt (and SGD) annotate everything in GO to that taxon.
- WRT species, this is what UniProt/GOA does (from Alex):
Hi Huaiyu, We are using the mapping provided by aspergillusgenome.org here: http://www.aspergillusgenome.org/download/External_id_mappings/ If it has a mapping to UniProt accessions that belong to the strain taxID, that is what is used. We don't do mappings ourselves; we rely on the DB authority to assign the mapping to UniProt. I guess they are doing their best to do so. In QuickGO, for example, we include descendants by default if you filter by taxID: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=746128&taxonUsage=descendants I'm not sure, but perhaps a similar strategy can be applied to PAINT annotations. Best regards, Alex
ACTION ITEM: Marc to check with Sandrine Pilbout at Swiss-Prot to see if taxon can be harmonized
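The "include descendants" strategy Alex describes could look roughly like the sketch below. It is an assumption about how matching might work, not what QuickGO or PAINT actually do. The parent table holds only the strain/species pairs mentioned in these notes, and the function names are hypothetical; a real implementation would read the full NCBI taxonomy.

```python
# Child -> parent taxon IDs, limited to the pairs discussed in the notes.
PARENT = {
    284812: 4896,    # S. pombe: strain used by UniProt -> species used by PomBase
    330879: 746128,  # A. fumigatus: strain used by UniProt -> taxon used by AspGD
    347515: 5664,    # L. major: strain used by GeneDB -> species used by UniProt
}


def is_same_or_descendant(taxon, ancestor):
    """True if `taxon` equals `ancestor` or sits below it in PARENT."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT.get(taxon)
    return False


def taxa_compatible(a, b):
    """True if one taxon is the other, or an ancestor/descendant of it."""
    return is_same_or_descendant(a, b) or is_same_or_descendant(b, a)
```

With this check, an annotation submitted under 746128 would match a sequence under 330879, and the Leishmania case (347515 vs 5664) matches in the opposite direction.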
Update on Taxon constraints
- Dustin prepares a table at every release
- ACTION: Dustin to check with Jim on where the 'full' taxon constraints are, so that we use the same.
- Jim will add the "missing" taxon constraints to the GO ontology release - https://github.com/geneontology/go-ontology/issues/19759
- No comment as of Sept 29th - will follow up on the next call.
ACTION ITEM: Follow up again
New discussion points
common_name in go-reference-species.yaml is not unique
https://github.com/geneontology/go-site/issues/1564
-> We could probably just use the genus, instead of 'bacteria'
=> ACTION ITEM: Dustin to see if we can use 'names' from this table: http://pantherdb.org/panther/summaryStats.jsp
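A quick way to see which common_name values collide is sketched below. The entry layout is an assumption: only the `common_name` key comes from the issue title, the `taxon_id` key and the example records are made up, and loading the real YAML file is left out.

```python
from collections import Counter


def duplicated_common_names(entries):
    """Return the common_name values that occur in more than one entry."""
    counts = Counter(entry["common_name"] for entry in entries)
    return sorted(name for name, n in counts.items() if n > 1)


# Made-up entries for illustration only; not the real file content.
entries = [
    {"taxon_id": "example:1", "common_name": "bacteria"},
    {"taxon_id": "example:2", "common_name": "bacteria"},
    {"taxon_id": "example:3", "common_name": "fission yeast"},
]
print(duplicated_common_names(entries))
```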
Review tree
PTHR42918 Lys tRNA synthetase & inflammation - see https://github.com/geneontology/go-annotation/issues/3224
How much evidence is enough?
ACTION ITEM: Assign to whoever curated the tree for review
Adding a limit for the with/from field output in the IBA GAFs ?
For example, GO term GO:0007275 has a very large number of descendant terms, and this gene is in a large family (PTHR24416, 1747 genes), so there are more genes to source experimental annotations from.
for example: UniProtKB P54764 EPHA4 GO:0007275 PMID:21873635 IBA PANTHER:PTN001230349|FB:FBgn0025936|MGI:MGI:98277|ZFIN:ZDB-GENE-980526-307|MGI:MGI:95522|FB:FBgn0040505|UniProtKB:P21802|RGD:2561|RGD:2425|RGD:2621|ZFIN:ZDB-GENE-020503-1|UniProtKB:Q02763|WB:WBGene00004740|UniProtKB:Q16288|RGD:2869|ZFIN:ZDB-GENE-980526-255|WB:WBGene00006897|MGI:MGI:95294|UniProtKB:Q91987|MGI:MGI:97902|UniProtKB:P11362|UniProtKB:P30530|WB:WBGene00003007|FB:FBgn0033791|MGI:MGI:1345277|WB:WBGene00017381|ZFIN:ZDB-GENE-980526-326|WB:WBGene00000289|ZFIN:ZDB-GENE-050407-1|WB:WBGene00001184|RGD:3285|RGD:620144|MGI:MGI:95558|MGI:MGI:104294|ZFIN:ZDB-GENE-041001-112|FB:FBgn0283499|MGI:MGI:96969|UniProtKB:P04629|MGI:MGI:96433|MGI:MGI:97383|UniProtKB:P54760|MGI:MGI:99216|UniProtKB:P10721|WB:WBGene00002299|MGI:MGI:95411|MGI:MGI:104770|RGD:620486|UniProtKB:P36888|UniProtKB:P28693|ZFIN:ZDB-GENE-041014-1|RGD:628622|RGD:1305275|UniProtKB:P29317|RGD:620028|UniProtKB:P35968|UniProtKB:P07333|MGI:MGI:1096337|MGI:MGI:1347520|MGI:MGI:99611|ZFIN:ZDB-GENE-031118-121|ZFIN:ZDB-GENE-030323-1|RGD:2965|MGI:MGI:109378|ZFIN:ZDB-GENE-010126-3|RGD:61308|FB:FBgn0024245|ZFIN:ZDB-GENE-020503-2|FB:FBgn0015380|RGD:620713|MGI:MGI:98664|WB:WBGene00006896|RGD:621642|MGI:MGI:97531|ZFIN:ZDB-GENE-030918-1|UniProtKB:P54756|MGI:MGI:103305|RGD:620831|ZFIN:ZDB-GENE-030131-6458|ZFIN:ZDB-GENE-990415-65|ZFIN:ZDB-GENE-001205-1|MGI:MGI:95524|UniProtKB:P54762|WB:WBGene00006894|ZFIN:ZDB-GENE-050916-2|WB:WBGene00000898|ZFIN:ZDB-GENE-010724-10|MGI:MGI:1346037|WB:WBGene00020504|MGI:MGI:1347521|RGD:620980|MGI:MGI:97384|MGI:MGI:95559|ZFIN:ZDB-GENE-050107-1|UniProtKB:Q9PUF6|ZFIN:ZDB-GENE-060427-5|UniProtKB:F1NHC7|ZFIN:ZDB-GENE-030826-6|MGI:MGI:95278|FB:FBgn0010407|FB:FBgn0032006|UniProtKB:Q15303|ZFIN:ZDB-GENE-990415-62|RGD:3214|UniProtKB:A0A2R8QHE8|MGI:MGI:96575|FB:FBgn0285896|RGD:2611|MGI:MGI:1339758|RGD:3211|FB:FBgn0003731|ZFIN:ZDB-GENE-020503-4|MGI:MGI:97530|RGD:3213|UniProtKB:P35916|RGD:620568|MGI:MGI:95276|RGD:3556|FB:FBgn0020391|MGI:MGI:1047
71|RGD:1560587|FB:FBgn0003733|MGI:MGI:99906|FB:FBgn0010389|UniProtKB:P08581|MGI:MGI:95410|MGI:MGI:95561|RGD:69323|RGD:1559469|ZFIN:ZDB-GENE-070713-2|MGI:MGI:104757|MGI:MGI:95523|ZFIN:ZDB-GENE-010126-2|MGI:MGI:96965|RGD:2252|RGD:2543|WB:WBGene00016104|MGI:MGI:1347244|MGI:MGI:96677|MGI:MGI:99612|MGI:MGI:99654|WB:WBGene00003863|RGD:2556|ZFIN:ZDB-GENE-001207-7|MGI:MGI:97385|RGD:2917|WB:WBGene00006868|ZFIN:ZDB-GENE-000705-1|ZFIN:ZDB-GENE-990415-208|RGD:3082|MGI:MGI:101766|ZFIN:ZDB-GENE-071218-3|ZFIN:ZDB-GENE-050114-4|RGD:3284|FB:FBgn0003366|ZFIN:ZDB-GENE-070209-277|UniProtKB:P06213|MGI:MGI:95525|ZFIN:ZDB-GENE-990415-56|MGI:MGI:96683|ZFIN:ZDB-GENE-030918-4|FB:FBgn0011829|ZFIN:ZDB-GENE-990415-55|ZFIN:ZDB-GENE-020503-3 P Ephrin type-A receptor 4 UniProtKB:P54764|PTN002521356 protein taxon:9606 20201025 GO_Central
Not sure if this is the problem annotation, but this breaks the QuickGO/GOA parser (it expects fewer characters)
ACTION ITEM: Pascale to tell Alex that this is the data - we don't plan to change it, we export all evidence
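To find which annotations have an unusually long with/from field, something like the sketch below could be run over a GAF file. It assumes the standard tab-separated GAF layout, where With/From is column 8. The function names and the `max_chars` threshold are hypothetical; the notes do not say what limit the QuickGO/GOA parser actually expects.

```python
WITH_FROM_COLUMN = 7  # zero-based index of GAF column 8 (With/From)


def with_from_stats(gaf_line):
    """Return (number of IDs, character length) of the With/From field."""
    cols = gaf_line.rstrip("\n").split("\t")
    field = cols[WITH_FROM_COLUMN]
    ids = field.split("|") if field else []
    return len(ids), len(field)


def lines_over_limit(lines, max_chars):
    """Yield (line number, ID count, length) for long With/From fields.

    `max_chars` is a hypothetical threshold chosen by the caller.
    """
    for number, line in enumerate(lines, start=1):
        if line.startswith("!") or not line.strip():
            continue  # skip GAF header/comment lines and blank lines
        count, length = with_from_stats(line)
        if length > max_chars:
            yield number, count, length
```

This only reports long fields; it does not truncate anything, in line with the decision above to keep exporting all evidence.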