1 December 2020 PAINT Conference Call

From GO Wiki
Latest revision as of 14:10, 1 December 2020


NEXT CALL: JANUARY. Huaiyu to show how to forward-track PANTHER 16.

Follow up on outstanding action items

Update on Taxon constraints

  • Dustin prepares a table at every release
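  • ACTION: Dustin to check with Jim on where the 'full' taxon constraints are, so that we use the same ones.
      • Jim will add the "missing" taxon constraints to the GO ontology release - https://github.com/geneontology/go-ontology/issues/19759
      • Jim hasn't managed to get to this yet. How high a priority is this?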

Discussion Topics

Problem migrating NOT annotations?

  • Dustin generated a query for 14.1 NOT annotations that weren't forward-tracked to 15.0 or tracked to different nodes.

https://docs.google.com/spreadsheets/d/16L4TDDldeogYVlXEbrDuul3CRxBaG9PnmNPOysuRa60/edit#gid=0

  • Example 1: PTHR10003 was missing NOT superoxide dismutase for the CCS (copper chaperone) subunit. The CCS clade is broken into 2 in Panther15. Report has this info:
      new_pthr: PTHR10003
      prev_pthrs: PTHR10003
      prev_node_ptns: PTN004118896
      new_node_ptns: PTN004118900
      go_term: GO:0004784 (NOT, IKR), GO:0005737 (IBD), GO:0016532 (IBD)
      Comments (Pascale): PTN004118900 is a plant node - as far as I remember PTN004118896 was Eukaryotes

  • Example 2: PTHR10183
      new_pthr: PTHR10183
      prev_pthrs: PTHR10183
      prev_node_ptns: PTN002564895, PTN002564902
      new_node_ptns: PTN000809773, PTN002564886, PTN002564896
      go_term: GO:0004198 (NOT, IKR), GO:0006508 (NOT, IKR)
      Comments (Pascale): Was corrected in 2018, see #2109 (https://github.com/geneontology/go-annotation/issues/2109). However, we could not have made NOT annotations to the CAPN1 node (PTN002564886)



  • Example 3: PTHR10846
      new_pthr: PTHR10846
      prev_pthrs: PTHR10846
      prev_node_ptns: PTN002586308
      new_node_ptns: PTN002474463
      go_term: GO:0005802 (IBD), GO:0005887 (NOT, IRD)
      Comments (Pascale): Comment is missing information (what node has been mapped to): "2020-05-20: In the PANTHER15.0 update, PTN002586308 can not be directly mapped but their direct child nodes PTN002474463 can be mapped to . These nodes were annotated with GO terms GO:0005802 (IBD), GO:0005887 (NOT, IRD)." This one looks right to me. However PAINT does not show the NOT
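
The kind of check behind Dustin's report can be sketched roughly as below. This is only an illustration, not the actual query: the function names and the dict layout are made up here, and the rows are simply the three examples above, keyed by the report's column names. It lists every NOT term in a row whose node set changed between 14.1 and 15.0.

```python
import re

# Matches entries such as "GO:0004784 (NOT, IKR)" in the go_term column.
GO_TERM = re.compile(r"(GO:\d{7})\s*\(([^)]*)\)")


def split_nodes(cell):
    """Turn a comma-separated PTN cell into a set of node IDs."""
    return {node.strip() for node in cell.split(",") if node.strip()}


def negated_terms(go_cell):
    """Return the GO IDs in a go_term cell that carry the NOT qualifier."""
    negated = []
    for go_id, qualifiers in GO_TERM.findall(go_cell):
        flags = [q.strip() for q in qualifiers.split(",")]
        if "NOT" in flags:
            negated.append(go_id)
    return negated


def retracked_nots(rows):
    """List (family, GO ID, previous nodes, new nodes) for every NOT term
    whose node set differs between the two releases."""
    flagged = []
    for row in rows:
        prev_nodes = split_nodes(row["prev_node_ptns"])
        new_nodes = split_nodes(row["new_node_ptns"])
        if prev_nodes == new_nodes:
            continue  # tracked to the same nodes, nothing to review
        for go_id in negated_terms(row["go_term"]):
            flagged.append(
                (row["new_pthr"], go_id, sorted(prev_nodes), sorted(new_nodes))
            )
    return flagged


# The three example rows from the report, copied from the notes above.
ROWS = [
    {"new_pthr": "PTHR10003",
     "prev_node_ptns": "PTN004118896",
     "new_node_ptns": "PTN004118900",
     "go_term": "GO:0004784 (NOT, IKR), GO:0005737 (IBD), GO:0016532 (IBD)"},
    {"new_pthr": "PTHR10183",
     "prev_node_ptns": "PTN002564895,PTN002564902",
     "new_node_ptns": "PTN000809773,PTN002564886,PTN002564896",
     "go_term": "GO:0004198 (NOT, IKR), GO:0006508 (NOT, IKR)"},
    {"new_pthr": "PTHR10846",
     "prev_node_ptns": "PTN002586308",
     "new_node_ptns": "PTN002474463",
     "go_term": "GO:0005802 (IBD), GO:0005887 (NOT, IRD)"},
]

for family, go_id, prev_nodes, new_nodes in retracked_nots(ROWS):
    print(family, go_id, prev_nodes, "->", new_nodes)
```

A listing like this only says which NOT annotations need a curator's look; whether the new node is the right target (as in Example 1, plant node versus Eukaryotes) still has to be judged in PAINT.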

Too many HGT events predicted?

See https://github.com/pantherdb/Helpdesk/issues/25

-> To be discussed again when Paul is here (see the answer in the issue above)

Issues with forward tracking IRD/IKR

Possibly linked to the "Too many HGT events predicted" topic above? See PTHR10003.


Mismatches between Reference Proteomes and MOD taxon IDs

Paul generated a spreadsheet of species with more than 100 EXP annotations: https://docs.google.com/spreadsheets/d/1jN2hrTkGIRNqOb1hSCX4MGSuzzXydS7C-YhUzXb01X4/edit#gid=0

I’ve highlighted in red the ones where the taxon IDs do not match between PANTHER/UniProt and GO (four total), and in yellow the ones where there is a distinct gene/protein ID space used in the GAF that we might want to double check to make sure it’s in the UniProt mapping file so we can ensure that we can match those up. I assume the standard MOD IDs will be in the mapping files, so I didn’t highlight those. I’ve also noted several other cases, where there are additional experimental GO annotations but to a distinct taxon ID. Not sure how we want to handle this.


Known issues:

  • For S. pombe, UniProt Reference Proteomes uses taxon:284812, while PomBase uses taxon:4896 (see email exchange 'pombe taxon').
  • For Aspergillus fumigatus, the GAF is submitted by AspGD using taxID 746128, and UniProt has mapped those annotations to sequences with taxID 330879 (which NCBI shows as a child strain of 746128), so this one is similar to the S. pombe case in that UniProt uses the strain level rather than the species.
  • For Leishmania major, the GAF is submitted by GeneDB using taxID 347515, and UniProt has mapped those annotations to sequences with taxID 5664 (which NCBI shows as a parent species to strain 347515), so this one is opposite to the S. pombe case in that UniProt uses the species level rather than strain.
  • Maria Martin also wrote about S. cerevisiae: "This is also true for S. cerevisiae, where UniProt uses the reference strain S288c (taxon:559292), and there even fewer of the annotations are actually coming from experiments in that particular strain, yet UniProt (and SGD) annotate everything in GO to that taxon."
  • WRT species, this is what UniProt/GOA does (from Alex):

Hi Huaiyu,

We are using the mapping provided by aspergillusgenome.org here: http://www.aspergillusgenome.org/download/External_id_mappings/ If it has a mapping to UniProt accessions that belong to the strain taxID, that is what is used. We don't do the mappings ourselves; we rely on the DB authority to assign the mapping to UniProt. I guess they are doing their best to do so.

In QuickGO, for example, we include descendants by default if you filter by taxID: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=746128&taxonUsage=descendants

I'm not sure, but perhaps a similar strategy can be applied to PAINT annotations.

Best regards,
Alex
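
One way to read Alex's suggestion is sketched below. It is not how PAINT or QuickGO is implemented: the PARENT_OF table is hand-built from the three known issues listed above rather than taken from the NCBI taxonomy, it covers a single strain-to-parent step only, and all function names are hypothetical. The idea is to compare taxon IDs after lifting each one to its parent, so that a strain-level ID and its species-level ID count as the same organism.

```python
from urllib.parse import urlencode

# Child -> parent taxon IDs, copied from the known issues in these notes.
# Hand-built for illustration; a real check would use the NCBI taxonomy.
PARENT_OF = {
    "284812": "4896",    # S. pombe: UniProt (strain) vs PomBase (species)
    "330879": "746128",  # A. fumigatus: UniProt (strain) vs AspGD
    "347515": "5664",    # L. major: GeneDB (strain) vs UniProt (species)
}


def bare_id(taxon):
    """Strip an optional 'taxon:' prefix, e.g. 'taxon:4896' -> '4896'."""
    return str(taxon).split(":")[-1].strip()


def to_parent(taxon):
    """Lift a taxon ID one level up if it is a known child, else keep it."""
    taxon_id = bare_id(taxon)
    return PARENT_OF.get(taxon_id, taxon_id)


def same_organism(taxon_a, taxon_b):
    """True when both IDs resolve to the same parent-level taxon."""
    return to_parent(taxon_a) == to_parent(taxon_b)


def quickgo_descendants_url(taxon):
    """Build the QuickGO annotation URL quoted in Alex's email,
    with descendants of the given taxon included."""
    query = urlencode({"taxonId": bare_id(taxon), "taxonUsage": "descendants"})
    return "https://www.ebi.ac.uk/QuickGO/annotations?" + query


print(same_organism("taxon:284812", "taxon:4896"))  # pombe strain vs species
print(same_organism("347515", "5664"))              # L. major strain vs species
print(quickgo_descendants_url("746128"))
```

Matching at the parent level handles both directions seen in the notes (UniProt at strain level for S. pombe and A. fumigatus, at species level for L. major) without having to decide which side is "right".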