6 October 2020 PAINT Conference Call

From GO Wiki

Discussion Topics

GOC meeting

What do we want to present? Stats, new features (better implementation of taxon constraints), what else?

New curators

Peifen (Phoenix) and Selina (Sue Rhee's group) are being trained by Huaiyu.

Too many HGT events predicted?

See https://github.com/pantherdb/Helpdesk/issues/25

-> to rediscuss when Paul is here

NOT on isoforms

see PTHR47279 - bad example because I removed the primary annotations - but iso-2 was annotated to

  • NOT GO:0000981 DNA-binding transcription factor activity, RNA polymerase II-specific

and

  • NOT GO:0006357 regulation of transcription by RNA polymerase II

(maybe this is still there - depending on the release cycles of the respective resources)

-> Isoform data (col 17) is not loaded, but we probably want to ignore it. We want all annotations, even if they are contradictory. This applies to non-isoform situations as well.

-> Anushya and Huaiyu say this is already implemented: the node is pink. ACTION ITEM: Anushya will send an example.
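To illustrate the "keep all annotations, even if they are contradictory" point, here is a minimal Python sketch that flags gene/term pairs annotated both with and without NOT. It assumes the standard GAF 2.x column order (object ID in column 2, qualifier in column 4, GO ID in column 5). The function name, the truncated sample rows, and the X0000n IDs are hypothetical, and this is not a description of how PAINT implements the check.

```python
from collections import defaultdict

def find_contradictions(gaf_lines):
    """Return the set of (object_id, go_id) pairs annotated both with and without NOT."""
    seen = defaultdict(set)
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines (they start with "!").
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        object_id, qualifier, go_id = cols[1], cols[3], cols[4]
        # Qualifiers are pipe-separated, e.g. "NOT|enables" or just "NOT".
        is_not = "NOT" in qualifier.split("|")
        seen[(object_id, go_id)].add(is_not)
    # A pair is contradictory when both a NOT and a positive annotation exist.
    return {key for key, flags in seen.items() if flags == {True, False}}

# Hypothetical rows, truncated to the first five GAF columns for brevity.
rows = [
    "\t".join(["UniProtKB", "X00001", "geneA", "NOT|enables", "GO:0000981"]),
    "\t".join(["UniProtKB", "X00001", "geneA", "enables", "GO:0000981"]),
    "\t".join(["UniProtKB", "X00002", "geneB", "involved_in", "GO:0006357"]),
]
contradictions = find_contradictions(rows)
```

Both annotations stay in the data; the function only reports which pairs would need the contradictory (pink) display.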

'View domains' does not always match InterPro in UniProt

Find examples

Topics from cancelled Sept call

New PAINT

Anushya

Curation status

http://pantree.org/tree/allTrees.jsp

Taxon Error reports

Report listing terms missing taxon constraint information.

http://paintcuration.usc.edu/validation/index.jsp

This is for developers (Dustin et al.) to check whether all terms are present in the taxon constraint table: http://data.pantherdb.org/TaxonConstraints/TaxonConstraintsLookup.txt
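The check described above could be sketched roughly as follows. The layout of TaxonConstraintsLookup.txt is not described in these notes, so this sketch only assumes that GO IDs appear verbatim in the file and pulls them out with a regular expression. The function name and the sample text are hypothetical.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits.
GO_ID = re.compile(r"GO:\d{7}")

def missing_terms(lookup_text, expected_terms):
    """Return the expected GO IDs that do not occur anywhere in the lookup text."""
    present = set(GO_ID.findall(lookup_text))
    return sorted(set(expected_terms) - present)

# Made-up stand-in for the contents of the lookup table (format is an assumption).
sample_lookup = "GO:0000981\t1\t0\nGO:0006357\t1\t1\n"
report = missing_terms(sample_lookup, ["GO:0000981", "GO:0006357", "GO:0003700"])
```

In practice the lookup text would be read from the downloaded file, and the expected terms would come from the current ontology release.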

New species to consider from multi org group (Debby and maybe Val)

https://docs.google.com/document/d/1RVlRNic37R3EQZfiNjn4R7Q6R3v_XSIg1DT7uQUwW-s/edit#heading=h.yf00tpljqgtb

Mismatches between Reference Proteomes and MOD taxon IDs

Paul generated a spreadsheet of species with > 100 EXP annotations: https://docs.google.com/spreadsheets/d/1jN2hrTkGIRNqOb1hSCX4MGSuzzXydS7C-YhUzXb01X4/edit#gid=0

I’ve highlighted in red the ones where the taxon IDs do not match between PANTHER/UniProt and GO (four total), and in yellow the ones where there is a distinct gene/protein ID space used in the GAF that we might want to double check to make sure it’s in the UniProt mapping file so we can ensure that we can match those up. I assume the standard MOD IDs will be in the mapping files, so I didn’t highlight those. I’ve also noted several other cases, where there are additional experimental GO annotations but to a distinct taxon ID. Not sure how we want to handle this.


Known issues:

  • For S. pombe, UniProt Reference Proteomes uses taxon:284812, while PomBase uses taxon:4896 (see email exchange 'pombe taxon').
  • For Aspergillus fumigatus, the GAF is submitted by AspGD using taxID 746128, and UniProt has mapped those annotations to sequences with taxID 330879 (which NCBI shows as a child strain of 746128), so this one is similar to the S. pombe case in that UniProt uses the strain level rather than the species.
  • For Leishmania major, the GAF is submitted by GeneDB using taxID 347515, and UniProt has mapped those annotations to sequences with taxID 5664 (which NCBI shows as a parent species to strain 347515), so this one is opposite to the S. pombe case in that UniProt uses the species level rather than strain.
  • Maria Martin also wrote about S. cerevisiae: "This is also true for S. cerevisiae, where UniProt uses the reference strain S288c (taxon:559292), and there even fewer of the annotations are actually coming from experiments in that particular strain, yet UniProt (and SGD) annotate everything in GO to that taxon."
  • WRT species, this is what UniProt/GOA does (from Alex):

Hi Huaiyu,

We are using the mapping provided by aspergillusgenome.org here: http://www.aspergillusgenome.org/download/External_id_mappings/ If it has a mapping to UniProt accessions that belong to a strain taxID, that is what is used. We don't do mappings ourselves; we rely on the DB authority to assign the mapping to UniProt. I guess they are doing their best to do so. In QuickGO, for example, we include descendants by default if you filter by taxID: https://www.ebi.ac.uk/QuickGO/annotations?taxonId=746128&taxonUsage=descendants I'm not sure, but perhaps a similar strategy can be applied to PAINT annotations.

Best regards, Alex
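One possible way to handle the known mismatches is an explicit lookup from the taxon ID used in the GAF to the one used by UniProt Reference Proteomes. The sketch below uses only the three ID pairs listed under "Known issues". The dictionary and function names are hypothetical, and this is an illustration rather than an agreed PAINT strategy.

```python
# GAF (MOD) taxon ID -> UniProt Reference Proteomes taxon ID,
# taken from the known issues listed in these notes.
GAF_TO_UNIPROT_TAXON = {
    "4896": "284812",    # S. pombe: PomBase species ID -> UniProt strain ID
    "746128": "330879",  # A. fumigatus: AspGD ID -> UniProt child strain ID
    "347515": "5664",    # L. major: GeneDB strain ID -> UniProt parent species ID
}

def uniprot_taxon(gaf_taxon):
    """Map a GAF taxon ID (with or without the 'taxon:' prefix) to the UniProt one."""
    taxon_id = gaf_taxon.split(":")[-1]
    # IDs with no known mismatch are passed through unchanged.
    return "taxon:" + GAF_TO_UNIPROT_TAXON.get(taxon_id, taxon_id)

pombe = uniprot_taxon("taxon:4896")
```

A table like this has to be maintained by hand. The QuickGO-style alternative Alex mentions (matching on descendants in the NCBI taxonomy) would cover the strain-below-species cases automatically, but not the L. major case, where UniProt uses the parent species.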

Follow up on June action items (last call)

Update on Taxon constraints

  • Dustin prepares a table at every release
  • ACTION: Dustin to check with Jim on where the 'full' taxon constraints are, so that we use the same ones.

Update stats

https://drive.google.com/drive/folders/1cq0onIH_zpPbOH3j8h2jGNvsAk20chKD

  • Pascale would like the 'PAINT curation status' to be updated more regularly (at every monthly release?)
  • ACTION: Anushya will do it.

DONE http://pantree.org/tree/allTrees.jsp