Manager Call 2016-06-1: Difference between revisions
Revision as of 11:36, 2 June 2016
Agenda
Identifier Space in GO Annotations
- In response to the May 18th call's discussion on gene and gene product identifier space (see minutes), I've put together a spreadsheet that documents our current practice for GAF and GPAD with respect to:
- Annotated Entity IDs
- With/From Entity IDs (note: only for genes and gene products)
- Annotation Extension Entity IDs (note: only for genes and gene products)
- Annotation Isoform Entity IDs
- Then, for the purposes of discussion, I also added two other possible approaches:
- Gene IDs only
- Broad range of gene, transcript, protein, protein complex entity IDs
- At the top of the spreadsheet are three general questions that we need to consider; there may be more, so please add any others.
- The plan was to review the different approaches, debate the pros and cons and then either get more feedback or finalize the proposal for presentation on an annotation or all-hands call
Review action items from Geneva meeting, and add items to Trello if necessary
Periodic review of the Trello board
https://trello.com/b/IdtTLGEt/go-priorities
Minutes
Attendees: Chris, David H, Kimberly, Moni, Paola, Paul T.
Regrets: Moni Munoz-Torres (Teaching 9th & 10th graders about research and the scientific method from 7:00AM - 9:30AM PDT).
Agenda: Paola; Minutes: Kimberly
Identifier Space in GAF and GPAD
- We discussed different options for what to use as gene and gene product identifiers in GAF and GPAD.
- Much of the discussion centered on the costs and benefits, for curators and users, of annotating with gene or gene-centric protein identifiers versus more specific, granular identifiers such as UniProtKB protein isoform IDs or PRO IDs for modified forms of proteins.
- There is currently an important distinction between GAF and GPAD: the GPAD specification allows Column 2 to use the more granular identifier, e.g., P34187-1, while GAF Column 2 uses canonical identifiers for a gene, protein, ncRNA, or protein complex.
- Curators may want to capture the most granular information possible, but what use cases do we have for use of that more granular info?
- Enrichment analysis still seems to be the more common use case for GO annotations, and for that, most users still use only gene or gene-centric annotations.
- Possible proposal:
- GAF:
- Column 2 - would stay as is using identifiers for genes, gene-centric protein set, ncRNAs, and protein complexes
- With/From and Annotation Extensions: curators could use whatever identifiers they want, but their annotation group must provide a Gene Product Information (GPI) file that would allow users to map those identifiers to a canonical, or parent, ID as for Column 2
- GPAD:
- Column 2 - would stay as is, with the most granular identifier
- With/From and Annotation Extensions: same as above
- Question 1: Should GO provide the digested GAF that contains only the canonical IDs in all columns (except Column 17 of GAF)?
- Question 2: How should mapping of protein complex members be handled? We probably do want a mechanism for mapping between gene or gene-centric protein IDs and protein complexes, and then for automatically unfolding annotations to each member of a complex using the contributes_to qualifier for MF? Need to check with Sandra Orchard about how mappings are currently handled.
- AI: Determine whether there are currently use cases for the more granular gene or gene product information in AEs and With/From. Consult with Val and Ruth on this.
- AI: More generally, look for examples of AE usage in the literature.
- One possible use case is mentioned in the Discussion section of "Gene Prioritization for Imaging Genetics Studies Using Gene Ontology and a Stratified False Discovery Rate Approach".
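The GPI-based mapping in the proposal, and the "digested GAF" of Question 1, could work roughly as follows. This is a minimal Python sketch, not an existing GO tool, and the function names are hypothetical. It assumes a GPI 1.x-style layout in which column 1 is the DB, column 2 the DB_Object_ID, and column 8 the DB-prefixed parent object ID. It builds a granular-to-parent lookup from the GPI lines, then rewrites IDs in the GAF With/From (column 8) and Annotation Extension (column 16) fields, leaving Column 2 (already canonical) and Column 17 untouched.

```python
import re

# Matches CURIE-style IDs such as "UniProtKB:P34187-1" or "GO:0005515".
CURIE = re.compile(r"[A-Za-z][A-Za-z0-9_.-]*:[A-Za-z0-9_.-]+")

def load_parent_map(gpi_lines):
    """Build {granular_id: parent_id} from tab-separated GPI-style lines.

    Assumed layout (GPI 1.x style): col 1 = DB, col 2 = DB_Object_ID,
    col 8 = Parent_Object_ID, already prefixed with its DB.
    """
    parent_of = {}
    for line in gpi_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 8 and cols[7].strip():
            parent_of[f"{cols[0]}:{cols[1]}"] = cols[7].strip()
    return parent_of

def to_canonical(field, parent_of):
    """Replace each ID in a field that has a known parent; keep the rest."""
    return CURIE.sub(lambda m: parent_of.get(m.group(0), m.group(0)), field)

def digest_gaf_row(row, parent_of):
    """Canonicalize With/From (col 8) and Annotation Extension (col 16).

    Column 2 is already canonical in GAF; column 17 (gene product form ID)
    is deliberately left as-is so the granular ID is not lost.
    """
    cols = row.rstrip("\n").split("\t")
    for idx in (7, 15):  # 0-based indices of GAF columns 8 and 16
        if idx < len(cols):
            cols[idx] = to_canonical(cols[idx], parent_of)
    return "\t".join(cols)
```

A regular-expression pass is used so that IDs embedded in extension syntax such as has_input(...) or in pipe-separated With/From lists are rewritten in place; IDs with no GPI parent pass through unchanged.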
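For Question 2, one hypothetical way to unfold an MF annotation made to a protein complex onto each of its members is sketched below. The complex-to-member mapping, the Annotation record, and all IDs are illustrative assumptions; where such a mapping would come from is exactly what still needs to be checked. The sketch only unfolds MF annotations, since the discussion named a qualifier (contributes_to) only for that aspect.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Annotation:
    entity: str     # annotated entity ID, e.g. a complex or a gene product
    qualifier: str  # GAF-style qualifiers, pipe-separated ("" if none)
    go_id: str
    aspect: str     # "F" (MF), "P" (BP) or "C" (CC)
    evidence: str

def unfold_complex(ann, members_of):
    """Copy an MF annotation on a complex to each member with contributes_to.

    members_of is a hypothetical {complex_id: [member_id, ...]} mapping.
    Non-MF annotations and entities that are not known complexes are
    returned unchanged.
    """
    members = members_of.get(ann.entity)
    if ann.aspect != "F" or not members:
        return [ann]
    quals = [q for q in ann.qualifier.split("|") if q]
    if "contributes_to" not in quals:
        quals.append("contributes_to")  # keep any existing qualifier, e.g. NOT
    return [replace(ann, entity=m, qualifier="|".join(quals)) for m in members]
```

Existing qualifiers such as NOT are preserved rather than overwritten, so a negative annotation on a complex would not turn into a positive one on its members.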