Manager Call 2016-07-06

From GO Wiki

Previous call minutes: Minutes from the previous GO Managers Meeting can be found here. GO Managers meet using the GO Phone Conference Line from Jackson Laboratory.

Agenda

Identifier space in GO annotations (GAF, GPAD)

We discussed allowing all IDs to be used in AE, provided we can get GPI files from everyone; ticket filed: https://github.com/geneontology/go-site/issues/194

I (Melanie) talked with Tony S., and as discussed during the data capture call, our GPI files don't include cross-references (there are too many of them). We could envision a solution if it were possible to get a list of the specific cross-references that are needed, and create a GPI with only those.

Protein Complexes

We discussed propagating function, BP and CC annotations on a complex to the individual gene products, using the contributes_to qualifier.

I (Melanie) discussed this with Claire, Sandra and Birgit, and there are concerns that while this is OK for CC, the process/function annotations (using the contributes_to qualifier) could be confusing for users, as many people don't look at these qualifiers: they only get as far as “Protein X has a catalytic function associated with it” and stop there.

Minutes

Attendees: Paola, Kimberly, PaulT, David H, Judy, Moni, Pascale, Melanie, ChrisM.

Regrets:

Agenda: | Minutes: (not edited)

1. Identifier space in GO annotations (GAF, GPAD):

We discussed allowing all IDs to be used in AE, provided we can get GPI files from everyone; ticket filed: https://github.com/geneontology/go-site/issues/194. Melanie talked with Tony S., and as discussed during the data capture call, our GPI files don't include cross-references (there are too many of them). We could envision a solution if it were possible to get a list of the specific cross-references that are needed, and create a GPI with only those.
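One way to read the “GPI with only those” idea is to keep the full cross-reference mapping in-house and write into the GPI only the cross-references that have been specifically requested. The Python sketch below illustrates that under stated assumptions: a tab-separated GPI with “!” header lines, the object ID in column 2, and pipe-separated cross-references in column 9. The function name add_needed_xrefs, the all_xrefs mapping and every ID in the example are hypothetical placeholders, not part of any GO tool.

```python
from typing import Dict, Iterable, List, Set

# Assumed GPI layout for this sketch (not a normative spec):
# tab-separated rows, "!" header lines, DB_Object_ID in column 2
# (index 1), pipe-separated cross-references in column 9 (index 8).
OBJECT_ID_INDEX = 1
XREF_INDEX = 8


def add_needed_xrefs(lines: Iterable[str],
                     all_xrefs: Dict[str, Set[str]],
                     needed: Set[str]) -> List[str]:
    """Fill the xref column with only the cross-references someone asked for."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("!") or not line.strip():
            out.append(line)  # keep headers and blank lines unchanged
            continue
        fields = line.split("\t")
        # Pad short rows so the xref column exists.
        fields += [""] * (XREF_INDEX + 1 - len(fields))
        object_id = fields[OBJECT_ID_INDEX]
        # Intersect everything we know for this object with what was requested.
        wanted = sorted(all_xrefs.get(object_id, set()) & needed)
        fields[XREF_INDEX] = "|".join(wanted)
        out.append("\t".join(fields))
    return out
```

For example, if an object carries RefSeq, Ensembl and PRO cross-references internally but only the PRO and RefSeq ones are on the requested list, only those two end up in its xref column, which keeps the file small.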

Presumably, these will go in the dbxrefs column?

PRO IDs in many cases overlap with UniProt IDs.

In the GPI, the parent ID is the corresponding MGI ID. Everything should map up to the gene; that's the ‘currency’ most people use. What do we want to include in the dbxrefs? If we have a particular C. elegans mRNA transcript, do we want to put the RefSeq/Ensembl transcript ID in the dbxref? Would it be useful?
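“Everything should map up to the gene” can be made concrete as following parent-ID links until an entity has no parent. The Python sketch below assumes a hypothetical parent_of dictionary built from the GPI parent column; the IDs are placeholders, and in practice a GPI parent may point directly at the gene rather than forming a multi-step chain. The loop handles both cases and refuses to loop forever on a malformed file.

```python
from typing import Dict


def resolve_to_gene(entity_id: str, parent_of: Dict[str, str]) -> str:
    """Follow parent-ID links (hypothetical map) up to the top-level entity."""
    seen = set()
    current = entity_id
    while current in parent_of:
        if current in seen:
            # A cycle means the parent links are inconsistent.
            raise ValueError(f"cycle in parent links at {current}")
        seen.add(current)
        current = parent_of[current]
    return current
```

With a chain such as isoform → protein → gene, resolve_to_gene returns the gene ID for any of the three inputs, so annotations made at any level can be rolled up to the gene-level ‘currency’.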

David H: this will likely end up being a duplicate.

The idea is to have a canonical ID for every object type.

This is a GO tool: the GO group in charge uses “this”, and it can only be mapped downstream if Neo is built from the different GPI files. If MGI controls the ID space (isoform-level entities, etc.) and decides that PRO IDs are the canonical IDs, then that's what GO will use.

UniProt is one of the canonical identifiers in Neo.

There will be cases where PRO entities are not represented in UniProt, because there are no IDs for the isoforms.

At the class level, the UniProt ID is the primary ID. PRO IDs can be used for subclasses. MGI IDs can be used for mapping at the gene level when there are no protein entities.

David, Melanie, Kimberly to put together examples of what groups will put in their GPI files. This is a moving target, so let's look at exactly what would be there and then add it as documentation to Noctua.

Column 2 of the GPI file will [?]: main ID = UniProt ID; cross-reference = MGI ID. If representing an isoform and there is no way to represent that isoform in UniProt, then you add that information at the tail of the UniProt ID. The MGI ID will stand for both the gene and its gene product.

David: Column 1 is any object you want to use as an annotation object; column 7 is the gene for that object. Is this appropriate?

Chris: There is no need for that.

The Google Doc will clarify this (David, Melanie, Kimberly).

We should also have a call with the PRO people to align PRO and Neo: Chris M, David H, Darren Natale.


2. Protein Complexes

We discussed propagating function, BP and CC annotations on a complex to the individual gene products, using the contributes_to qualifier. Melanie discussed this with Claire, Sandra and Birgit, and there are concerns that while this is OK for CC, the process/function annotations (using the contributes_to qualifier) could be confusing for users, as many people don't look at these qualifiers: they only get as far as “Protein X has a catalytic function associated with it” and stop there.

Consensus: use contributes_to qualifier

Propagating similar components is a good idea. The concern is that qualifiers added to the catalytic activity would be lost.

Answer from the room: this doesn’t seem to be a concern for others, but it was valid to raise.

How do we want to annotate complexes? PaulT: Standard rules should be established. For molecular function it would be great to annotate to the subunit that performs the function; that would be ideal. LEGO has a formalism for that: the sub-function that the complex performs is carried out by one or more components of the complex. But the issue is that we also want to allow annotation when we don’t know the catalytic subunits. It would also be okay to propagate to the gene products, and the contributes_to qualifiers could be removed. We could use something like this as a proposal for how we as a consortium annotate complexes, and then go from there.

We want to be able to capture ‘what’ we know about the complex, whatever that is. Provide the function for the polypeptide in the context that it only carries this function when it is part of the complex.

The issue Sandra Orchard raised was the visibility of the contributes_to qualifier. PaulT: I support that: if you can, annotate to the level of the sub-function of each molecular component, but if you can’t, then don’t. So what do we do by default? Propagate with the contributes_to qualifier? This is what should be debated.
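The options on the table can be written down as a propagation rule. The Python sketch below encodes one possible reading of the discussion, not an agreed GO policy: when the subunits that carry out a function are known, annotate only those, directly; otherwise propagate to every subunit, adding contributes_to for function and process annotations and no qualifier for component annotations. The Annotation type, propagate_complex, the aspect codes and the qualifier table are all hypothetical names chosen for illustration, and the GO IDs in the example are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Annotation:
    subject: str                     # complex or gene product identifier
    aspect: str                      # "F" (function), "P" (process), "C" (component)
    term: str                        # GO term ID
    qualifier: Optional[str] = None


# Assumption reflecting the proposal under discussion, not settled policy:
# F and P propagate with contributes_to, C propagates unqualified.
QUALIFIER_BY_ASPECT = {"F": "contributes_to", "P": "contributes_to", "C": None}


def propagate_complex(annotations: List[Annotation],
                      members: Dict[str, Set[str]],
                      active_subunits: Dict[Tuple[str, str], Set[str]]) -> List[Annotation]:
    """Copy complex-level annotations down to the complex's subunits."""
    out = []
    for ann in annotations:
        subunits = members.get(ann.subject, set())
        known = active_subunits.get((ann.subject, ann.term))
        if ann.aspect == "F" and known:
            # Preferred case: the subunits performing the function are known,
            # so annotate only those, without a qualifier.
            out.extend(Annotation(s, "F", ann.term) for s in sorted(known & subunits))
            continue
        # Default case: propagate to all subunits with the aspect's qualifier.
        qualifier = QUALIFIER_BY_ASPECT[ann.aspect]
        out.extend(Annotation(s, ann.aspect, ann.term, qualifier) for s in sorted(subunits))
    return out
```

Keeping the default in a single table makes the point still to be debated (whether function and process annotations should carry contributes_to by default) a one-line change, so the effect on existing annotations and enrichment results can be compared before and after.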

Pascale: whatever we do, let’s make sure to check how the rules we put in place affect the logic of previously made annotations.

Paul to write the proposal and send it to go-discuss for comment.

Kimberly: Could we split out the ‘rock-solid’ stuff from the ‘causally_upstream_of’ annotations and see what it does to the enrichment analysis?