Annotation Conf. Call July 23, 2013
From GO Wiki
Consolidating protein binding curation pipelines
- Background- Some groups capture every protein binding experiment as a GO annotation, while some don't. Is protein binding really a molecular function? There is inconsistency across the consortium in how people annotate to protein binding and we have struggled to come up with guidelines for consistent curation for this term. Some groups don't annotate protein binding using GO because BioGRID or IntAct captures these interactions for their species/db. We want to keep protein binding terms in the ontology. How can we best support annotations to this term?
- Proposal- Consolidate all protein binding curation pipelines into the GAF
- Do we just take all annotations from all pipelines? Does this mean each GAF file will have a huge number of protein binding annotations?
- Some groups have stricter guidelines for capturing protein binding annotations for GO (e.g. PomBase). Will this effort dilute those annotations?
- Why not include all interaction data? It will be yet another process to figure out what the filtering step should be.
- Should we use a mapping strategy to map these interactions to specific GO terms?
- Other issues to consider?
So far there are no serious objections to moving forward with this proposal. If curators have concerns, they should speak up sooner rather than later.
- Harold- Can you clarify if we are talking about 2 GAFs/taxon or one single GAF for all annotations including protein binding?
- Rama- We don't want to confuse the users any further by producing 2 GAFs. The proposal is to convert all protein-interaction data into protein binding annotations and append them to the GAF that each group produces.
- How do we let the users filter these annotations?
- Two ways- The Assigned by column, and in addition the interaction databases capture the experimental method as evidence. We can use ECO to capture that detail in the Evidence code column, so users are able to filter as they wish. GOC curators can continue to curate protein binding data and these will have their respective assignedBy names.
- How does this affect Term enrichment? Should we offer an option to remove interactions from BioGRID/IntAct?
- Something we can deal with. The enrichment part itself shouldn't matter. But we should test it out.
- Granular mapping terms- BioGRID and IntAct don't assign any granular terms for protein binding. It seems like a waste of effort for them to capture the interaction and for us to then look at the same paper to make a granular annotation.
- We can map these annotations to granular terms. But what do we put in the Assigned by column? We face the same issue with MF-BP inferences too (provenance of evidence).
- Seems like we need a new evidence code for these cases
- When we map these interactions to finer terms, can we have a PMID and a GO_REF to indicate the original source and the inferred annotation?
- Don't know if that is possible, something to look into
- Harold- There are a lot of granular terms in the protein binding branch that are based on family, function and so on. We need to come up with guidelines for how to use these terms, which terms shouldn't be used (like enzyme binding) etc.
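
The filtering option discussed above (using the Assigned by column to drop imported interaction data, for example before term enrichment) could look roughly like the sketch below. It assumes GAF 2.x tab-delimited lines, where the GO ID is column 5 and Assigned by is column 15, and it assumes the imported annotations would carry "BioGRID" or "IntAct" in Assigned by. The function name `filter_gaf` and the sample rows (identifiers, PMIDs) are hypothetical and only for illustration.

```python
PROTEIN_BINDING = "GO:0005515"  # GO ID of the "protein binding" term

# 0-based positions of the GAF 2.x columns used here
COL_GO_ID = 4         # column 5: GO ID
COL_ASSIGNED_BY = 14  # column 15: Assigned by


def filter_gaf(lines, exclude_sources=frozenset({"BioGRID", "IntAct"})):
    """Yield GAF rows, skipping protein binding annotations whose
    Assigned by value is in exclude_sources (assumed source names)."""
    for line in lines:
        if line.startswith("!"):  # GAF header/comment lines
            continue
        row = line.rstrip("\n").split("\t")
        if len(row) < 15:  # not a complete annotation line
            continue
        if (row[COL_GO_ID] == PROTEIN_BINDING
                and row[COL_ASSIGNED_BY] in exclude_sources):
            continue
        yield row


def make_line(go_id, evidence, assigned_by):
    """Build a hypothetical GAF line; all identifiers are made up."""
    fields = ["PomBase", "SPAC1.01", "gene1", "", go_id, "PMID:0000001",
              evidence, "", "F", "", "", "protein", "taxon:4896",
              "20130723", assigned_by]
    return "\t".join(fields) + "\n"


sample_lines = [
    "!gaf-version: 2.0\n",
    make_line("GO:0005515", "IPI", "PomBase"),  # curated protein binding: kept
    make_line("GO:0005515", "IPI", "BioGRID"),  # imported interaction: dropped
    make_line("GO:0004672", "IDA", "PomBase"),  # other function term: kept
]

kept = list(filter_gaf(sample_lines))
for row in kept:
    print(row[COL_GO_ID], row[COL_ASSIGNED_BY])
```

Passing an empty set as `exclude_sources` keeps everything, so the same file can serve users who want the full interaction data and users who only want GOC-curated protein binding annotations.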