- Why this relationship is needed, and the scope of what it means
- Propagation of annotations
- Examples in MF, BP and CC
- Do curators have to pay attention to the has_part relationship when they are picking terms to annotate?
Scope of the has_part relationship
First, there are two scenarios in which has_part links can be moderately useful for existing annotation (slides: http://www.slideshare.net/cmungall/haspart-in-go):
- If a gene product is involved in a process, then it must be active in at least one of the sub-steps of that process. These sub-steps are indicated by has_part relationships. We cannot deterministically infer which sub-step, but a curator can weigh up evidence from other sources, so the has_part links are useful guides to where additional annotations may be required. They could also guide users to investigate neighboring terms.
- Probabilistic models can use has_part relationships to propagate annotations non-deterministically in either direction over the has_part links. Neither of these scenarios involves deterministic (purely logical) annotation inference, so in this sense the has_part links are somewhat weak.
- However, there are situations where has_part links can be very powerful, although this requires adding a qualifier that strengthens the claim an annotation makes. Normally an annotation states that *some* instances of the GO class have the gene product. Sometimes we can make the stronger claim that *all* instances of the GO class have the gene product (in a particular species). If we can distinguish these stronger claims from other annotations, then we can propagate over has_part. This is most clearly illustrated with protein complexes (see the slides above), but it can also apply to processes and functions.
I started a wiki page with a proposal for such a qualifier. It is not yet ready for general circulation (I had hoped to flesh it out more), but it is relevant to this discussion.
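A minimal sketch of the two uses of has_part described in the list above, assuming a toy in-memory ontology of (subject, relation, object) triples. All term names are hypothetical, and the `all_instances` flag stands in for the qualifier proposed on the wiki page, whose actual name and syntax are not given here. Ordinary annotations only produce curator hints (the has_part children of an annotated process). Qualified annotations are propagated deterministically from the part to the whole: if every instance of D has the gene product as a part, and every instance of C has_part some D, then every instance of C has the gene product. That direction is one logically sound reading of the proposal, not something these notes spell out.

```python
from dataclasses import dataclass

# Hypothetical toy ontology as (subject, relation, object) triples.
# "X has_part Y" is read as: every instance of X has some instance of Y as a part.
EDGES = [
    ("process_P", "has_part", "step_1"),
    ("process_P", "has_part", "step_2"),
    ("complex_B", "has_part", "complex_C"),
    ("complex_C", "has_part", "subcomplex_D"),
]


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    # Hypothetical qualifier: True means *all* instances of `term` have the
    # gene product (in the species in question), not just *some*.
    all_instances: bool = False


def suggest_substeps(ann, edges):
    """Return has_part children of the annotated term as curator hints only.

    A gene product involved in a process must act in at least one sub-step,
    but we cannot tell which one, so nothing is inferred here.
    """
    return [obj for subj, rel, obj in edges if subj == ann.term and rel == "has_part"]


def propagate_strong(annotations, edges):
    """Propagate all-instances annotations from each part to every whole that has_part it.

    Ordinary (some-instances) annotations are kept but never propagated over has_part.
    """
    result = set(annotations)
    frontier = [a for a in annotations if a.all_instances]
    while frontier:
        ann = frontier.pop()
        for whole, rel, part in edges:
            if rel == "has_part" and part == ann.term:
                inferred = Annotation(ann.gene, whole, all_instances=True)
                if inferred not in result:
                    result.add(inferred)
                    frontier.append(inferred)
    return result


weak = Annotation("gene_X", "subcomplex_D")
strong = Annotation("gene_Y", "subcomplex_D", all_instances=True)
inferred = propagate_strong({weak, strong}, EDGES)
```

In this sketch the ordinary annotation of gene_X stays on subcomplex_D, while the qualified annotation of gene_Y yields strong annotations to complex_C and, transitively, complex_B.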
"CAAX-box protein maturation" has a has_part child, "protein maturation by peptide bond cleavage", BUT protein maturation by peptide bond cleavage is NOT always part of CAAX-box processing when it occurs, so annotations to this term cannot be propagated via has_part: you *cannot* propagate annotations from "CAAX-box protein maturation" to "protein maturation by peptide bond cleavage".
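A minimal sketch of ordinary deterministic propagation for this example, assuming annotations are carried from a term to its parents only over is_a and part_of edges, with has_part deliberately excluded. The has_part edge is the one described above; the two is_a edges to "protein maturation" are assumptions added so the closure has something to walk.

```python
# The has_part edge comes from the example above; the is_a edges are assumed.
EDGES = [
    ("CAAX-box protein maturation", "has_part", "protein maturation by peptide bond cleavage"),
    ("CAAX-box protein maturation", "is_a", "protein maturation"),
    ("protein maturation by peptide bond cleavage", "is_a", "protein maturation"),
]

# Relations that carry an annotation from a term to its parents in BP/CC.
# has_part is intentionally left out.
PROPAGATING = {"is_a", "part_of"}


def implied_terms(term, edges, relations=PROPAGATING):
    """Return `term` plus every term an annotation to it deterministically implies."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for subj, rel, obj in edges:
            if subj == current and rel in relations and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


print(sorted(implied_terms("CAAX-box protein maturation", EDGES)))
# ['CAAX-box protein maturation', 'protein maturation']
```

Adding "has_part" to `relations` would wrongly attach a gene product annotated to CAAX-box protein maturation to the peptide bond cleavage step.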
Most of the discussion was spurred by this question from Val:
SGD would not make the 978 annotation because 978 is less granular than the 1077 term, which already includes the same information. MGI would make this annotation because they feel both are useful.
- GO:0001077: sequence-specific regulatory transcription factor site binding RNA polymerase II transcription factor activity involved in positive regulation of transcription from RNA polymerase II promoter
- GO:0000978: RNA polymerase II regulatory transcription factor site sequence-specific DNA binding
978 does not inherit anything from 1077, and it is not supposed to. You would annotate to 978 if you wanted to say "DNA binding" WITHOUT any statement about transcription factor activity; if you annotate to 978 only, you are making a statement about DNA binding ONLY.
SGD would annotate to 1077 only, because we want to make the statement about "DNA binding transcription factor activity, etc.", and the "DNA binding" part is already encoded in this term through the has_part relationship. We therefore do not also annotate to 978, because that statement is already a component of the 1077 annotation alone.
Karen: When I was at the transcription meeting in June, someone asked me about being able to find all of the sequence-specific DNA binding factors involved in transcription. This is something people want to be able to do. So when we annotate to terms like 1077, which have has_part relationships to terms representing sequence-specific DNA binding functions, we want to make sure that genes annotated to terms like 1077 are included in the results of queries for sequence-specific DNA binding factors.
Val: Why can't GO:0000978 RNA polymerase II regulatory transcription factor site sequence-specific DNA binding be part_of GO:0001077 sequence-specific regulatory transcription factor site binding RNA polymerase II transcription factor activity involved in positive regulation of transcription from RNA polymerase II promoter?
Because it is correct to say that a "DNA-binding transcription factor activity" must have_part "DNA binding", but the reverse is not necessarily true: not every instance of DNA binding is part_of a transcription factor activity.
The has_part relationships in MF should be considered separately from the ones in BP and CC. These are primarily for linking an MF term to the necessary ligand, i.e. has_part X binding. For these intra-MF links, annotations *do* always propagate (see my talk from Geneva). I agree that on the surface this sounds arbitrary, but it is justified by the fact that a defining feature of MF classes is that the gene product participates throughout the duration of the activity.
Example: nucleic acid binding transcription factor activity has_part nucleic acid binding (see http://www.slideshare.net/cmungall/haspart-in-go).
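A minimal sketch of how intra-MF has_part propagation could serve both Karen's query and SGD's choice not to add the 978 annotation, assuming MF annotations propagate over is_a and has_part. The first edge is the example quoted above; the GO:0001077 has_part GO:0000978 edge is assumed from the thread, and the gene names and annotation sets are hypothetical.

```python
# Intra-MF relations as (subject, relation, object) triples.
# Edge 1 is quoted in these notes; edge 2 is assumed from the 1077/978 discussion.
MF_EDGES = [
    ("nucleic acid binding transcription factor activity", "has_part", "nucleic acid binding"),
    ("GO:0001077", "has_part", "GO:0000978"),
]

# In MF, annotations propagate over is_a and intra-MF has_part.
MF_PROPAGATING = {"is_a", "has_part"}

# Hypothetical direct annotations, for illustration only.
ANNOTATIONS = {
    "gene_A": {"GO:0001077"},                # SGD style: 1077 only
    "gene_B": {"GO:0000978"},                # DNA binding only
    "gene_C": {"GO:0001077", "GO:0000978"},  # MGI style: both
}


def implied_terms(term, edges):
    """Return `term` plus every MF term an annotation to it implies."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for subj, rel, obj in edges:
            if subj == current and rel in MF_PROPAGATING and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def genes_with_function(target, annotations, edges):
    """Genes whose direct annotations imply `target` (the query Karen describes)."""
    return sorted(
        gene
        for gene, terms in annotations.items()
        if any(target in implied_terms(t, edges) for t in terms)
    )


def redundant_terms(terms, edges):
    """Direct annotations already implied by another direct annotation of the same gene."""
    return {
        t for t in terms
        if any(t in implied_terms(other, edges) for other in terms if other != t)
    }


print(genes_with_function("GO:0000978", ANNOTATIONS, MF_EDGES))
# ['gene_A', 'gene_B', 'gene_C']
print(redundant_terms(ANNOTATIONS["gene_C"], MF_EDGES))
# {'GO:0000978'}
```

Under these assumptions the query returns the same genes whether a group follows the SGD or the MGI practice, and the 978 annotation on gene_C is flagged as implied by 1077.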
- Fatty acid synthase example
ribosome has_part ribosomal subunit
large_subunit part_of ribosome
"annotation can't be transitive over this relationship because not all instances of has_part we have in the ontology are always true"
This is correct: the has_part relation should be ignored when making deterministic (purely logical) inferences in BP and CC. But this doesn't mean these links are useless. There are a few cases where they are informative for humans and machines, and, ***if we extend the annotation paradigm***, they can be extremely powerful for making deterministic inferences.
Outstanding issues with QC checks (if any)
Chris did most of the talking. Slides: File:Go-has part-Dec-2010.pdf
- In general, there was some confusion about the meaning of the has_part relationship: does it mean 'sometimes part_of' or 'always part_of'?
- There are issues with the way the TFIIH complex is represented. Does has_part mean it is sometimes part_of? If not, how is it different from part_of?
- Don't think of the has_part relationship when annotating
- Should annotators do more of the work, or should the ontology? Chris will come up with a proposal to address some of the questions raised at the meeting.