Ontology meeting 2015-05-07
Attendees: Chris, Heiko, Harold, DavidH, DavidOS, Paola, PaulT, Tanya
- 1 release GO job fail
- 2 Double labelled edges are sometimes useful, but how do we keep them out of the -basic release?
- 3 Follow-up: cellular component is_a anatomical structure?
- 4 GO-SO issues - not discussed
- 5 part_of vs has_part in export from nucleus - not discussed
- 6 Documentation of gorel-edit -> gorel - not discussed
release GO job fail
See recent emails e.g. "release-go - Build # 1037 - Still Failing!"
We need to fix this. Note that this error is blocking the ontology release and a number of downstream processes (for example, the daily go-watchers email is not being sent).
Possibly caused by: java.io.FileNotFoundException: /home/jenkins-slave/workspace/release-go/ontology/check-obo-for-standard-release.pl (No such file or directory), although Heiko suggests the problem may instead be a double labelled edge in the -basic version (see next item).
Heiko will look into this and fix.
Double labelled edges are sometimes useful, but how do we keep them out of the -basic release?
(see thread: Re: [go-ontology] GO load issues)
Background: Historically GO has been a DAG in which each edge had only one label. With the move to OWL, GO no longer has this restriction, allowing us to use richer logic. In order to support users who expect these (and other) restrictions to be in place, GO (and other OBO-ish OWL ontologies) produce a -basic version of the ontology. Until now, this has been produced by making sure that multiply-labelled edges (MLEs) and cycles do not exist for a small subset of relations used in the -basic release.
We have recently released versions of the -basic files containing MLEs. This has caused a load failure at MGI. These slipped through because:
- (a) We now have an inference step that relaxes intersections to relationships on release. (this is a good thing)
- (b) Our previous checks for 'Potential redundant relationships' were upstream of (a) and ignored intersections.
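To illustrate how step (a) can introduce an MLE, here is a minimal Python sketch of relaxing intersection clauses into plain edges. The dict-based term representation and the function name `relax_intersections` are assumptions made for illustration; this is not the actual release tooling. It uses the 'membrane region' definition discussed below.

```python
# Sketch only: a toy relaxation of OBO-style intersection_of clauses.
# The data model here is hypothetical, not that of the real release pipeline.

def relax_intersections(term):
    """Return the (label, target) edges implied by a term's intersection_of clauses."""
    edges = []
    for clause in term.get("intersection_of", []):
        parts = clause.split()
        if len(parts) == 1:
            # Genus: "intersection_of: X" relaxes to "is_a: X"
            edges.append(("is_a", parts[0]))
        else:
            # Differentia: "intersection_of: R X" relaxes to "relationship: R X"
            edges.append((parts[0], parts[1]))
    return edges


# 'membrane region' EquivalentTo membrane that part_of some membrane
membrane_region = {
    "id": "membrane_region",
    "intersection_of": ["membrane", "part_of membrane"],
}

edges = relax_intersections(membrane_region)
print(edges)
```

Both resulting edges point at the same target with different labels, which is exactly the kind of edge a check that runs before relaxation cannot see.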
We clearly need checks to be in place at the correct point in the release cycle to avoid unnecessary MLEs. Two of the MLEs blocking the release are errors, and have been or will be corrected soon. BUT this one is entirely valid and useful:
membrane region EquivalentTo membrane that part_of some membrane
There are other cases where a similar pattern might be useful. Lots of inference now depends on this pattern: not only classification of descendants, but also inference of plasma membrane location for a number of protein complexes that are recorded as part of some plasma membrane region.
- in the -basic file this becomes:
- is_a membrane
- relationship: part_of membrane
- (a) Implement release checks to prevent future accidental release of non basic -basic.obo
- (b) Extend current checks to intersections => a warning.
- (c) Decide on how to automatically fix valid cases. One possibility is to have a rule that always strips MLEs down to just one label, based on some specified order of precedence.
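Proposals (a) and (c) could be sketched roughly as follows in Python. Everything here is an assumption for illustration: the edge tuples, the function names `find_mles` and `strip_mles`, and in particular the `PRECEDENCE` order, which has not been agreed.

```python
from collections import defaultdict

# Hypothetical precedence order: labels earlier in the list win.
PRECEDENCE = ["is_a", "part_of", "regulates"]


def find_mles(edges):
    """Return {(subject, target): labels} for every pair carrying more than one label."""
    labels = defaultdict(set)
    for subject, label, target in edges:
        labels[(subject, target)].add(label)
    return {pair: found for pair, found in labels.items() if len(found) > 1}


def strip_mles(edges, precedence=PRECEDENCE):
    """Keep only the highest-precedence label for each (subject, target) pair."""
    rank = {label: i for i, label in enumerate(precedence)}
    lowest = len(rank)  # labels missing from the precedence list sort last
    best = {}
    for subject, label, target in edges:
        pair = (subject, target)
        if pair not in best or rank.get(label, lowest) < rank.get(best[pair], lowest):
            best[pair] = label
    return [(s, label, t) for (s, t), label in best.items()]


edges = [
    ("membrane_region", "is_a", "membrane"),
    ("membrane_region", "part_of", "membrane"),
    ("plasma_membrane", "is_a", "membrane"),
]

mles = find_mles(edges)      # release check: should be empty for -basic
basic = strip_mles(edges)    # automatic fix for valid cases
print(mles)
print(basic)
```

A release check would fail (or warn) whenever `find_mles` returns anything, while `strip_mles` shows one way the valid cases could be reduced automatically before writing the -basic file.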
Relaxing intersections to asserted relationships can create problems without checks in place. With checks on the daily official release only (and not the editors' version), we will miss some violations, but we can address those the next day. Release will be delayed by at most a day. Running the check on the editors' version will fail horribly, so we don't do this right now; we'd need to put some processing steps into place before running that script.
DavidOS brings up example of 'membrane region' which is part_of 'membrane' and is_a 'membrane part' (viewed in OLS). Suggests stripping out truly redundant 'part of' relationships. Remove the 'membrane part' grouping class as a pre-processing step. Make sure that removing the 'x part' terms does not result in a whole bunch of terms that are is_a incomplete.
'membrane region' vs. 'membrane part' discussion - put into definition or list out parts for 'membrane region'
Solution: add term for whole membrane (e.g. plasma membrane)
Follow-up: cellular component is_a anatomical structure?
Looking at http://wiki.geneontology.org/index.php/Ontology_meeting_2015-04-30#cellular_component_is_a_anatomical_structure.3F, could we please specify action items and point person(s)?
Decision: We should no longer use morphogenesis/results_in_morphogenesis to refer to assembly or generation of shape of a cellular component. But it is OK to make that assembly part of some anatomical structure morphogenesis, including morphogenesis of cells. Make sure that definitions and comments indicate this. e.g.: 'dendritic spine morphogenesis' -> 'dendritic spine assembly'. ('anatomical structure morphogenesis' and ('results in morphogenesis of' some 'dendritic spine')) -> ('cellular process' that 'results in assembly of' some 'dendritic spine') => automated classification under appropriate assembly/organisation terms. AI: DOS will implement.
Merge 'cilium morphogenesis' and 'cilium assembly'? Check usage first. Currently 'cilium assembly' is part_of 'cilium morphogenesis'. Lots of IMP annotations to 'cilium morphogenesis', maybe because the shape of cilia is odd. Note that you can assemble something internally that doesn't affect the outside structure; the whole structure may keep the same shape. Be careful about mass merging unless we don't care about the distinction between the two. AI: check whether a mass merge with a joined morphogenesis-and-assembly term makes sense.
- Lots of merges morphogenesis -> assembly. Worth considering difference in intent? e.g. could morphogenesis refer to shape change, e.g. branching of axon/dendrite, rather than formation? One example of this is 'mitochondrion morphogenesis', which has the note: "This term is intended for annotation of gene products involved in mitochondrial shape changes associated with development; an example is the morphogenesis of the Nebenkern during spermatogenesis."
- Possibly problematic that various tract morphogenesis terms are under axonogenesis, which is under 'neuron projection morphogenesis'. Should we just make a new 'tract morphogenesis' term to put all of these under? Should we retain some relationship with the cellular component organisation terms? Could make parallel 'involved in' terms for each.
GO-SO issues - not discussed
Copying from last week:
There are various problems with our use of SO, some of which require co-ordination with SO dev:
- We need a bridge from SO transcript terms -> ChEBI:RNA. In the absence of this, lots of inference is missing. Will the long planned SO molecular save us, or do we need our own bridge axioms?
- We need a differentia for recording which RNA metabolic processes are processing (involve maturation). We may be able to do this using terms from SO (see next item), or we could use a similar strategy to the one we use for developmental progression via a 'results in maturation of' relation. (We may, in fact, need a combination of these.)
- We use the SO terms mRNA, ncRNA and its children as if they refer to both mature and immature states of transcripts. In fact, according to SO they refer to the mature state. To align with SO properly we would need to review usage and use alternative SO terms where available. SO has an additional set of terms for primary transcripts, but no terms for immature transcripts. Primary transcript refers only to the state before splicing, so there are no terms for the intermediate state after splicing and before the other modifications involved in maturation, such as capping and polyadenylation for mRNA. Need to discuss possibilities of adding these with SO.
DOS is organising a meeting with Karen Eilbeck. Who wants to be involved?
part_of vs has_part in export from nucleus - not discussed
dph: All of the named RNAs that are exported from the nucleus are exported as part of RNP complexes. In the ontology we had asserted that the RNA export has_part RNP export. This seemed backwards to me since the RNA is part of the RNP. I reversed these relations to make the RNA transport part_of the RNP transport.
Documentation of gorel-edit -> gorel - not discussed
Better doc of gorel-edit -> gorel transformation would be good. Would also be useful to have a Jenkins job for running test transformations while editing to check results.