Biological process xp self ProgressNotesNov2008
In a meeting today at the Editorial Office we figured out who will be responsible for which cross products file. I am responsible for the 'biological_process_xp_self' file.
I am to read through the file and check the biological content of the intersection tags. Any edits are to be made by hand in the file, although OBO-Edit2 can currently be used as a browser. Where specific domain knowledge is needed I can send a list of terms to people who have the relevant expertise.
The hope is that these cross products should be in the editors file by the beginning of January and in the public domain by the end of March.
I have started trying to work with the files. I loaded up the files listed in my import file and made the following notes about what happened.
When I load the files I get two orphans at the top. They are
ID = obol:culmination and
ID = interphase_by_interphase_microtubule_organizing_center
Verification manager warnings on load:
culmination during sorocarp development (GO:0031154) generated 2 warnings:
The term culmination during sorocarp development (GO:0031154) links to the dangling identifier obol:culmination
The cross product definition of culmination during sorocarp development (GO:0031154) refers to a dangling parent /obol:culmination\.
derived_into (OBO_REL:derived_into) generated 1 warning:
The term derived_into (OBO_REL:derived_into) links to the secondary identifier OBO_REL:derives_from
has_improper_part (OBO_REL:has_improper_part) generated 1 warning:
The term has_improper_part (OBO_REL:has_improper_part) links to the obsolete term improper_part_of (OBO_REL:improper_part_of)
improper_part_of (OBO_REL:improper_part_of) generated 1 warning:
The term improper_part_of (OBO_REL:improper_part_of) links to the obsolete term has_improper_part (OBO_REL:has_improper_part)
interphase microtubule nucleation by interphase microtubule organizing center (GO:0051415) generated 2 warnings:
The term interphase microtubule nucleation by interphase microtubule organizing center (GO:0051415) links to the dangling identifier interphase_by_interphase_microtubule_organizing_center
The cross product definition of interphase microtubule nucleation by interphase microtubule organizing center (GO:0051415) refers to a dangling parent /interphase_by_interphase_microtubule_organizing_center\.
If I do a link search for anything that has is intersection then the last column of the results is very wide and cannot be made smaller.
Link Search usage:
Selecting things in the link search results panel does not result in them being shown in the OTE or in the Graph viewer. The OTE moves to a new place but there is no way of knowing which term I should be looking at.
The tab on the text editor has its name text much smaller than the text on all the other component tabs.
OBO-Edit ran out of memory and crashed on my Mac, even with the reasoner off.
Components: 1 OTE, Graphviz Viewer, Graph viewer, text editor, link search + one results panel. The cause of the crash was the graph viewer trying to display a term.
This also happens if I load the files and have a link search + results, a term search (unused), a graph viewer and a parent editor.
Search doesn't remember config settings over restart
These issues have all been added to the tracker.
Processing XP files to be checked mostly by eye
I have written two scripts to help with checking the XP composition. Currently all we have are term names to go on, and I think that having the defs available alongside term names will save time. The scripts are:
The first changes the GO live file to tab-delimited format:
GO:id \t definition
The second takes that file and the XP file with this format:
[Term]
id: GO:0000022 ! mitotic spindle elongation
intersection_of: GO:0051231 ! spindle elongation
intersection_of: part_of GO:0007067 ! mitosis

[Term]
id: GO:0000070 ! mitotic sister chromatid segregation
intersection_of: GO:0007059 ! chromosome segregation
intersection_of: part_of GO:0007067 ! mitosis
and where a term id is quoted it adds in the def of the term underneath as follows:
[Term]
id: GO:0000022 ! mitotic spindle elongation
def: "Lengthening of the distance between poles of the mitotic spindle." [GOC:mah]
intersection_of: GO:0051231 ! spindle elongation
def: "The cell cycle process whereby the distance is lengthened between poles of the spindle." [GOC:ai]
intersection_of: part_of GO:0007067 ! mitosis
def: "Progression through mitosis, the division of the eukaryotic cell nucleus to produce two daughter nuclei that, usually, contain the identical chromosome complement to their mother." [GOC:ma, ISBN:0198547684]
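The two scripts described above are not reproduced in these notes. As a rough illustration only, the same two steps could be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that each tag sits on its own line and that every def: value fits on a single line.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")


def read_definitions(obo_text):
    """Step 1: map each term id to the raw value of its def: tag."""
    defs = {}
    current_id = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):          # a new stanza header such as [Term]
            current_id = None
        elif line.startswith("id: "):
            current_id = line[4:].split("!")[0].strip()
        elif line.startswith("def: ") and current_id:
            defs[current_id] = line[5:]
    return defs


def to_tab_delimited(defs):
    """Render the id -> definition map as 'GO:id <tab> definition' lines."""
    return "\n".join(f"{term_id}\t{text}" for term_id, text in sorted(defs.items()))


def add_definitions(xp_text, defs):
    """Step 2: after every id: or intersection_of: line, add the def of the quoted term."""
    out = []
    for line in xp_text.splitlines():
        out.append(line)
        if line.startswith(("id:", "intersection_of:")):
            # Only look in front of the '!' comment for the term id.
            match = GO_ID.search(line.split("!")[0])
            if match and match.group(0) in defs:
                out.append("def: " + defs[match.group(0)])
    return "\n".join(out)


# Small demonstration using one definition quoted in these notes.
live_sample = (
    "[Term]\n"
    "id: GO:0000022\n"
    "name: mitotic spindle elongation\n"
    'def: "Lengthening of the distance between poles of the mitotic spindle." [GOC:mah]\n'
)
xp_sample = (
    "[Term]\n"
    "id: GO:0000022 ! mitotic spindle elongation\n"
    "intersection_of: GO:0051231 ! spindle elongation\n"
)
print(add_definitions(xp_sample, read_definitions(live_sample)))
```

Terms whose definitions are missing from the map (GO:0051231 in the demonstration) are simply left without a def: line, so gaps remain visible when reading the file by eye.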
I have checked the copy of my file with defs into scratch/xps and I will delete terms from that file as they are checked and found to be fine. The file is called biological_process_xp_self_with_defs.obo.
I have read right through the file and picked out the terms that need changes. These terms are now the only ones in the file xps/biological_process_xp_self_with_defs.obo. I have also made notes on the things that I noticed and on the specific changes needed in the file xps/biological_process_xp_self_notes.txt.
Next I will start making the changes that are needed and will report back to Chris the things that I found.
Chris says that when OBO-Edit displays a new relationship type as is_a then we should in fact include the typedef in the relationship file. He is going to arrange that.
I have been looking into the problem of disjoints in the Tree Viewer, and there is a proposal to have a single disjoint relationship between pairs of terms, as the relationship is symmetrical anyway. There is opposition to this proposal, and requests that OBO-Edit be changed to handle the pairs of relationships.
I discovered that the Graph Viewer is loading typedefs from the live file, while the OTE is loading typedefs from other files. I am looking into how relationships are handled generally so as to figure out what needs to be done.
I have tried loading my files and running the reasoner. The reasoner runs in a minute or so, but fatal errors are produced:
M phase of meiotic cell cycle (GO:0051327) generated 1 error:
M phase of meiotic cell cycle (GO:0051327) is part of a cycle over the property part_of.
M phase of mitotic cell cycle (GO:0000087) generated 1 error:
M phase of mitotic cell cycle (GO:0000087) is part of a cycle over the property part_of.
maintenance of turgor in appressorium by melanization (GO:0075043) generated 1 error:
maintenance of turgor in appressorium by melanization (GO:0075043) has disjoint superclasses multi-organism process and cellular process
meiosis (GO:0007126) generated 1 error:
meiosis (GO:0007126) is part of a cycle over the property part_of.
mitosis (GO:0007067) generated 1 error:
mitosis (GO:0007067) is part of a cycle over the property part_of.
smooth muscle relaxation of the bladder outlet (GO:0060085) generated 1 error:
smooth muscle relaxation of the bladder outlet (GO:0060085) is part of a cycle over the property is_a (OBO_REL:is_a).
synaptic transmission involved in micturition (GO:0060084) generated 1 error:
synaptic transmission involved in micturition (GO:0060084) is part of a cycle over the property is_a (OBO_REL:is_a).
12th December meeting
We had a meeting to discuss cross products and we all gave progress updates. We discussed the issue of disjoint symmetry and the idea of only putting one disjoint relationship between pairs of terms. I have committed to test all the OBO-Edit components to see how many are broken by this and how many are fixed. I do not completely understand the situation with the OTE so need to remember to mention that this is not fully tested.
We discussed the idea of saving from OE2 to gene_ontology_write.obo, and I am to write the first draft of the proposal.
1st April 2009
I have commented out all of the problematic terms in the self file and written to Chris to ask if he has a script to move them to the new unvetted file that I have made. David and I are planning to start looking at the paths to root on the vetted terms soon.
3rd April 2009
Participants: David Hill, Jen Deegan
We started working through the unvetted file and making changes. We just changed the live GO graph and made notes on how the rules in obol might be changed, on the understanding that Chris will reparse to improve the intersection tags. We have not edited the intersection tags at all. I made notes on our changes in the unvetted file, and each note starts with '!!'.
I will write to Chris to tell him what we have done.
Later: Chris wrote back to say that we should also update the intersection file by hand in this file as it is not sensible in this case to regenerate the intersection links by obol.
9th April 2009
I have loaded up the intersection file with the current live GO and am checking the ancestors of the logically defined terms to see that they make sense. Previously I only checked that the names and defs of the defined terms and intersection terms made sense.
1) Terms GO:0031180 to GO:0031200 have been merged into other terms, so I will delete these terms from the intersection file. The same applies to GO:0031178.
I have sent two questions to the ontology developers list:
I have just been looking at the ancestors of my cross product terms and I found a weird thing that could maybe be tested for systematically in the other stanzas. In the Graphviz view below you can see that there is a cycle between 'mitosis', 'mitotic cell cycle' and 'M phase of mitotic cell cycle'. The cycle is made up of intersection tags and normal relationships, but it is still wonky. Do you think such cycles could be checked by a script or something? I don't think we can currently do it in OBO-Edit.
If I find a graph, like the one below, where the intersection tags show exactly the same as the existing relationships, is it right that I leave both sets in? Would I be right in thinking that the normal relationships show what is 'necessary' for the child term, where the intersections show what is 'necessary and sufficient', and that this is why we keep both?
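A scripted check of the kind asked about in the first question might look something like the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, every is_a, relationship and intersection_of tag is treated as a plain child-to-parent edge in one graph, and the last token before any '!' comment is assumed to be the target id. A cycle found this way is something to review by eye rather than proof of a logical error, since it may mix relationship types.

```python
from collections import defaultdict

LINK_TAGS = ("is_a: ", "relationship: ", "intersection_of: ")


def parse_links(obo_text):
    """Collect child -> parent ids from is_a, relationship and intersection_of tags."""
    graph = defaultdict(set)
    current = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.split("!")[0].strip()      # drop the trailing '! name' comment
        if line.startswith("["):
            in_term = line == "[Term]"        # ignore [Typedef] and other stanzas
            current = None
        elif in_term and line.startswith("id: "):
            current = line[4:].strip()
        elif current and line.startswith(LINK_TAGS):
            graph[current].add(line.split()[-1])   # assumption: last token is the target
    return graph


def find_cycles(graph):
    """Depth-first search that records one cycle for every back edge it meets."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)
    cycles = []
    for start in list(graph):
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        path = [start]
        stack = [iter(sorted(graph[start]))]
        while stack:
            for nxt in stack[-1]:
                if colour[nxt] == GREY:        # nxt is on the current path: a cycle
                    cycles.append(path[path.index(nxt):] + [nxt])
                elif colour[nxt] == WHITE:
                    colour[nxt] = GREY
                    path.append(nxt)
                    stack.append(iter(sorted(graph.get(nxt, ()))))
                    break
            else:                              # all parents of the top node are done
                colour[path.pop()] = BLACK
                stack.pop()
    return cycles
```

To use it, the live file and the xp file would be concatenated (or parsed separately and the two graphs merged) before calling find_cycles, so that cycles made partly of intersection tags and partly of normal relationships are visible.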
I am having trouble keeping tabs on which terms I have checked as I was putting !! on the end of the term names in the xp file but these get removed when I load into OBO-Edit.
Also this email:
Has anybody come up with any cunning schemes for marking which terms they have checked in the cross product file? I am working through bp-self in OE2 just now, checking the graph above the term that has a logical definition in the xp file. I started by putting '!!' on the end of the term name in the xp text file to mark the terms I had checked. I planned then to reload all the files and do a search on 'has is_intersection' AND NOT 'has name ends_with !!'. However, when I load the files, the version of the term names in the live file overwrites the version in the xp file and the !! is lost. If I load the files in the opposite order then it should solve this problem, but then I would have no way of noticing those terms in the live file that have changed their name since my xp file was made (quite a lot of them going by experience so far). It's hard to mark these even if I mark with dbxrefs in OBO-Edit and then save out and filter to regenerate my xp file, as the dbxrefs do not go in there. Is there any other way to mark a term name that would not be overwritten by the live file version of the term name?
The discussion showed that we could add a synonym on the checked terms in OBO-Edit like this:
synonym: "xp_checked" RELATED XPDONEMARKER 
and then filter them out from the saved file with a perl script. I have written to Chris to ask if he has a script that I could extend rather than starting from scratch.
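The perl script itself is not included in these notes. Purely as an illustration of the filtering idea, a Python sketch might look like this. The function names are hypothetical, and it assumes that stanzas are separated by blank lines and that the marker synonym is saved on a single line beginning exactly as shown above.

```python
MARKER = 'synonym: "xp_checked" RELATED XPDONEMARKER'


def strip_marker(obo_text):
    """Return the file text with every marker synonym line removed."""
    kept = [line for line in obo_text.splitlines()
            if not line.strip().startswith(MARKER)]
    return "\n".join(kept) + "\n"


def unchecked_ids(obo_text):
    """List ids of logically defined terms that do not yet carry the marker."""
    ids = []
    for stanza in obo_text.strip().split("\n\n"):
        lines = [line.strip() for line in stanza.splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        if any(line.startswith(MARKER) for line in lines):
            continue                      # already checked
        if not any(line.startswith("intersection_of:") for line in lines):
            continue                      # no logical definition to check
        for line in lines:
            if line.startswith("id: "):
                ids.append(line[4:].split("!")[0].strip())
    return ids
```

Running unchecked_ids on the working copy gives the list of terms still to look at, while strip_marker would be applied only to the copy that is about to be committed, so that the marker never reaches the shared files.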
10th June 2009
Chris has moved all of the intersection stanzas that involve relationships other than is_a and part_of into the unvetted file. This leaves only the part_of stanzas in the vetted file, and these should be easier to fix. It also means that we no longer need to load the relationship files, so these have been taken out of the import file.
Once I have checked these part_of stanzas they can go into the editor's file. I can also then start looking for terms that could have intersection links but don't yet. A good place to start looking would be the terms that have 'during' or 'involved in' in the name.
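The search for candidate terms could probably be done with an OBO-Edit name search, but a small script is another option. The sketch below is a hypothetical illustration, not an existing tool: it assumes blank-line separated stanzas, and it treats a term as already defined if it has intersection_of tags in either the live file or the xp file.

```python
import re

NAME_PATTERN = re.compile(r"\b(during|involved in)\b")


def term_stanzas(obo_text):
    """Yield (id, name, has_intersection) for each [Term] stanza."""
    for block in obo_text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        term_id = name = None
        has_xp = False
        for line in lines[1:]:
            if line.startswith("id: "):
                term_id = line[4:].split("!")[0].strip()
            elif line.startswith("name: "):
                name = line[6:]
            elif line.startswith("intersection_of:"):
                has_xp = True
        if term_id:
            yield term_id, name, has_xp


def xp_candidates(live_text, xp_text):
    """Terms named '... during ...' or '... involved in ...' with no logical definition yet."""
    defined = {tid for tid, _, has_xp in term_stanzas(xp_text) if has_xp}
    return [(tid, name)
            for tid, name, has_xp in term_stanzas(live_text)
            if name and NAME_PATTERN.search(name)
            and not has_xp and tid not in defined]
```

The result is only a list of names worth a look; whether each one can really be given a sensible intersection definition still has to be judged by a curator.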
Chris explained how to filter the edited files in OBO-Edit's save dialogue so that we can commit the live file and the intersection files. I have documented this at Editors_cross-product_implementation_plan#A_walk-through_of_handling_one_set_of_Cross_products
12th June 2009
Editing session with David Hill and Jennifer Deegan
We found a problem in which cotyledon morphogenesis GO:0048826 was a descendant of both embryonic development GO:0009790 and post-embryonic development GO:0009791, because of the part_of relationships between embryo, seed and fruit. We checked the plant ontology and found that they consider seed to be part of fruit. However, we felt that seed development was really neither a type of embryonic nor a type of post-embryonic development.
To resolve this we decided to leave the parts of embryo development (e.g. cotyledon morphogenesis) as part_of embryonic development, and have the parts of the fruit that do not include the seed be part_of post embryonic development. Seed development GO:0048316 itself is no longer either part_of embryonic or post-embryonic development.
Deleted fruit development (GO:0010154) from post-embryonic development (GO:0009791) with OBO_REL:is_a
Copied fruit septum development (GO:0080127) (as OBO_REL:is_a), to post-embryonic development (GO:0009791)
Ran filtered save and perl script. Committed live file. The intersection file did not need to be committed as no changes were made.
I am working through the file adding fixes. There are many 'during's to be replaced with 'involved in's. At a demo phone call we agreed that checking the relationship to parents was sufficient, and that we do not need to check the entire path to root. I have already done this for all of these terms.
4th November 2009
I have run the reasoner and corrected one cycle that showed up. I tried to look at the file with the implied links tool, but several bugs prevented this.
The tool is not present in OE2, which is the current released version. I can't display my XP file in OE2.1.1 at all. It just shows all nodes as 'classes' in the OTE.
I can display the file in the source version, and the implied links tool works. However, I can't add relationships in this version, and the search results display is mind-bending to look at because of the Mac OS X optimisation to blue and white striped search results. I have written to Amina to explain this problem and to ask if she can either fix these bugs or give me a hint on where I should make changes to fix them. At this point I cannot proceed on this work until the fixes are applied.
I have got OBO-Edit working by turning off the blue stripes in the source, running from source, and then adding relationships by hand in the text file.
I have started asserting implied links. All the durings are fixed. I have checked my biological_process_xp_self_edit.obo into cvs. I have not yet filtered this file.
I started working through the implied links and made and committed a bunch of fixes. However it turned out that this undid some decisions that had been made about disjointness, so I have had to revert from 1.4 of the edit file to 1.3. I have now done that. I checked the diff between the live file in my xp edit file and the actual live file and this has got rid of all the problems except for the new reproductive terms that I added. These can easily be destroyed if they are not needed as they have not been committed to the live file. I am going to meet with David and Tanya to have a go at working out how to assert implied links without contravening previously made disjoint decisions.
220 of the 315 implied links have been asserted. I am now working on the links that have problems.
I have suggested this change for consideration:
If I change post-embryonic ectodermal gut development from this logical definition:
"post-embryonic x development" == "x development" that is part_of post-embryonic development
to
"post-embryonic x development" == "post-embryonic development" that is part_of "x development"
then the OBO-Edit reasoner lets "post-embryonic x development" be a part of "x development" instead of an is_a.
(It was previously decided that "post-embryonic x development" should be a part of "x development" instead of is_a, so this is a desirable outcome.)
I am awaiting discussion with Chris, Tanya and David on the post-embryonic problem above, and am fixing other terms in the meantime.
I have also written to Chris to ask if it is possible to automate the creation of non-cellular siblings in cases like that shown below. (Viral terms should be under non-cellular terms).
After some discussion we decided that this would be quicker to do manually.
Removed all cross product definitions for embryonic and post-embryonic terms. These terms cannot currently be given meaningful cross product definitions. That area of the graph needs a lot of work before we can logically define the terms.