Biological process xp self ProgressNotesNov2008
In a meeting today at the Editorial Office we figured out who will be responsible for which cross products file. I am responsible for the 'biological_process_xp_self' file.
I am to read through the file and check the biological content of the intersection tags. Any edits are to be made by hand in the file, but OBO-Edit2 can currently be used as a browser. Where specific domain knowledge is needed I can send a list of terms to people who have the correct expertise.
The hope is that these cross products should be in the editors file by the beginning of January and in the public domain by the end of March.
I have started trying to work with the files. I loaded the files listed in my import file and made the following notes about what happened.
When I load the files I get two orphans at the top. They are:
ID = obol:culmination and
ID = interphase_by_interphase_microtubule_organizing_center
Verification manager warnings on load:
culmination during sorocarp development (GO:0031154) generated 2 warnings:
The term culmination during sorocarp development (GO:0031154) links to the dangling identifier obol:culmination
The cross product definition of culmination during sorocarp development (GO:0031154) refers to a dangling parent /obol:culmination\.
derived_into (OBO_REL:derived_into) generated 1 warning:
The term derived_into (OBO_REL:derived_into) links to the secondary identifier OBO_REL:derives_from
has_improper_part (OBO_REL:has_improper_part) generated 1 warning:
The term has_improper_part (OBO_REL:has_improper_part) links to the obsolete term improper_part_of (OBO_REL:improper_part_of)
improper_part_of (OBO_REL:improper_part_of) generated 1 warning:
The term improper_part_of (OBO_REL:improper_part_of) links to the obsolete term has_improper_part (OBO_REL:has_improper_part)
interphase microtubule nucleation by interphase microtubule organizing center (GO:0051415) generated 2 warnings:
The term interphase microtubule nucleation by interphase microtubule organizing center (GO:0051415) links to the dangling identifier interphase_by_interphase_microtubule_organizing_center
The cross product definition of interphase microtubule nucleation by interphase microtubule organizing center (GO:0051415) refers to a dangling parent /interphase_by_interphase_microtubule_organizing_center\.
If I do a link search for anything that 'has is_intersection', the last column of the results is very wide and cannot be made narrower.
Link Search usage:
Selecting things in the link search results panel does not result in them being shown in the OTE or in the Graph viewer. The OTE moves to a new place but there is no way of knowing which term I should be looking at.
The tab on the text editor has its name text much smaller than the text on all the other component tabs.
OBO-Edit ran out of memory and crashed on my Mac, even with the reasoner off.
Components: 1 OTE, Graphviz Viewer, Graph viewer, text editor, link search + one results panel. The cause of the crash was the graph viewer trying to display a term.
This also happens if I load the files and have a link search + results, a term search (unused), a graph viewer and a parent editor.
Search doesn't remember its config settings across restarts.
These issues have all been added to the tracker.
Processing XP files to be checked mostly by eye
I have written two scripts to help with checking the XP composition. Currently all we have are term names to go on, and I think that having the defs available alongside term names will save time. The scripts are:
The first changes the GO live file to tab-delimited format:
GO:id \t definition
The second takes that file and the XP file with this format:
[Term]
id: GO:0000022 ! mitotic spindle elongation
intersection_of: GO:0051231 ! spindle elongation
intersection_of: part_of GO:0007067 ! mitosis

[Term]
id: GO:0000070 ! mitotic sister chromatid segregation
intersection_of: GO:0007059 ! chromosome segregation
intersection_of: part_of GO:0007067 ! mitosis
and where a term id is quoted it adds in the def of the term underneath as follows:
[Term]
id: GO:0000022 ! mitotic spindle elongation
def: "Lengthening of the distance between poles of the mitotic spindle." [GOC:mah]
intersection_of: GO:0051231 ! spindle elongation
def: "The cell cycle process whereby the distance is lengthened between poles of the spindle." [GOC:ai]
intersection_of: part_of GO:0007067 ! mitosis
def: "Progression through mitosis, the division of the eukaryotic cell nucleus to produce two daughter nuclei that, usually, contain the identical chromosome complement to their mother." [GOC:ma, ISBN:0198547684]
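The two scripts described above could be sketched roughly as follows. This is a minimal Python illustration, not the scripts themselves (which are not reproduced in these notes). The function names obo_to_tsv_lines, load_defs and add_defs are hypothetical, and the sketch assumes that the second column of the tab-delimited file holds the whole def: line, so that it can be pasted under each line that quotes a GO id.

```python
import re

GO_ID = re.compile(r"GO:\d{7}")


def obo_to_tsv_lines(obo_text):
    """Yield 'GO:id<TAB>def line' for each [Term] stanza that has a def."""
    term_id = None
    in_term = False
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line.split()[1]
        elif in_term and term_id and line.startswith("def:"):
            yield f"{term_id}\t{line}"


def load_defs(tsv_lines):
    """Build a {GO id: def line} lookup from the tab-delimited lines."""
    defs = {}
    for row in tsv_lines:
        go_id, _, definition = row.partition("\t")
        defs[go_id] = definition
    return defs


def add_defs(xp_text, defs):
    """Return the XP text with a def line added under each line quoting a known GO id."""
    out = []
    for line in xp_text.splitlines():
        out.append(line)
        match = GO_ID.search(line)
        if match and match.group() in defs:
            out.append(defs[match.group()])
    return "\n".join(out)
```

In use, the text of the live file would be passed to obo_to_tsv_lines, the result to load_defs, and the lookup together with the XP file text to add_defs. Only the first GO id on each line is looked up, which is enough for the id: and intersection_of: lines shown above.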
I have checked the copy of my file with defs into scratch/xps and I will delete terms from that file as they are checked and found to be fine. The file is called biological_process_xp_self_with_defs.obo.
I have read right through the file and picked out the terms that need changes. These terms are now the only ones in the file xps/biological_process_xp_self_with_defs.obo. I have also made notes on the things that I noticed and on the specific changes needed in the file xps/biological_process_xp_self_notes.txt.
Next I will start making the changes that are needed and will report back to Chris the things that I found.
Chris says that when OBO-Edit displays a new relationship type as is_a then we should in fact include the typedef in the relationship file. He is going to arrange that.
I have been looking into the problem of disjoints in the Tree Viewer, and there is a proposal to have a single disjoint relationship between pairs of terms, as the relationship is symmetrical anyway. There is opposition to this proposal, and requests that OBO-Edit be changed to handle the pairs of relationships.
I discovered that the Graph Viewer is loading typedefs from the live file, while the OTE is loading typedefs from other files. I am looking into how relationships are handled generally so as to figure out what needs to be done.
I have tried loading my files and running the reasoner. The reasoner runs in a minute or so, but fatal errors are produced:
M phase of meiotic cell cycle (GO:0051327) generated 1 error:
M phase of meiotic cell cycle (GO:0051327) is part of a cycle over the property part_of.
M phase of mitotic cell cycle (GO:0000087) generated 1 error:
M phase of mitotic cell cycle (GO:0000087) is part of a cycle over the property part_of.
maintenance of turgor in appressorium by melanization (GO:0075043) generated 1 error:
maintenance of turgor in appressorium by melanization (GO:0075043) has disjoint superclasses multi-organism process and cellular process
meiosis (GO:0007126) generated 1 error:
meiosis (GO:0007126) is part of a cycle over the property part_of.
mitosis (GO:0007067) generated 1 error:
mitosis (GO:0007067) is part of a cycle over the property part_of.
smooth muscle relaxation of the bladder outlet (GO:0060085) generated 1 error:
smooth muscle relaxation of the bladder outlet (GO:0060085) is part of a cycle over the property is_a (OBO_REL:is_a).
synaptic transmission involved in micturition (GO:0060084) generated 1 error:
synaptic transmission involved in micturition (GO:0060084) is part of a cycle over the property is_a (OBO_REL:is_a).
12th December meeting
We had a meeting to discuss cross products and we all gave progress updates. We discussed the issue of disjoint symmetry and the idea of only putting one disjoint relationship between pairs of terms. I have committed to test all the OBO-Edit components to see how many are broken by this and how many are fixed. I do not completely understand the situation with the OTE, so I need to remember to mention that this is not fully tested.
We discussed the idea of saving from OE2 to gene_ontology_write.obo, and I am to write the first draft of the proposal.
1st April 2009
I have commented out all of the problematic terms in the self file and written to Chris to ask if he has a script to move them to the new unvetted file that I have made. David and I are planning to start looking at the paths to root on the vetted terms soon.
3rd April 2009
Participants: David Hill, Jen Deegan
We started working through the unvetted file and making changes. We just changed the live GO graph and made notes on how the rules in obol might be changed, on the understanding that Chris will reparse to improve the intersection tags. We have not edited the intersection tags at all. I made notes on our changes in the unvetted file and each note starts with '!!'.
I will write to Chris to tell him what we have done.
Later: Chris wrote back to say that we should also update the intersection file by hand in this file as it is not sensible in this case to regenerate the intersection links by obol.
9th April 2009
I have loaded up the intersection file with the current live GO and am checking the ancestors of the logically defined terms to see that they make sense. Previously I only checked that the names and defs of the defined terms and intersection terms made sense.
1) Terms GO:0031180 to GO:0031200 have been merged into other terms, so I will delete these terms from the intersection file. The same applies to GO:0031178.
I have sent two questions to the ontology developers list:
I have just been looking at the ancestors of my cross product terms and I found a weird thing that could maybe be tested for systematically in the other stanzas. In the Graphviz view below you can see that there is a cycle between 'mitosis', 'mitotic cell cycle' and 'M phase of mitotic cell cycle'. The cycle is made up of intersection tags and normal relationships, but it is still wonky. Do you think such cycles could be checked by a script or something? I don't think we can currently do it in OBO-Edit.
If I find a graph, like the one below, where the intersection tags show exactly the same as the existing relationships, is it right that I leave both sets in? Would I be right in thinking that the normal relationships show what is 'necessary' for the child term, where the intersections show what is 'necessary and sufficient', and that this is why we keep both?
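A scripted check for the mixed cycles asked about in the first question might look something like the sketch below. It is only an illustration under stated assumptions: the live file and the xp file are concatenated into one text, every is_a, part_of relationship and intersection_of line is treated as a child-to-parent edge, and other relationship types are ignored. The function names parent_edges and find_cycle are hypothetical.

```python
from collections import defaultdict

FOLLOWED_TAGS = ("is_a:", "relationship:", "intersection_of:")


def parent_edges(obo_text):
    """Map each [Term] id to the parent ids named in its stanza(s)."""
    parents = defaultdict(set)
    term_id = None
    in_term = False
    for line in obo_text.splitlines():
        line = line.split("!")[0].strip()  # drop trailing '! name' comments
        if line.startswith("["):
            in_term = line == "[Term]"
            term_id = None
        elif in_term and line.startswith("id:"):
            term_id = line.split()[1]
        elif in_term and term_id and line.startswith(FOLLOWED_TAGS):
            values = line.split()[1:]
            # One value is a plain parent; two values are '<relation> <target>'.
            # Assumption: only part_of is followed among the typed relations.
            if len(values) == 1 or (len(values) == 2 and values[0] == "part_of"):
                parents[term_id].add(values[-1])
    return parents


def find_cycle(parents):
    """Return one cycle as a list of ids, or None if no cycle is found."""
    state = {}  # id -> "active" while on the current path, "done" afterwards

    def visit(node, path):
        state[node] = "active"
        path.append(node)
        for parent in sorted(parents.get(node, ())):
            if state.get(parent) == "active":
                # The parent is already on the current path, so the path loops back.
                return path[path.index(parent):] + [parent]
            if parent not in state:
                cycle = visit(parent, path)
                if cycle:
                    return cycle
        path.pop()
        state[node] = "done"
        return None

    for node in sorted(parents):
        if node not in state:
            cycle = visit(node, [])
            if cycle:
                return cycle
    return None
```

Because edges from both files end up in the same graph, a cycle that only appears when intersection tags and normal relationships are combined would be reported as well. The sketch returns only the first cycle it finds.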
I am having trouble keeping tabs on which terms I have checked as I was putting !! on the end of the term names in the xp file but these get removed when I load into OBO-Edit.
Also this email:
Has anybody come up with any cunning schemes for marking which terms they have checked in the cross product file? I am working through bp-self in OE2 just now, checking the graph above the term that has a logical definition in the xp file. I started by putting '!!' on the end of the term name in the xp text file to mark the terms I had checked. I planned then to reload all the files and do a search on 'has is_intersection' AND NOT 'has name ends_with !!'. However, when I load the files, the version of the term names in the live file overwrites the version in the xp file and the !! is lost. If I load the files in the opposite order then it should solve this problem, but then I would have no way of noticing those terms in the live file that have changed their name since my xp file was made (quite a lot of them, going by experience so far). It's hard to mark these even if I mark with dbxrefs in OBO-Edit and then save out and filter to regenerate my xp file, as the dbxrefs do not go in there. Is there any other way to mark a term name that would not be overwritten by the live file version of the term name?
The discussion showed that we could add a synonym on the checked terms in OBO-Edit like this:
synonym: "xp_checked" RELATED XPDONEMARKER 
and then filter them out from the saved file with a perl script. I have written to Chris to ask if he has a script that I could extend rather than starting from scratch.
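The filtering step could be sketched as below. This is a Python illustration rather than the perl script mentioned above, and it assumes the marker is saved exactly as the synonym line shown, possibly followed by a dbxref list. The function name strip_markers is hypothetical.

```python
MARKER = 'synonym: "xp_checked" RELATED XPDONEMARKER'


def strip_markers(obo_text):
    """Return (ids of checked terms, text with the marker synonym lines removed)."""
    checked = []
    kept = []
    term_id = None
    for line in obo_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            term_id = None  # a new stanza starts
        elif stripped.startswith("id:"):
            term_id = stripped.split()[1]
        if stripped.startswith(MARKER):
            # Record which term carried the marker, then drop the line.
            if term_id:
                checked.append(term_id)
            continue
        kept.append(line)
    return checked, "\n".join(kept)
```

The list of checked ids gives a record of progress, while the cleaned text can be written back out so that the marker never reaches the xp file.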
11th June 2009
Chris has moved all of the intersection stanzas that involve relationships other than is_a and part_of into the unvetted file. This leaves only the part_of stanzas in the vetted file, and these should be easier to fix. It also means that we no longer need to load the relationship files, so these have been taken out of the import file.
Once I have checked these part_of stanzas they can go into the editor's file. I can also then start looking for terms that could have intersection links but don't yet. A good place to start looking would be the terms that have 'during' or 'involved in' in the name.
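A first pass at finding such terms could be scripted along these lines. This is a minimal Python sketch under assumptions: the live file supplies the term names, the xp file supplies the stanzas that already have intersection_of lines, and the function names iter_terms, tag_values and candidates are hypothetical.

```python
import re

PATTERN = re.compile(r"\b(during|involved in)\b")


def iter_terms(obo_text):
    """Yield the non-empty lines of each [Term] stanza as a list."""
    block = None
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            if block:
                yield block
            block = [] if line == "[Term]" else None
        elif block is not None and line:
            block.append(line)
    if block:
        yield block


def tag_values(block, tag):
    """Return the values of one tag in a stanza, without '! comment' tails."""
    prefix = tag + ":"
    return [l[len(prefix):].split("!")[0].strip() for l in block if l.startswith(prefix)]


def candidates(live_text, xp_text):
    """List (id, name) for matching live terms that have no intersection_of yet."""
    defined = {
        tag_values(b, "id")[0]
        for b in iter_terms(xp_text)
        if tag_values(b, "id") and tag_values(b, "intersection_of")
    }
    found = []
    for block in iter_terms(live_text):
        ids, names = tag_values(block, "id"), tag_values(block, "name")
        if ids and names and ids[0] not in defined and PATTERN.search(names[0]):
            found.append((ids[0], names[0]))
    return found
```

The resulting list is only a set of candidates to look at by eye; whether a given term really deserves a logical definition still needs a curator's judgement.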