Editor Guide

From GO Public

This page includes some handy hints for ontology editors.


What has changed in the file?

One way to find out what has changed between two different versions of a file in CVS is to use the 'cvs diff' command with the two version numbers.

'cvs diff -r 5.249 -r 5.250 go/ontology/gene_ontology_edit.obo'

This gives the normal diff output, but between the two versions specified in the command.


Checks to do Before Commit

  • is_a complete check
  • namespace check
  • double spaces
  • newlines
  • saved with a released version of OBO-Edit?
  • cvs diff
  • term in one ontology with an is_a or part_of parent in a different ontology
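
Some of these checks can be roughly automated. The following is a minimal sketch, not an official GO tool. The function names are hypothetical, and it assumes that "double spaces" means two consecutive spaces anywhere in a line, that the "newlines" check is about Windows carriage returns, and that every [Term] stanza is expected to carry its own namespace tag.

```shell
#!/bin/sh
# Hypothetical pre-commit helpers; names and interpretations are assumptions.

# Print every line that contains two consecutive spaces, with its line number.
find_double_spaces() {
    grep -n '  ' "$1"
}

# Count the lines that end in a carriage return (Windows line endings).
count_crlf_lines() {
    cr=$(printf '\r')
    grep -c "${cr}\$" "$1" || true
}

# Print the id of every [Term] stanza that has no namespace tag.
terms_missing_namespace() {
    awk '
        function report() { if (in_term && !has_ns) print id }
        /^\[/ { report(); in_term = ($0 == "[Term]"); has_ns = 0; id = "" }
        in_term && /^id: /        { id = substr($0, 5) }
        in_term && /^namespace: / { has_ns = 1 }
        END { report() }
    ' "$1"
}
```

Each function takes the path of the file to check, for example `find_double_spaces gene_ontology_edit.obo`. An empty result from the first and third functions, and a count of 0 from the second, is what you would hope to see.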


Using OBOMerge

A bit on terminology:

1. parent_file = The file that you checked out and began to make your changes on.

2. live_file = The current live version of the file.

3. your_branch = The file containing all of your changes, which was based on the parent_file.


  • Check out the version of the ontology that you branched from; this will be your parent file.
 >cvs co -r <version> -p gene_ontology_edit.obo > parent_file


  • Go to the OBOEdit directory on your machine.
 >cd OBOEDIT directory (‘cd’ also works in Windows; ‘dir’ lists the directory contents)
 >cp oboedit.vmoptions obomerge.vmoptions (if Windows, use ‘copy’)
 >obomerge (just the command alone gives a list of all the options)
 >obomerge -fail-on-clash IF_LIKELY parent_file live_file your_branch -o merged_file

Note that the order of the file names matters very much. The second file’s contents will take precedence over the third one’s contents, should there be a conflict.

Expected results:

 Parse done!
 Parse done!
 Parse done!
 followed by lots of other output which, unless it mentions ID clashes, can apparently be safely ignored


  • If there are ID conflicts, the script will report them and die. Resolve the id conflicts in YOUR file (not the live one) by replacing the offending ids with some in the free range (check the current go_numbers file to see which are free in your number range). Do search and replace very, very carefully, or write to Jen and ask for her script.
  • Merged_file will be in obo1.0 format.
  • Load the merged file into OBO-Edit and save it in 1.2 format.
  • (If saved in the Windows version of OBO-Edit and then ported over to Solaris, use dos2unix to convert to Unix line endings.)
  • Diff against local copy of gene_ontology_edit.obo
 >diff merged_file gene_ontology_edit.obo
  • Make sure only YOUR changes are in the diff.
  • Rename merged file as the live filename.
 >mv merged_file gene_ontology_edit.obo
  • Diff again against repository version.
 >cvs diff gene_ontology_edit.obo
  • If you get an up-to-date conflict, update the gene_ontology_edit.obo file to the live version and start again.
  • If no changes except your own, commit with all the appropriate comments.
 >cvs ci gene_ontology_edit.obo
  • Have a look at the log message and check whether the numbers of added (e.g. +45) and deleted (e.g. -55) lines are right. If the numbers do not convince you, do a cvs diff between the previous version and the live file and check that all changes are correct.
 >cvs diff -r 1.14 -r 1.15 go/ontology/gene_ontology_edit.obo
  • Have a pint (or two).
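
The added and deleted line counts mentioned in the steps above can be estimated before committing by counting the '>' and '<' lines of a plain diff. This is a minimal sketch with a hypothetical function name. It assumes that a changed line counts once as added and once as deleted, so the result should be close to, though not guaranteed identical to, what the cvs log message reports.

```shell
#!/bin/sh
# Hypothetical helper: print "+added -deleted" line counts between two files.
diff_line_counts() {
    old=$1
    new=$2
    # In normal diff output, '>' marks lines only in the new file
    # and '<' marks lines only in the old file.
    added=$(diff "$old" "$new" | grep -c '^>' || true)
    deleted=$(diff "$old" "$new" | grep -c '^<' || true)
    echo "+$added -$deleted"
}
```

For example, `diff_line_counts gene_ontology_edit.obo merged_file` treats the local copy as the old file and the merged file as the new one.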

Using obo2obo

If you are using obo2obo, it is worth looking at the OBO-Edit help guide, as the documentation in there is much clearer than the command-line documentation.


Windows line endings

To get rid of Windows line endings use:


tr -d '\r' < oldfile > newfile
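
To convert a file in place, and only when it actually contains carriage returns, the same tr command can be wrapped in a small function. This is a sketch with a hypothetical function name, not part of the GO utilities.

```shell
#!/bin/sh
# Hypothetical helper: strip carriage returns from a file in place.
strip_cr_in_place() {
    file=$1
    cr=$(printf '\r')
    # Only rewrite the file if it contains at least one carriage return.
    if grep -q "$cr" "$file"; then
        tmp=$(mktemp) || return 1
        # Write to a temporary file first, then copy back so that the
        # original file keeps its permissions.
        tr -d '\r' < "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    fi
}
```

Usage would be `strip_cr_in_place newfile`. Files that already have Unix line endings are left untouched.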


Using cvs from Windows

If you want to edit using the Windows operating system, you can use TortoiseCVS (http://www.tortoisecvs.org/). Jen is using this and can help with setup.

Here is an example of the settings that you will need in TortoiseCVS:

Image:TortoiseSettings.PNG

You will also need to have PuTTY and Pageant set up, and when you are issuing cvs commands you will need to have Pageant open and the ssh key loaded.
In order to carry out a cvs diff command you will need to install a programme that can do the diff operation. One good example is WinMerge, which you can get from http://winmerge.org/.
To set it up to work with TortoiseCVS, follow this screenshot:

Image:diff.PNG

Use of TortoiseCVS is quite intuitive. It works from within the file explorer window, just by right-clicking any file as follows.

Image:use.PNG

Before committing, you must save the file with Unix line endings using the Windows installation of Emacs. (Info: http://www.gnu.org/software/emacs/ Download: http://ftp.gnu.org/pub/gnu/emacs/windows/).

It takes at least ten minutes for each cvs command to complete so you need to be very patient.


Checking history of a term

If you want to know what has happened to a term through many cvs commits you can use the script in go/software/utilities called cvs_diff_history.pl.

The script runs diffs between adjacent versions of files for as many rounds back in time as you require, and searches them for any word that you provide. This is useful, for example, if you want to check whether a term has been lost in a dodgy commit some time ago, or if you just want a list of all the GO terms altered in the last 30 commits.

The script is here: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/software/utilities/cvs_diff_history.pl
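
The general idea of diffing adjacent versions and searching each diff can be sketched in shell. This is not the code of cvs_diff_history.pl, and both function names are hypothetical. It assumes revision numbers of the form prefix.minor (such as 5.250) and, for the second function, a CVS working copy.

```shell
#!/bin/sh
# Print adjacent revision pairs as "older newer", newest pair first.
# $1 = revision prefix (e.g. 5), $2 = latest minor number, $3 = rounds back.
adjacent_revisions() {
    prefix=$1
    latest=$2
    rounds=$3
    i=0
    while [ "$i" -lt "$rounds" ] && [ $((latest - i)) -gt 1 ]; do
        echo "$prefix.$((latest - i - 1)) $prefix.$((latest - i))"
        i=$((i + 1))
    done
}

# Report which adjacent pairs have a diff that mentions the given word.
# Requires cvs and a checked-out working copy.
search_history() {
    word=$1; file=$2; prefix=$3; latest=$4; rounds=$5
    adjacent_revisions "$prefix" "$latest" "$rounds" |
    while read -r older newer; do
        if cvs diff -r "$older" -r "$newer" "$file" | grep -q -- "$word"; then
            echo "$word appears in the diff between $older and $newer"
        fi
    done
}
```

For instance, `search_history GO:0008150 go/ontology/gene_ontology_edit.obo 5 250 30` would look through the last 30 diffs ending at revision 5.250.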


Checking the e-mail archive

There may have been discussions on a given term in the GO e-mail list. An archive of the list can be searched to find pertinent e-mails. You can use the search facility on the website for a text search. The URL for the e-mail archive is of the form below, with the go list as the example. The search form, where available (not all lists have this search feature), is at the top of the page.

 http://fafner.stanford.edu/mailman/listinfo/go 

The messages for each month can be downloaded from the following page, again with the go list as the example.

 http://fafner.stanford.edu/pipermail/go 

If you are not sure of the list's name, or want to see which lists are available, use this URL.

 http://fafner.stanford.edu/mailman/listinfo

Bulk changes to the file

If you are using the script swap.pl in go/software/utilities to make bulk changes to the names of a lot of terms, it is useful to know that you only need to change the text in the term's name line and not in the relationship lines of the stanza. If you change the term names and then load the file into OBO-Edit, it will automatically change the name strings in the relationship lines.
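
To illustrate the point about touching only the name line, here is a minimal sketch of a rename that leaves relationship lines alone. It does not use swap.pl or reproduce its interface, and the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: rename a term by rewriting only its "name:" line.
# $1 = old name, $2 = new name, $3 = OBO file; the result goes to stdout.
# Note: awk -v interprets backslash escapes, so names containing
# backslashes would need extra care.
rename_term() {
    awk -v old="name: $1" -v new="name: $2" '
        $0 == old { print new; next }   # exact match on the whole name line
        { print }                       # every other line is left unchanged
    ' "$3"
}
```

Running `rename_term 'old name' 'new name' gene_ontology_edit.obo > renamed.obo` changes the matching name line only. Comments after '!' in is_a or relationship lines still show the old name until the file is loaded and saved in OBO-Edit.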

Relationships

Editor Guide to Regulates
Editor Guide to has_part


Content Meetings

Content Meeting Participants Information
