Editor Guide

This page includes some handy hints for ontology editors.

What has changed in the file?

One way to find out what has changed between two versions of a file in cvs is to use the 'cvs diff' command with the two version numbers.

'cvs diff -r 5.249 -r 5.250 go/ontology/editors/gene_ontology_write.obo'

This gives the normal diff output between the two versions specified in the command.


Checks to do before commit

  • is_a complete check
  • namespace check
  • double spaces
  • newlines
  • saved with a released version of OBO-Edit?
  • cvs diff
  • term in one ontology with an is_a parent in a different ontology
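
Two of the checks above (double spaces, and a term with an is_a parent in a different ontology) can be roughed out on the command line. This is only a sketch: the function names find_double_spaces and check_is_a_namespaces are made up for illustration, and the second one assumes every stanza gives its namespace explicitly in a 'namespace:' tag that follows its 'id:' tag.

```shell
# find_double_spaces FILE: print "line-number:line" for every line in FILE
# that contains two consecutive spaces.
find_double_spaces() {
    grep -n '  ' "$1"
}

# check_is_a_namespaces FILE: report every is_a link whose parent term has a
# different namespace from the child.
# Assumption: each stanza has an explicit "namespace:" tag after its "id:" tag.
check_is_a_namespaces() {
    awk '
        NR == FNR {                       # first pass: record id -> namespace
            if ($1 == "id:") { id = $2 }
            else if ($1 == "namespace:") { ns[id] = $2 }
            next
        }
        $1 == "id:" { id = $2 }           # second pass: track the current term
        $1 == "is_a:" && ($2 in ns) && ns[$2] != ns[id] {
            print id " (" ns[id] ") is_a " $2 " (" ns[$2] ")"
        }
    ' "$1" "$1"
}
```

Both functions print nothing when the file is clean, so any output is worth a look before you commit.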


Using OBOMerge

A bit on terminology:

1. parent_file = The file that you checked out and began to make your changes on.

2. live_file = The current live version of the file.

3. your_branch = The file that contains all of your changes that was based on the parent_file.

  • Check out the version of the ontology that you branched from. This will be your parent file.
 >cvs co -r <version> -p gene_ontology_write.obo > parent_file
  • Go to the OBO-Edit directory on your machine.
 >cd OBOEDIT directory (in Windows, use ‘dir’)
 >cp oboedit.vmoptions obomerge.vmoptions (if Windows, use ‘copy’)
 >obomerge (just the command alone gives a list of all the options)
 >obomerge -fail-on-clash IF_LIKELY -version OBO_1_2 -original parent_file -revision live_file -revision your_branch -o merged_file

Note that the order of the file names matters very much. The second file’s contents will take precedence over the third one’s contents, should there be a conflict.

Expected results:

 Parse done!
 Parse done!
 Parse done!
 lots of other stuff that, unless it talks about ID clashes, is apparently safely ignored
  • If there are ID conflicts, the script will report them and die. Resolve the id conflicts in YOUR file (not the live one) by replacing the offending ids with available ids from your range (check the current go_numbers file to see which are free in your number range). Do search and replace very, very carefully, or write to Jen and ask for her script.
  • If you saved in OBO-Edit in Windows and then ported over to Solaris, use dos2unix to convert to unix line endings. (If this is gibberish to you, it probably doesn't affect you.)
  • Diff against local copy of gene_ontology_write.obo
 >diff merged_file gene_ontology_write.obo
  • Make sure only YOUR changes are in the diff.
  • Rename merged file as the live filename.
 >mv merged_file gene_ontology_write.obo
  • Diff again against repository version.
 >cvs diff gene_ontology_write.obo
  • If you get an up-to-date conflict, update the gene_ontology_write.obo file to the live version and start again.
  • If there are no changes except your own, commit with all the appropriate comments.
 >cvs ci gene_ontology_write.obo
  • Have a look at the log message and check that the numbers of added (e.g. +45) and deleted (e.g. -55) lines look right. If the numbers do not convince you, do a cvs diff between the live file and the previous version and check that all changes are correct.
 >cvs diff -r 1.15 -r 1.14 go/ontology/gene_ontology_write.obo
  • Have a pint (or two).
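
If you would rather spot likely ID clashes before running obomerge, one rough approach is to list the IDs that are missing from the parent file but present in both the live file and your branch. The sketch below is not part of OBO-Edit; the function names ids and clashing_ids are made up, and it only looks at 'id:' lines.

```shell
# ids FILE: print the sorted, unique values of all "id:" tags in FILE.
ids() {
    sed -n 's/^id: *//p' "$1" | sort -u
}

# clashing_ids PARENT LIVE BRANCH: print ids that are absent from PARENT
# but were added in both LIVE and BRANCH.
clashing_ids() {
    tmpdir=$(mktemp -d) || return 1
    ids "$1" > "$tmpdir/parent"
    ids "$2" > "$tmpdir/live"
    ids "$3" > "$tmpdir/branch"
    # comm -12 keeps lines common to both inputs.
    comm -12 "$tmpdir/live" "$tmpdir/branch" > "$tmpdir/both"
    # comm -13 keeps lines found only in the second input.
    comm -13 "$tmpdir/parent" "$tmpdir/both"
    rm -rf "$tmpdir"
}
```

Run it as 'clashing_ids parent_file live_file your_branch'. Any ID it prints was created independently in both files and needs a new number in your branch.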

Using obo2obo

If you are using obo2obo, it is worth looking at the OBO-Edit help guide, as the documentation there is much clearer than the command line documentation.


Windows line endings

To get rid of Windows line endings use:

tr -d '\r' < oldfile > newfile
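
If you want to check first and convert only when needed, something along these lines should work. The function names has_crlf and to_unix are made up for this sketch.

```shell
# has_crlf FILE: succeed if any line in FILE ends with a carriage return.
has_crlf() {
    grep -q "$(printf '\r')\$" "$1"
}

# to_unix FILE: strip carriage returns from FILE in place, only when needed.
to_unix() {
    if has_crlf "$1"; then
        tmp=$(mktemp) || return 1
        # Write back with cat so the original file keeps its permissions.
        tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
        rm -f "$tmp"
    fi
}
```

Like the tr command above, this removes every carriage return in the file, not just those at the ends of lines.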

To detect Windows line endings when using a Windows machine

You can detect whether a file has Windows line endings when using a Windows machine by opening the file in any hex editor (e.g. XV132) and seeing whether the lines end in 0x0A (UNIX) or 0x0D 0x0A (DOS).

You can convert the line endings using Notepad++. The command is [Format]->[Convert to UNIX format]. If you have to commit, from a Windows machine, a file in which only the line endings have changed, then just do cvs commit, not update and then commit. If you update first then no changes will be apparent and there will be no commit option available.


Using cvs from Windows

If you want to edit using the Windows operating system you can use TortoiseCVS (http://www.tortoisecvs.org/). Jen is using this and can help with setup.

Here is an example of the settings that you will need in TortoiseCVS:

Image:TortoiseSettings.PNG

You will also need to have PuTTY and Pageant set up, and when you are issuing cvs commands you will need to have Pageant open and the ssh key loaded.
In order to carry out a cvs diff command you will need to install a programme that can do the diff operation. One good example is WinMerge, which you can get from http://winmerge.org/.
To set it up to work with TortoiseCVS follow this screenshot:

Image:diff.PNG

Use of TortoiseCVS is quite intuitive. It works from within the file explorer window just by right clicking any file as follows.

Image:use.PNG

Before you commit, you must save the file with Unix line endings using the Windows installation of Emacs. (Info: http://www.gnu.org/software/emacs/ Download: http://ftp.gnu.org/pub/gnu/emacs/windows/).

It takes at least ten minutes for each cvs command to complete so you need to be very patient.


Checking history of a term

If you want to know what has happened to a term through many cvs commits you can use the script in go/software/utilities called cvs_diff_history.pl.

The script runs diffs between adjacent versions of files for as many rounds back in time as you require, and searches them for any word that you provide. This is useful, for example, if you want to check whether a term was lost in a dodgy commit some time ago, or if you just want a list of all the GO terms altered in the last 30 commits.

The script is here: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/software/utilities/cvs_diff_history.pl
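
To illustrate the idea (this is not the Perl script itself, just a rough shell sketch of the same approach), the loop below walks back through adjacent revisions and greps each diff for a word. The function names rev_pairs and search_history are made up, and the sketch assumes simple trunk revision numbers such as 5.250.

```shell
# rev_pairs LATEST ROUNDS: print "older newer" pairs of adjacent revisions,
# newest first. Assumes trunk revisions of the form X.Y.
rev_pairs() {
    prefix=${1%.*}
    minor=${1##*.}
    i=0
    while [ "$i" -lt "$2" ] && [ "$minor" -gt 1 ]; do
        echo "$prefix.$((minor - 1)) $prefix.$minor"
        minor=$((minor - 1))
        i=$((i + 1))
    done
}

# search_history FILE LATEST ROUNDS WORD: grep each adjacent diff for WORD.
search_history() {
    rev_pairs "$2" "$3" | while read -r old new; do
        echo "== $old -> $new"
        # Redirect stdin so cvs (or ssh underneath it) cannot swallow the
        # remaining revision pairs that the loop is reading.
        cvs diff -r "$old" -r "$new" "$1" < /dev/null | grep -- "$4"
    done
}
```

For example, 'rev_pairs 5.250 2' prints '5.249 5.250' and then '5.248 5.249', and 'search_history go/ontology/editors/gene_ontology_write.obo 5.250 30 GO:0008150' would search the last 30 diffs for that ID.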


Checking the e-mail archive

There may have been discussions on a given term on one of the GO email lists. Each list has an archive that can be browsed, and some can be searched, to find pertinent e-mails. The search form (if available; not all lists have the search feature) is at the top of the listinfo page.

Listinfo page URL syntax:

  http://fafner.stanford.edu/mailman/listinfo/[listname]

Example:

 http://fafner.stanford.edu/mailman/listinfo/go

List archive page URL syntax:

  http://fafner.stanford.edu/pipermail/[listname]

Example:

 http://fafner.stanford.edu/pipermail/go

List of all Mailman lists (useful if you are not sure of the list's name, or to see which lists are available):

 http://fafner.stanford.edu/mailman/listinfo

Bulk changes to the file

If you are using the script swap.pl in go/software/utilities to make bulk changes to the names of a lot of terms, it is useful to know that you only need to change the name text in the term name and not in the relationship lines of the stanza. If you change the term names and then load the file into OBO-Edit it will automatically change the name strings in the relationship lines.
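
As an illustration of the point above (this is not how swap.pl works internally, just a sketch with a made-up function name, rename_term), the following changes only the 'name:' line of a term and leaves the name text in the relationship lines alone for OBO-Edit to refresh.

```shell
# rename_term FILE OLD NEW: print FILE with the line "name: OLD" replaced by
# "name: NEW". Relationship lines such as "is_a: GO:... ! OLD" are untouched.
rename_term() {
    awk -v old="$2" -v new="$3" '
        $0 == "name: " old { print "name: " new; next }
        { print }
    ' "$1"
}
```

The result goes to standard output, so redirect it to a new file and load that file into OBO-Edit.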

Relationships

Editor Guide to Regulates

Editor Guide to has_part

Content meetings

Content Meeting Participants Information

Adding references to Wikipedia

As Chris has added Wikipedia to GO.xrf_abbs, we no longer have to put URLs in definition dbxrefs if we want to cite Wikipedia as a def source. Instead, citations should now be in the form 'Wikipedia:Page_name'.

In OBO-Edit, put 'Wikipedia' in the 'Database' field and the page name, as it appears at the end of the URL, in the 'ID' field.
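
If you are starting from a URL, the page name is just the last path segment. A tiny sketch (the function name wikipedia_xref and the example URL are made up for illustration):

```shell
# wikipedia_xref URL: print the dbxref for a Wikipedia page URL, taking the
# page name from the text after the last "/".
wikipedia_xref() {
    echo "Wikipedia:${1##*/}"
}

# Example: wikipedia_xref 'http://en.wikipedia.org/wiki/Cell_cycle'
# prints Wikipedia:Cell_cycle
```

In the saved file the citation then appears in the square brackets at the end of the def line, e.g. [Wikipedia:Cell_cycle].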

Obsolete alert emails

When you have to make a term obsolete, first send an email to the GO list to give notice and allow a period for objections, questions, comments, etc.

We give two weeks' notice if the term is used in annotations, or one week if there are no annotations (mappings in any of the external2go files don't count as annotations for this purpose).

The subject line must start with 'Alert:'. It can mention the name and/or GO ID of the doomed term, but doesn't have to (especially if you're alerting about more than one term in a single email).

The body of the email should include

  • The term name(s) and ID(s)
  • Counts of any annotations, by group, and distinguishing IEA from the rest
    • Over the years, two or three different format/syntax patterns have cropped up. Any of them is fine; see the example emails linked below.
  • The reason(s) for making the term(s) obsolete
  • Recommended terms to transfer annotations and mappings (the consider and replaced_by tags)
  • Any external2go mappings that use the term(s)
  • A link to the relevant SourceForge entry
  • The deadline for comments, and a reminder that if we hear nothing, the obsoletion goes ahead
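
For the annotation counts, one possible way to get numbers by group, split into IEA and non-IEA, is to run awk over a gene association file. This is only a sketch: the function name count_annotations is made up, and it assumes the tab-delimited gene association format with the GO ID in column 5, the evidence code in column 7 and the assigning group in column 15.

```shell
# count_annotations GO_ID GAF_FILE: print "group kind count" lines for all
# annotations to GO_ID, where kind is IEA or non-IEA.
# Assumption: GO ID in column 5, evidence code in 7, assigned-by in 15.
count_annotations() {
    awk -F '\t' -v term="$1" '
        $5 == term {
            kind = ($7 == "IEA") ? "IEA" : "non-IEA"
            n[$15 " " kind]++
        }
        END { for (k in n) print k, n[k] }
    ' "$2" | sort
}
```

This counts direct annotations to the term only, which is what the alert email needs.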

Links to examples in the GO mailing list archive:

Mailing list sign-up page

http://fafner.stanford.edu/mailman/listinfo
