Migration of GO CVS to SVN (detailed)

From GO Wiki

The GO CVS repository should be moved to SVN. The directory structure would stay the same: in the SVN repo, the existing GO structure would be kept under a directory named "trunk".

See Migration_of_GO_CVS_to_SVN

Advantages

  • Simple zero-cost tagging of releases
  • Use of svn externals (useful for GO ontology management, external ontologies)
  • Directories can be moved and deleted (not possible in CVS)
  • More efficient for storing large files, e.g. GAFs?

One of the main advantages is the ability to better manage release archives.

Using SVN we can have a releases directory that is indexed by date. Because copies within an SVN repository are cheap, this adds no significant storage cost. E.g.

 releases/
   2011-10-24/
     ontology/
       go.obo
       go.owl
     annotation/
       ...

This will be particularly useful for Ontology_Release_Files_Proposal.
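
As a rough sketch of how such a dated release could be created, the shell functions below make a server-side svn copy of a trunk subdirectory into releases/<date>/. The repository root URL is inferred from the checkout URL used on this page, and placing releases/ next to trunk/ is an assumption; tag_release, release_url, run and DRY_RUN are illustrative names, not existing GO tooling.

```shell
# Assumed repository root (trunk/ and releases/ are taken to be siblings).
REPO_URL="${REPO_URL:-svn+ssh://ext.geneontology.org/share/go/svn}"

# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Build the URL of a dated release directory.
# $1 = release date (YYYY-MM-DD), $2 = subdirectory such as "ontology"
release_url() {
  echo "$REPO_URL/releases/$1/$2"
}

# Copy trunk/<subdir> to releases/<date>/<subdir> in a single commit.
# A URL-to-URL copy is done on the server and does not duplicate file contents.
tag_release() {
  run svn copy --parents -m "Release $1: $2" \
    "$REPO_URL/trunk/$2" "$(release_url "$1" "$2")"
}
```

Setting DRY_RUN=1 before calling tag_release 2011-10-24 ontology prints the svn command without contacting the server, which is a cheap way to check the target path first.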

Counter-proposals

  • git - too complex?
  • bazaar - like git, but uses a superset of svn commands

Cost

  • Time-consuming to migrate?
    • we want to preserve cvs history
    • cvs2svn migration tools - test
  • People who use CVS must now use new tools
    • Should be an easy transition - most of the commands remain the same
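
A history-preserving conversion with cvs2svn might look roughly like the following. This is a sketch for the testing mentioned above, not an agreed procedure: the paths are placeholders, the run helper and DRY_RUN switch are illustrative, and the cvs2svn options shown (--dry-run, --svnrepos) should be checked against the installed version.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Convert one CVS module into a new SVN repository, keeping history.
# $1 = path to the CVS module inside the CVS repository (placeholder)
# $2 = path of the SVN repository to create (placeholder, should not exist yet)
convert_with_history() {
  # First pass: let cvs2svn report what it would do, without creating a repository.
  run cvs2svn --dry-run --svnrepos "$2" "$1" || return 1
  # Second pass: perform the real conversion.
  run cvs2svn --svnrepos "$2" "$1"
}
```

Running the conversion against a copy of the CVS repository, rather than the live one, keeps the test from interfering with ongoing commits.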

Plan (Overview)

Proposed 2011-11-16

(time estimates to be filled in)


  1. Set up an empty SVN repo on geneontology.org [S] DONE
    1. svn co svn+ssh://ext.geneontology.org/share/go/svn/trunk go-trunk
  2. Set up http://viewvc.org/ [S] DONE - http://viewvc.geneontology.org/viewvc
  3. Test with selected directories [B]
    1. Start with: http://wiki.geneontology.org/index.php/Ontology_Release_Files_Proposal - DONE
    2. We have this ready to go and can turn this around quickly
  4. Selectively expose a subset of directories on http://geneontology.org/ [S]
    1. Start with the ontology release dir above - DONE
    2. alternatively: temporarily serve this off a different URL, e.g. test.geneontology.org
  5. backups in place [S]
  6. switch [S+B]
    1. Freeze cvs edits
    2. copy cvs structure to svn (no history)
    3. switch links in html pages, scripts that point to cvs
    4. turn off cvs write access [or: keep cvs as a mirror?]
    5. make http://geneontology.org point to svn
  7. Use svn mv to refactor dir structure as we see fit on a case by case basis
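
Step 6.2 (copying the CVS structure without history) and step 7 (refactoring with svn mv) could be sketched as below. The CVS root, module name, scratch directory and URLs are placeholders, and run/DRY_RUN are illustrative helpers; cvs export followed by svn import is shown as one plausible way to make a history-free copy, not necessarily the method that will be used.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 6.2: export a clean snapshot from CVS and import it into SVN.
# $1 = CVSROOT, $2 = CVS module, $3 = existing empty scratch directory,
# $4 = SVN URL to import into
copy_without_history() {
  # "cvs export" writes the files without CVS/ admin directories;
  # the relative -d name places them in <scratch>/snapshot.
  (cd "$3" && run cvs -d "$1" export -r HEAD -d snapshot "$2") || return 1
  run svn import -m "Import $2 from CVS (no history)" "$3/snapshot" "$4"
}

# Step 7: move or rename a directory; SVN keeps its history across the move.
# $1 = SVN URL of the current location, $2 = SVN URL of the new location
move_dir() {
  run svn mv -m "Refactor directory layout" "$1" "$2"
}
```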

First steps (detail)

On the software call (2011-11-18) we discussed management of the GAFs in VC.

  • We agreed that VC is not a perfect solution for data, but it's the best one we have right now
  • We thought it would be better to manage GAFs uncompressed in SVN
    • allows diffs
    • More space-efficient
    • Faster? (need to check)
    • Can use ViewVC and svn tools to look at how GAFs have changed
    • the above holds especially true for managing GPAD/GPI
  • However, we need to do tests to make sure this is feasible
    • Justin (PO) reported that they had been using uncompressed GAFs in SVN (up to 800k lines per GAF) with no issues
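
To illustrate the "allows diffs" point: gzipped snapshots have to be unpacked before two versions can be compared line by line, whereas uncompressed files under SVN can be compared directly. The sketch below counts the lines that differ between two gzipped GAF snapshots; the function name and file names are hypothetical.

```shell
# Count lines removed from or added to a GAF between two gzipped snapshots.
# $1 = older snapshot (.gz), $2 = newer snapshot (.gz)
gaf_changed_lines() {
  old=$(mktemp) || return 1
  new=$(mktemp) || return 1
  gzip -dc "$1" > "$old"
  gzip -dc "$2" > "$new"
  # diff marks removed lines with "<" and added lines with ">".
  # grep -c prints how many such lines there are (0 if the files are identical);
  # "|| true" keeps a zero count from being treated as a failure.
  diff "$old" "$new" | grep -c '^[<>]' || true
  rm -f "$old" "$new"
}
```

With uncompressed GAFs in SVN, the unpacking step disappears and svn diff between two revisions shows the same information.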

We decided on the following plan:

  • We will set up an SVN repo plus ViewVC in the next couple of weeks (steps 1-2 above). This will initially be a sandbox for the ontology exports
  • In parallel we will run a number of tests simulating some of the existing history of GAFs in GO as if we had been using uncompressed files in SVN
  • Assuming tests are successful, report back to GOC with proposed time of switchover
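
One way to run such a simulation is to replay archived snapshots of a GAF into a scratch SVN working copy, committing each one uncompressed, and then look at repository size and commit time. The sketch below assumes that a series of gzipped historical snapshots is available and that a test working copy already exists; all names are hypothetical.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Replay gzipped snapshots (oldest first) as successive commits of one
# uncompressed file.
# $1 = SVN working copy directory, $2 = file name inside it,
# remaining arguments = snapshot files (.gz)
replay_snapshots() {
  wc_dir=$1; target=$2; shift 2
  first=1
  for snap in "$@"; do
    # Overwrite the working file with the uncompressed snapshot.
    gzip -dc "$snap" > "$wc_dir/$target" || return 1
    if [ "$first" = "1" ]; then
      # The file has to be put under version control before the first commit.
      run svn add "$wc_dir/$target"
      first=0
    fi
    run svn commit -m "Replay $snap" "$wc_dir/$target"
  done
}
```

A throwaway repository for this can be made with svnadmin create and checked out through a file:// URL; running du -s on the repository directory after the replay gives a rough measure of the storage cost.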

Migration of ontology directory (March 15 2012)

Migrating the ontology directory is a high priority for the ontology group. We are making use of features such as svn:externals, and many of the extension files have already been migrated.

As of 12pm PST March 15 2012:

  • There will be no more user commits to the ontology/ directory in CVS
  • Ontology editors will make commits to svn
  • The editors file on svn will be committed to CVS on a nightly basis (cron job run at berkeley)
  • Automated scripts that commit to the CVS directory will continue to do so, and will be migrated on a case-by-case basis.

Note that only the ontology directory will be affected at this time. If you commit to other directories, this does not affect you.

GOC members and external users of the GO will be unaffected by this change, because the contents of the ontology directory will be mirrored in CVS. Everything will look as normal to external users.

We expect an ontology editing / TermGenie freeze of no more than two hours during this process.

Specific details

  • UK ontology editors will ensure all edits are committed before 7.30pm (UK time) on March 14
  • US ontology editors will ensure all edits are committed before 11.30am
  • TermGenie gatekeeper will clear queue before 11am, then TG will be offline until after migration
  • No more edits until after software group announces migration has happened
  • A cron job run from Jenkins at Berkeley will
    • svn update ontology
    • copy editors/gene_ontology_write.obo and go_xp_chebi to cvs working dir
    • commit to cvs
  • The gocvs agent will continue to run at SGD on the cvs directory, where it will continue to perform checks and generate downstream files in CVS. This will be migrated later on.
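
The nightly job described above could be sketched as the following shell function. The working-copy paths are placeholders, the assumption that the CVS working copy mirrors the ontology/ layout comes from the directory structure being kept the same, and the run/DRY_RUN helper is illustrative. Only editors/gene_ontology_write.obo is listed because the text does not give the full path of the go_xp_chebi file, which would need to be added to SYNC_FILES.

```shell
# Echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Files to mirror, relative to the ontology/ directory (space-separated).
SYNC_FILES="editors/gene_ontology_write.obo"

# $1 = SVN working copy of trunk, $2 = CVS working copy (both placeholders)
sync_svn_to_cvs() {
  # Bring the SVN working copy up to date.
  run svn update "$1/ontology" || return 1
  # Copy each mirrored file over its CVS counterpart.
  for f in $SYNC_FILES; do
    run cp "$1/ontology/$f" "$2/ontology/$f" || return 1
  done
  # Commit from inside the CVS working copy; unchanged files are not committed.
  (cd "$2/ontology" && run cvs commit -m "Nightly sync from svn" $SYNC_FILES)
}
```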

Status

Migration of the ontology directory is complete.

For instructions on how to use GO svn, see:

It is recommended that you check out into a directory with a different name, such as go-trunk, to avoid confusion with the checked-out cvs repository. E.g.

 svn co svn+ssh://ext.geneontology.org/share/go/svn/trunk go-trunk
 cd go-trunk/ontology/

You may have to type

 svn co svn+ssh://go_user@ext.geneontology.org/share/go/svn/trunk go-trunk
 cd go-trunk/ontology/

replacing go_user with your username.

Note that a Jenkins job ensures that commits to svn are copied to cvs, and that there are no accidental commits to cvs.

The cvs log is here:

The svn log is here:

Jenkins Jobs