GAF Storage

From GO Wiki

All GOC files are stored in CVS.

Ontology files are copied to ontology-archive/; in addition, CVS can retrieve arbitrary versions and diffs.

Database builds go to -- however, only derived files go here (MySQL dumps, GO-RDF-XML), not the source files.

CVS is causing us problems; see Mike's email:

As you know, the size of the UniProtKB gene association file has caused problems for CVS, our users and me. CVS and CVSWEB just cannot deal with the size of the file. Users are frustrated because they cannot retrieve files from CVSWEB, only from FTP. That means they are limited to the current version of the file, with none of the older versions. The old process.ontology file has similar CVSWEB problems. I have lots of problems committing new versions.

However, process.ontology is archived in  It is a monthly archive going back to 2001.

Question: should we do something like this for the UniProtKB file? Dan maintains an archive at EBI at  Maybe we should just point to the EBI archive. We could also do that for all the GOA GAFs.

We need to do something. We need to remove the UniProtKB file, and maybe the old ontology files, from CVS. Having them on the FTP/HTTP site should be fine. I can extract the old processed GOA files and put them in an archive directory. The filtering script would move a copy of the filtered file to the archive and not commit it to CVS.

Are there other options to solve this problem?
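The archiving step proposed in the email could be sketched roughly as follows. This is only an illustration, not the actual filtering script: the function name archive_filtered_gaf, the YYYY-MM directory layout (modelled loosely on the monthly process.ontology archive) and the gzip compression are all assumptions.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path


def archive_filtered_gaf(filtered: Path, archive_root: Path, when: date) -> Path:
    """Copy a filtered GAF into a dated archive directory as a gzip file.

    Hypothetical helper: the layout archive_root/YYYY-MM/<name>.gz is an
    assumption, not the layout of any existing GO or EBI archive.
    """
    # One directory per month, e.g. archive/2010-03/
    target_dir = archive_root / when.strftime("%Y-%m")
    target_dir.mkdir(parents=True, exist_ok=True)

    # Compress while copying; the original file is left in place and
    # nothing is committed to CVS.
    target = target_dir / (filtered.name + ".gz")
    with open(filtered, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

A filtering script could call this once per run, passing the current date, so that older versions stay retrievable from the FTP/HTTP site without going through CVSWEB.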

The process.ontology file is being retired, so this should not be a problem.

One solution would be to publish the source files along with the derived files on -- this has the advantage of keeping everything in sync.

The GAFs would be published with lite and full db releases (but not termdbs).

For consistency, we could also copy the version of the ontology used to build the database here too.