SWUG:Quality Control

From GO Wiki

All software (libraries, scripts, applications, utilities) developed within the GO consortium should follow these principles:

Automated Testing

General principles:

  • all GO software must have an extensive test suite
  • all cvs/svn commits must pass the test suite
  • new capabilities must be accompanied by tests
  • all releases absolutely must pass every test

Perl APIs

Both go-perl and go-db-perl have extensive test suites in the standard Perl style. To run them:

cd go-perl
perl Makefile.PL
make test

Note that the go-db-perl test suite requires a writable database; if one is not present, the database-dependent tests are skipped and the suite passes automatically. This facilitates simple CPAN installs.
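
The same conditional-skip pattern, sketched in Python's unittest for illustration (the actual go-db-perl suite is Perl; the environment variable below is a made-up stand-in for its configuration):

```python
import os
import unittest

# Assumption: a writable test database is advertised via an environment
# variable; the real go-db-perl suite reads its own configuration instead.
HAVE_DB = bool(os.environ.get("GO_TEST_DB"))

class DatabaseTests(unittest.TestCase):
    @unittest.skipUnless(HAVE_DB, "no writable test database configured")
    def test_bulk_load(self):
        # Database-dependent assertions would go here.
        self.assertTrue(HAVE_DB)

    def test_pure_logic(self):
        # Tests with no database dependency always run.
        self.assertEqual(2 + 2, 4)
```

With no database configured, the skipped test still counts as a pass, so a plain install can run the whole suite cleanly.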

Java APIs

The org.obo API underpinning OBO-Edit has several JUnit test suites. There is a JUnit master test that runs them all.

We need to do more to ensure these tests are run regularly. See: this tracker item.

  • TODO


Build pipeline

This is somewhat ad hoc. go-db-perl tests isolated parts of database building, but not the whole pipeline.

There is a mini-pipeline test, which tests a full build with the first 10k lines from every gene-association file:

cd go-dev/database/test

AFAIK this is not run regularly.

  • TODO

Database checks

Whilst there is a test suite for the pipeline software, we have relatively little in the way of checking the contents of the built database. There may be content errors even if the build software is perfect (perhaps due to an upstream content or processing error).

The go-prepare-release script does a minimal amount of checking as it progresses, using GO::Admin->guess_release_type. However, this check is not strong enough to halt the release; instead, failures are emailed to the central admin email account (is this checked?).

We need more checks. Failure of these checks should halt the release and force manual intervention. The checks would be executed via go-prepare-release. They could be implemented in Perl, as SQL views, or a mixture of both.

The checks include, but are not limited to:

  • Foreign Key Integrity

The new bulkloading script introduced database integrity errors.

These could be avoided altogether if we used InnoDB rather than MyISAM.

TODO: determine feasibility

If we can't use InnoDB, then we can generate SQL Views that check for integrity. Presumably there is a way to generate these automatically from the source schema.
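
A sketch of the view-based approach, using an in-memory SQLite database purely for illustration (the GO database is MySQL, but the LEFT JOIN idiom is the same; the view is empty exactly when integrity holds):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term (id INTEGER PRIMARY KEY);
CREATE TABLE gene_product (id INTEGER PRIMARY KEY, type_id INTEGER);
INSERT INTO term (id) VALUES (1);
INSERT INTO gene_product (id, type_id) VALUES (10, 1);   -- valid reference
INSERT INTO gene_product (id, type_id) VALUES (11, 99);  -- dangling type_id

-- A view that is empty when foreign-key integrity holds:
CREATE VIEW bad_gene_product_type AS
  SELECT gp.id FROM gene_product gp
  LEFT JOIN term t ON gp.type_id = t.id
  WHERE t.id IS NULL;
""")

orphans = conn.execute("SELECT id FROM bad_gene_product_type").fetchall()
print(orphans)  # -> [(11,)]
```

A release check would simply require every such view to return zero rows; generating one view per foreign key from the schema should be mechanical.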

  • Content checks

We can assign minimum numbers for various parameters and check them automatically:

  1. > 20k terms
  2. > n associations per reference genome
  3. > n sequences per reference genome
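
A minimal sketch of such threshold checks (Python with a toy SQLite database; the thresholds and the helper name are invented for illustration):

```python
import sqlite3

# Hypothetical floor, following the first suggestion above;
# the values of "n" for the per-genome checks are still to be decided.
MIN_TERMS = 20000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE term (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO term (id) VALUES (?)",
                 [(i,) for i in range(25000)])

def check_minimum(conn, sql, minimum, label):
    """Return an error string if a count falls below its floor, else None."""
    (count,) = conn.execute(sql).fetchone()
    if count < minimum:
        return f"FAIL: {label}: {count} < {minimum}"
    return None

failures = [f for f in [
    check_minimum(conn, "SELECT COUNT(*) FROM term", MIN_TERMS, "term count"),
] if f is not None]

# An empty failure list means the release may proceed; otherwise
# go-prepare-release should halt and force manual intervention.
print(failures)  # -> []
```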

Suggested tests

Terms ?

Gene Products

All gene products MUST have the following attributes:

  • a valid type (gene_product.type_id=term.id)
  • a valid species (gene_product.species_id=species.id)
  • a valid dbxref (gene_product.dbxref_id=dbxref.id)

and the values of these attributes must not be null.

Associations

All associations MUST have the following attributes:

  • a valid term (association.term_id=term.id, additional check that term is valid?)
  • a valid gene product (association.gene_product_id=gene_product.id)
  • a valid database (association.source_db_id=db.id)

Derived files and release pipeline

There are a number of files checked into GO CVS that are derived. AFAIK there are no automated tests for these. Automated tests are difficult, since the scripts by their nature modify the publicly available CVS.

Some of these scripts use obo2obo. TODO - all such pipeline calls must be accompanied by a JUnit test in OBO-Edit, e.g. conversion from obof1.2 to obof1.0.


There is currently no automated testing for the UI. This is quite difficult to do.


Web apps are slightly easier to test automatically than standalone apps. We still need manual testing to check that everything looks OK. However, we can also do link checking and flow-of-control checking.

  • TODO - Seth, fill in hammer details here.

For example, for any linkouts to AmiGO, we should add these to the test suite and make sure they return valid HTML that contains the relevant information.
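
A hedged sketch of what such a check might look like; fetching the linkout URL is omitted, and the helper below only verifies that a canned page parses and mentions the expected term id:

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    """Minimal validity probe: the page parses and yields a <title>."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
    def handle_data(self, data):
        if self.in_title:
            self.title += data

def looks_like_term_page(html, term_id):
    """True if the page has a title and mentions the expected GO term id."""
    p = TitleGrabber()
    p.feed(html)
    return term_id in html and bool(p.title)

# A canned page standing in for a fetched AmiGO response:
sample = ("<html><head><title>GO:0008150</title></head>"
          "<body>GO:0008150 biological_process</body></html>")
print(looks_like_term_page(sample, "GO:0008150"))  # -> True
```

A real test would fetch each linkout URL and apply a check like this to the response body.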

Manual Testing

Automated testing forms the first line of defense. Manual testing is still required, particularly for end-user apps.

Manual and automated testing can be intertwined; e.g. a link checker can pre-load a set of URLs into a single web page, making it much easier for testers to look them over.
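
For instance, a checker could emit its gathered URLs as one review page (a Python sketch; the URL shown is a hypothetical AmiGO linkout):

```python
from html import escape

def review_page(urls):
    """Bundle checker-gathered URLs into one page for manual inspection."""
    items = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>'
        for u in urls)
    return f"<html><body><ul>\n{items}\n</ul></body></html>"

# Hypothetical URL; a real run would take the link checker's output.
page = review_page([
    "http://amigo.geneontology.org/cgi-bin/amigo/term-details.cgi?term=GO:0008150",
])
print("GO:0008150" in page)  # -> True
```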

A discussion of what to test for, and how, can be found at:

AmiGO : manual tests

OBO-Edit : manual tests

  • TODO link

Software Lifecycle


Software Releases

  • TODO - write up. Feature freezes. Beta releases. Version numbers.


  • Perl modules should have POD documentation -- this will show up when released on CPAN, for example; e.g. the Graph module
  • Java code should have Javadoc, and this should be auto-published

Code should obviously be commented.

Scripts should show USAGE info when called with no args or with -h.
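
In Python this convention looks like the sketch below (the GO scripts themselves are Perl, where Getopt::Long plus a usage() routine plays the same role; the script description and argument are invented):

```python
import argparse
import sys

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Hypothetical GO utility, shown only as a usage pattern.")
    parser.add_argument("input", help="input file to process")
    argv = sys.argv[1:] if argv is None else argv
    if not argv:
        # No args: print usage and bail out rather than erroring obscurely.
        parser.print_help()
        return 1
    # argparse handles -h/--help automatically.
    args = parser.parse_args(argv)
    return 0
```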