
April 2008

April: Salt Lake City GOC Meeting

SO Sequence Ontology

Production Systems (Stanford)

Ontology Development


GO term statistics

October 1, 2007

Ontology    Current  Defined  Obsolete  Total
Function       7879     7492       556   8435
Process       13916    13757       458  14374
Component      2019     2019       114   2133
All           23814    24396      1128  24942

April 16, 2008

Ontology    Current  Defined  Obsolete  Total
Function       8262     7909       566   8828
Process       14702    14564       470  15172
Component      2077     2077       117   2194
All           25041    25703      1153  26194
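
The two snapshots can be cross-checked and compared with a few lines of Python. This is only a sketch: the names OCT_2007, APR_2008, check_snapshot and growth are illustrative and not part of any GO tool. It assumes that Total = Current + Obsolete and that the 'All' row is the sum of the three ontologies. The 'All' Defined value is left out of the check because in both snapshots it exceeds the sum of the three per-ontology values by exactly the obsolete count, which suggests (but does not confirm) that it also counts obsolete terms.

```python
# Term counts copied from the two tables above:
# (current, defined, obsolete, total) per ontology.
OCT_2007 = {
    "Function": (7879, 7492, 556, 8435),
    "Process": (13916, 13757, 458, 14374),
    "Component": (2019, 2019, 114, 2133),
    "All": (23814, 24396, 1128, 24942),
}
APR_2008 = {
    "Function": (8262, 7909, 566, 8828),
    "Process": (14702, 14564, 470, 15172),
    "Component": (2077, 2077, 117, 2194),
    "All": (25041, 25703, 1153, 26194),
}


def check_snapshot(rows):
    """Return True if every row satisfies total == current + obsolete and the
    'All' row equals the per-ontology sums for current, obsolete and total."""
    for current, _defined, obsolete, total in rows.values():
        if current + obsolete != total:
            return False
    parts = [rows[name] for name in ("Function", "Process", "Component")]
    for column in (0, 2, 3):  # current, obsolete, total (defined is skipped)
        if sum(part[column] for part in parts) != rows["All"][column]:
            return False
    return True


def growth(before, after):
    """Change in the number of current terms between two snapshots."""
    return {name: after[name][0] - before[name][0] for name in before}


if __name__ == "__main__":
    print(check_snapshot(OCT_2007), check_snapshot(APR_2008))
    print(growth(OCT_2007, APR_2008))
```

Under these assumptions both snapshots are internally consistent, and the growth in current terms can be read directly from the result of growth.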

SourceForge statistics (Oct. 1 - April 17)

  • items opened: 500
  • items closed: 476

SourceForge reports (on SF site)

Completed work

Regulates relationship

Our most notable accomplishment since the Princeton meeting in September is that the regulates relationships have gone live. Chris, David and Tanya did an enormous amount of work, which is documented in the regulation section of the wiki. A brief summary of metrics is also available.

Other completed work

  • The revamp of Sensu terms is now complete. We described our approach of renaming terms and, where necessary, improving definitions or merging terms, at the September meeting.
  • We reported on the Cardiovascular physiology/development and Muscle Development content meetings at the September meeting. Changes stemming from those meetings have gone live.
  • Smaller-scale efforts include:
    • A number of disjointness violations have been corrected.
    • Electron transport terms have been reorganized.
    • New enzyme-activity function terms and (many!) synonyms added, improving consistency with EC.
    • Process and component terms for plasma lipoprotein particles added.
    • Sporulation terms have been reorganized, and new terms added (connected with 'sensu' work).
    • More new terms have been added for PAMGO.
    • PIR GO slim added.

Work in progress

Collaboration with IMG

Jane is working with Iain Anderson from IMG. The first set of IMG terms (about 1800), from April 2007, has been mapped and sent back. Since then approximately 1500 more terms have been added to IMG, and Jane is currently mapping these. The IMG pathways and parts also require mapping.

Jane is also collaborating with Antonio Jimeno from Dietrich Rebholz-Schuhmann's group (EBI) to create automatic mappings for these terms. She manually verifies the mappings and returns the data to him to improve the algorithms. We hope eventually to use this work to create a generic vocabulary mapping tool.

Reference Genome Project

Target genes

  • There are currently 394 genes in the target gene list.
  • Selection of genes: since November 2007, the task of selecting target genes rotates among the groups.
  • Curation priorities: since November 2007, targets are no longer restricted to disease genes. We select 20 genes, 5 in each of 4 categories: (1) disease genes, (2) 'hot genes', (3) metabolic pathways, (4) uncharacterized.

Annotation Progress

Organism             # genes looked at   # genes with orthologs   # genes curated
Arabidopsis          372                 131 (35%)                129 (98%)
Caenorhabditis       412                 271 (66%)                198 (73%)
Gallus               99                  82 (82%)                 none marked completed
Homo                 394                 393 (99%)                231 (59%)
Mus                  394                 382 (97%)                338 (88%)
Saccharomyces        394                 164 (41%)                ?
Drosophila           376                 171 (45%)                70 (41%)
Rattus               394                 347 (88%)                250 (72%)
Danio                374                 283 (75%)                264 (93%)
Dictyostelium        331                 148 (44%)                53 (36%)
Schizosaccharomyces  334                 124 (37%)                105 (85%)
Escherichia          375                 51 (13%)
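
The percentages in the table appear to be ratios of neighbouring columns: genes with orthologs as a share of genes looked at, and genes curated as a share of genes with orthologs. The sketch below recomputes them for three rows under that assumption. The names PROGRESS, pct and summarize are illustrative only, and the values are reported to one decimal place, so they can differ slightly from the whole-number figures in the table.

```python
# (genes looked at, genes with orthologs, genes curated), copied from the
# table above. None marks a value the table does not report.
PROGRESS = {
    "Mus": (394, 382, 338),
    "Drosophila": (376, 171, 70),
    "Saccharomyces": (394, 164, None),
}


def pct(part, whole):
    """Percentage of `part` in `whole` to one decimal place, or None if
    either value is missing or the denominator is zero."""
    if part is None or not whole:
        return None
    return round(100.0 * part / whole, 1)


def summarize(looked_at, with_orthologs, curated):
    """Return (% of looked-at genes with orthologs, % of those curated)."""
    return pct(with_orthologs, looked_at), pct(curated, with_orthologs)


if __name__ == "__main__":
    for organism, counts in PROGRESS.items():
        print(organism, summarize(*counts))
```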

[Figure: 2008-04-RefGenomeMetric-all data.jpg]

Annotation Quality Control

We are trying to address the issue of quality control of the annotations. Some of the concerns are:

  • Omission of annotations
  • Errors in annotations
  • Absence of 'with' for ISS annotations or 'with' object not experimentally characterized
  • Overannotation with ISS to process terms
  • Problems in the ontology that can become evident when comparing annotations from different species

Methods to address this:

  1. Some checks can be run as queries: for example, finding genes that either lack annotations or are annotated to ND although an ortholog has GO annotations.
  2. (Val Wood): Looking for co-occurrences of annotations as a high-level way to check for errors.
  3. Manual verification of ortho sets (SourceForge tracker: http://sourceforge.net/tracker/?group_id=36855&atid=1040173)
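
The first kind of query can be sketched in Python as follows. This is a minimal illustration, not the consortium's actual query: the data layout (ortho sets as lists of gene identifiers, annotations as a dictionary of (GO term, evidence code) pairs), the gene identifiers and the function names are all assumptions made for the example.

```python
def has_informative_annotation(gene, annotations):
    """True if the gene has at least one annotation whose evidence code is not ND."""
    return any(evidence != "ND" for _term, evidence in annotations.get(gene, []))


def flag_candidates(ortho_sets, annotations):
    """Return genes that lack annotations, or have only ND annotations,
    while at least one ortholog in the same set is annotated."""
    flagged = []
    for members in ortho_sets:
        annotated = {g for g in members if has_informative_annotation(g, annotations)}
        if not annotated:
            continue  # no ortholog offers evidence, so nothing to compare against
        flagged.extend(g for g in members if g not in annotated)
    return flagged


if __name__ == "__main__":
    # Hypothetical ortho sets and annotations, for illustration only.
    ortho_sets = [
        ["mouse:g1", "yeast:g1", "fly:g1"],
        ["mouse:g2", "yeast:g2"],
    ]
    annotations = {
        "mouse:g1": [("GO:0016301", "IDA")],  # experimentally supported
        "yeast:g1": [("GO:0003674", "ND")],   # annotated to ND only
        # fly:g1, mouse:g2 and yeast:g2 have no annotations at all
    }
    print(flag_candidates(ortho_sets, annotations))
```

In this example only the first ortho set yields candidates, because no member of the second set has an annotation that could be compared against.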

Software development

Currently the target genes and annotation status are captured using Google spreadsheets. Target genes and links to every group's annotation status page can be found at http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw

  1. Ortho set curation status: Siddhartha Basu, Chris Mungall, Seth Carbon and Mary Dolan are working on a database and a tool where target genes (ortho sets) and their curation status will be maintained.
  2. Graphical displays (Mary Dolan): several improvements
  3. Integration of reference genome genes into AmiGO

Generating Ortholog sets

P-POD: Kara Dolinski (Princeton)

Procedure:

  1. Obtain FASTA files for each group from the gp2protein files
  2. All-vs-all BLAST
  3. OrthoMCL
  4. ClustalW
  5. Notung

  • Output will be rooted trees reconciled with species trees, plus a graphic image of each tree.
  • We are on the Notung step right now and are adding data as they are generated. Currently, OrthoMCL families can be queried by gene name, although for now the query returns only a list of members. Data are being made available as soon as we have them:

Web interface: http://ppod.princeton.edu/cgi-bin/ppod.cgi
FTP site: ftp://gen-ftp.princeton.edu/ppod/go_ref_genome/
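
The gene-name query described above, which returns the members of the matching OrthoMCL families, can be pictured as a simple index lookup. The sketch below is not P-POD code: the family identifiers, gene names, data layout and function names are hypothetical, and it assumes that gene names are matched case-insensitively.

```python
from collections import defaultdict


def build_index(families):
    """Map each lower-cased gene name to the ids of the families containing it."""
    index = defaultdict(set)
    for family_id, members in families.items():
        for _species, gene_name in members:
            index[gene_name.lower()].add(family_id)
    return index


def members_for_gene(gene_name, families, index):
    """Return the (species, gene) members of every family containing the gene."""
    result = []
    for family_id in sorted(index.get(gene_name.lower(), ())):
        result.extend(families[family_id])
    return result


if __name__ == "__main__":
    # Hypothetical families, for illustration only.
    families = {
        "family_1": [("Mus", "Abc1"), ("Homo", "ABC1"), ("Saccharomyces", "ABC1")],
        "family_2": [("Drosophila", "xyz"), ("Danio", "xyz2")],
    }
    index = build_index(families)
    print(members_for_gene("abc1", families, index))
```

A gene name that occurs in no family returns an empty list rather than raising an error.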


The reference genome group holds a monthly phone conference. Minutes can be found at Conference_Calls

Software and Utilities

User Advocacy

GO helpdesk

The GO helpdesk continues to be run efficiently. The email system was recently moved to Mailman. [will enter stats when I have them]

GO newsletter

Two editions have appeared since the last meeting. We have applied for an ISSN for the newsletter.

Web-presence Working Group (formerly AmiGO WG)

AmiGO 1.5 was released earlier this month with many new features, including a GO slimmer tool, a term enrichment tool and an SQL search interface. We are now beginning to set priorities for the next release.
The advocacy group has not been involved with AmiGO development recently. We have decided that in future the advocacy group will help set priorities, from a biologist's perspective, at the beginning of each release cycle and will work with the software group to come up with a release plan. The software group will then develop the release independently, and advocacy will get involved again only when testing is required in the run-up to the release.


Outreach group activity has been reduced to supporting groups who approach GO directly. We are not currently seeking out new annotation groups.

Major Developments:

  • Group at CRIBI (Italy) committed to carrying out grape annotation.
  • The journal Plant Physiology has agreed to accept annotations from submitting authors. (TAIR collaboration)

[online submission tool]

  • TAIR outreach at PAG 2008. Discussion of community annotation with TAIR, SGN (SOL Genomics Network) and WormBase.
  • The SOL Genomics Network database annotation file has been submitted.
  • Reactome have created annotation files according to the plans laid down in Princeton, and are ready to commit when they have CVS access. (Emily Dimmer and Esther Schmidt)
  • ISAFG Conference - Fiona McCarthy reports continuing interest in GO.
  • Muscle Annotation wiki (Erika Feltrin and Alex Diehl)