- 1 April: Salt Lake City GOC Meeting
- 1.1 SO Sequence Ontology
- 1.2 Production Systems (Stanford)
- 1.3 Ontology Development
- 1.4 Reference Genome Project
- 1.5 Software and Utilities
- 1.6 User Advocacy
- 1.7 Outreach
April: Salt Lake City GOC Meeting
SO Sequence Ontology
Production Systems (Stanford)
GO term statistics
October 1, 2007
April 16, 2008
SourceForge statistics (Oct. 1 - April 17)
- items opened: 500
- items closed: 476
SourceForge reports (on SF site)
Our most notable accomplishment since the Princeton meeting in September is that the regulates relationships have gone live. Chris, David and Tanya did an enormous amount of work, which is documented in the regulation section of the wiki. A brief summary of metrics is also available.
Other completed work
- The revamp of Sensu terms is now complete. We described our approach of renaming terms and, where necessary, improving definitions or merging terms, at the September meeting.
- We reported on the Cardiovascular physiology/development and Muscle Development content meetings at the September meeting. Changes stemming from those meetings have gone live.
- Smaller-scale efforts include:
- A number of disjointness violations have been corrected.
- Electron transport terms have been reorganized.
- New enzyme-activity function terms and (many!) synonyms added, improving consistency with EC.
- Process and component terms for plasma lipoprotein particles added.
- Sporulation terms have been reorganized, and new terms added (connected with 'sensu' work).
- More new terms have been added for PAMGO.
- PIR GO slim added.
Work in progress
- Two pilot projects to add links between the function and process ontologies are under way. Progress and future directions will be discussed during the meeting.
- A content meeting on lung development was held December 5-6. Progress will be briefly noted during the meeting.
- Jen has started gathering information and identifying experts to work on an overhaul of signal transduction process terms.
Collaboration with IMG
Jane is working with Iain Anderson from IMG. The first set of IMG terms (about 1800), from April 2007, has been mapped and sent back, but approximately 1500 more terms have since been added to IMG, and I am currently mapping these. The IMG pathways and parts also require mapping.
I am collaborating with Antonio Jimeno from Dietrich Rebholz-Schuhmann's group (EBI) to create automatic mappings for these terms. I verify the mappings manually and return the results to him so that the algorithms can be improved. We hope eventually to use this work to create a generic vocabulary mapping tool.
Reference Genome Project
- There are currently 394 genes in the Target gene list.
- Selection of genes: Since Nov 2007, we rotate the group selecting target genes.
- Curation priorities: Since Nov 2007, targets are no longer only disease genes. We select 20 genes, 5 in each of 4 categories: (1) disease genes, (2) 'hot genes', (3) metabolic pathways, (4) uncharacterized.
| Organism | # genes looked at | # genes with orthologs | # genes curated |
|---|---|---|---|
| Arabidopsis | 372 | 131 (35%) | 129 (98%) |
| Caenorhabditis | 412 | 271 (66%) | 198 (73%) |
| Gallus | 99 | 82 (82%) | none marked completed |
| Homo | 394 | 393 (99%) | 231 (59%) |
| Mus | 394 | 382 (97%) | 338 (88%) |
| Saccharomyces | 394 | 164 (41%) | 159 (95%) |
| Drosophila | 376 | 171 (45%) | 70 (41%) |
| Rattus | 394 | 347 (88%) | 250 (72%) |
| Danio | 374 | 283 (75%) | 264 (93%) |
| Dictyostelium | 331 | 148 (44%) | 53 (36%) |
| Schizosaccharomyces | 334 | 124 (37%) | 105 (85%) |
Annotation Quality Control
We are trying to address the issue of quality control of the annotations. Some of the concerns are:
- Omission of annotations
- Errors in annotations
- Absence of 'with' for ISS annotations or 'with' object not experimentally characterized
- Overannotation with ISS to process terms
- Problems in the ontology that can become evident when comparing annotations from different species
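One of the concerns above, ISS annotations with an empty 'with' field, lends itself to an automated check. The following is a minimal sketch, assuming the annotations are in the tab-delimited gene association file layout, where column 7 holds the evidence code and column 8 the 'With (or) From' value. The function name `iss_missing_with` and the sample rows are made up for illustration and do not come from any group's data.

```python
import csv
import io

# 0-based indexes into a tab-delimited gene association row.
OBJECT_ID_COL = 1   # column 2: DB object ID
GO_ID_COL = 4       # column 5: GO ID
EVIDENCE_COL = 6    # column 7: evidence code
WITH_COL = 7        # column 8: With (or) From


def iss_missing_with(gaf_lines):
    """Return (object_id, go_id) pairs for ISS rows whose 'with' column is empty."""
    flagged = []
    reader = csv.reader(gaf_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, '!' comment lines and rows too short to inspect.
        if not row or row[0].startswith("!") or len(row) <= WITH_COL:
            continue
        if row[EVIDENCE_COL] == "ISS" and not row[WITH_COL].strip():
            flagged.append((row[OBJECT_ID_COL], row[GO_ID_COL]))
    return flagged


# Illustrative rows only (hypothetical genes, truncated to the first 9 columns).
SAMPLE = "\n".join([
    "!gaf-version: 1.0",
    "\t".join(["DB", "GENE1", "abc1", "", "GO:0000001", "REF:1", "ISS", "", "P"]),
    "\t".join(["DB", "GENE2", "abc2", "", "GO:0000002", "REF:2", "ISS", "DB:GENE9", "P"]),
    "\t".join(["DB", "GENE3", "abc3", "", "GO:0000003", "REF:3", "IDA", "", "F"]),
])

flagged = iss_missing_with(io.StringIO(SAMPLE))
print(flagged)
```

This only detects a missing 'with' value. Checking whether the 'with' object is itself experimentally characterized would need a second lookup against that object's own annotations, which is not attempted here.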
Methods to address this:
- Some queries can be run: for example, finding genes that either lack annotations or are annotated to ND although an ortholog has GO annotations
- (Val Wood): Looking for co-occurrences of annotations as a high-level way to check for errors
- Manual verification of ortho sets (SourceForge tracker: http://sourceforge.net/tracker/?group_id=36855&atid=1040173)
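The first query in the list above can be sketched as follows. This is an illustration under stated assumptions, not the query actually in use: ortho sets are assumed to be available as a mapping from a set ID to its member genes, and annotations as a mapping from a gene to (GO ID, evidence code) pairs. The names `needs_review` and `has_informative_annotation`, the gene IDs and the set IDs are all hypothetical. As a simplification, a gene counts as informative if it has at least one non-ND annotation in any aspect.

```python
ND = "ND"


def has_informative_annotation(gene, annotations):
    """True if the gene has at least one annotation with an evidence code other than ND."""
    return any(evidence != ND for _, evidence in annotations.get(gene, []))


def needs_review(ortho_sets, annotations):
    """List (set_id, gene) pairs where the gene is unannotated or ND-only
    while at least one other member of its ortho set is annotated."""
    candidates = []
    for set_id, genes in sorted(ortho_sets.items()):
        if not any(has_informative_annotation(g, annotations) for g in genes):
            continue  # nothing in this set to compare against
        for gene in genes:
            if not has_informative_annotation(gene, annotations):
                candidates.append((set_id, gene))
    return candidates


# Made-up example data.
ortho_sets = {
    "set1": ["hs:A", "mm:A", "sc:A"],
    "set2": ["hs:B", "mm:B"],
}
annotations = {
    "hs:A": [("GO:0000001", "IDA")],
    "mm:A": [("GO:0008150", ND)],   # annotated to the process root with ND
    "hs:B": [("GO:0008150", ND)],
    # sc:A and mm:B have no annotations at all
}

review_list = needs_review(ortho_sets, annotations)
print(review_list)
```

In this example only set1 yields candidates, because set2 has no annotated member to serve as a reference.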
Currently the target genes and annotation status are captured using Google spreadsheets (target genes and links to every group's annotation status page can be found at http://spreadsheets.google.com/ccc?key=pwOksMOra5uq4vIYjPgefPw).
- Ortho set curation status: Siddhartha Basu, Chris Mungall, Seth Carbon and Mary Dolan are working on a database and a tool where target genes (ortho sets) and their curation status will be maintained.
- Graphical displays (Mary Dolan): several improvements
- Integration of ref genomes genes into AmiGO
Generating Ortholog sets
P-POD: Kara Dolinski (Princeton). Procedure:
- Obtain FASTA files from each group from gp2protein files
- All-vs-all BLAST
- OrthoMCL
- Output will be rooted trees reconciled with species trees, and a graphic image of each tree
- We are on the Notung step right now and are adding data as they are generated. Currently, OrthoMCL families can be queried by gene name, though you just get back a list of members right now. Data are being made available as soon as we have them:
The reference genome group holds a monthly phone conference. Minutes can be found at Conference_Calls
Software and Utilities
Continues to be run efficiently. The email system was recently moved to Mailman. [will enter stats when I have them]
Two editions of the newsletter have appeared since the last meeting. We have applied for an ISSN for the newsletter.
Web-presence Working Group (formerly AmiGO WG)
AmiGO 1.5 was released earlier this month with many new features, including a GO slimmer tool, a term enrichment tool and an SQL search interface. We are now beginning to set priorities for the next release.
The advocacy group has not been involved with AmiGO development recently. We have decided that, in future, the advocacy group will help set priorities from a biologist's perspective at the beginning of each release and will work with the software group to come up with a release plan. The software group will then develop the release independently, and advocacy will get involved again only when testing is required in the run-up to the release.
Outreach group activity has been reduced to supporting groups that approach GO directly. We are not currently seeking out new annotation groups.
- Group at CRIBI (Italy) committed to carrying out grape annotation.
- The journal Plant Physiology has agreed to accept annotations from submitting authors. (TAIR collaboration)
- TAIR outreach at PAG 2008. Discussion of community annotation with TAIR, SGN (SOL Genomics Network) and WormBase.
- The SOL Genomics Network database annotation file has been submitted.
- Reactome have created annotation files according to the plans laid down in Princeton, and are ready to commit when they have CVS access. (Emily Dimmer and Esther Schmidt)
- ISAFG Conference - Fiona McCarthy reports continuing interest in GO.
- Muscle Annotation wiki (Erika Feltrin and Alex Diehl)