SLC GO Consortium Meeting Minutes April 2008
GO Consortium Meeting Minutes
ACTION ITEM: GO Top needs to sign off on the agendas for the GOC and Ref Genome meetings prior to these meetings. If no action item is stated for an agenda item, this is not the forum for that item.
1. Introduction & Review of Agenda
2. GO Reference Genome Annotation Team Report (Pascal)
394 genes in target list – from disease genes, hot genes, metabolic pathways, unannotated gene lists.
2 methods being investigated for ortholog determination: PPOD (Kara) & tree-based (Paul Thomas, SRI). Most groups have looked at all the target genes and determined orthologs for their species.
Ref Genome Process:
1. generate focal sets – have been doing manually but will move to using Kara & Paul’s data.
2. select common curation targets
New curation priority is to annotate the families generated by PPOD where there is at least 1 member from each Ref Gen species – there are 150 of these.
Rex: focus on the clusters that have 1 in each species (rather than multiples) as these will be the least confusing.
3. do experimental annotations – notification once completed
4. do inferential (ISS) annotations
=> at each step generate error reports
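The selection rule in step 2 can be sketched roughly as follows. This is only an illustration: the species labels, gene IDs and family names are made up, and the real PPOD output format is not described in these minutes.

```python
from collections import Counter

# Hypothetical species labels; the actual Ref Genome species list is not given here.
REF_GENOME_SPECIES = {"species_a", "species_b", "species_c"}


def species_counts(family):
    """Count members per species for a family given as (gene_id, species) pairs."""
    return Counter(species for _gene_id, species in family)


def covers_all_species(family, species=REF_GENOME_SPECIES):
    """True if the family has at least one member from every species."""
    counts = species_counts(family)
    return all(counts[s] >= 1 for s in species)


def is_one_to_one(family, species=REF_GENOME_SPECIES):
    """True if the family has exactly one member from every species (Rex's suggestion)."""
    counts = species_counts(family)
    return all(counts[s] == 1 for s in species)


# Made-up example families.
families = {
    "fam1": [("a1", "species_a"), ("b1", "species_b"), ("c1", "species_c")],
    "fam2": [("a2", "species_a"), ("a3", "species_a"),
             ("b2", "species_b"), ("c2", "species_c")],
    "fam3": [("a4", "species_a"), ("b3", "species_b")],
}

priority = sorted(f for f, m in families.items() if covers_all_species(m))
simplest = sorted(f for f, m in families.items() if is_one_to_one(m))
```

Here `priority` holds the families that meet the "at least 1 member from each species" criterion, and `simplest` is the subset with exactly one member per species.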
QC issues – need QC at each step in the process. For GO annotation, look at outliers, check co-occurrences of annotations. For orthologs, manual verification of ortho sets.
Rex: will monitor to see if we are making improvements each time we do manual QC.
Proposed Changes to the GA File:
1. column 2: longest version of the gene/protein ID
2. new column (17): ID of the object being annotated. Will be different from column 2 in the case of isoforms
3. column 12: will refer to column 17
4. new header information to be added to the file regarding gene products annotated and expected number of genes in the organism
=> Chris: more information about this is now on the wiki.
Rex: key point is that this allows us to annotate isoforms and connect those isoforms to a gene.
Judy: this change will feed into generating the gene index file
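A rough sketch of how a consumer of the file might pick the annotated object under the proposed layout. It assumes a tab-delimited row, that column 17 may be empty or absent, and that column 2 is the fallback; the IDs and helper names are hypothetical, and the final specification is on the wiki, not here.

```python
def annotated_object_id(row):
    """Return the ID the annotation applies to.

    Assumption: column 17, when present and non-empty, holds the ID of the
    annotated object (e.g. an isoform); otherwise column 2 is used.
    """
    cols = row.rstrip("\n").split("\t")
    gene_id = cols[1]                                   # column 2 (1-based)
    specific_id = cols[16] if len(cols) >= 17 else ""   # column 17 (1-based)
    return specific_id or gene_id


def make_row(col2, col17=""):
    """Build a 17-column example row with only columns 2 and 17 filled in."""
    cols = [""] * 17
    cols[1] = col2
    cols[16] = col17
    return "\t".join(cols)


# Hypothetical IDs: one isoform-level annotation, one gene-level annotation.
isoform_row = make_row("GENE0001", "GENE0001-iso2")
gene_row = make_row("GENE0001")
```

With this fallback, older 16-column rows would still resolve to the column 2 ID.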
Proposed changes to the gp2protein file:
1. should only contain one version of each gene sequence
=> more information about this on the ref genome wiki.
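One way to check a file against this rule is sketched below. It assumes a simple two-column, tab-delimited layout (gene product ID, then sequence ID) and reads "one version" as "each gene product ID appears once"; both are assumptions, and all IDs are invented.

```python
from collections import defaultdict


def duplicated_entries(lines):
    """Return {gene_id: [sequence_ids]} for gene IDs listed more than once."""
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):   # skip blanks and comment lines
            continue
        parts = line.split("\t")
        if len(parts) < 2:                      # skip malformed rows
            continue
        gene_id, sequence_id = parts[0], parts[1]
        seen[gene_id].append(sequence_id)
    return {g: ids for g, ids in seen.items() if len(ids) > 1}


# Invented example content.
example_lines = [
    "! example header",
    "DB:GENE0001\tSEQ:P00001",
    "DB:GENE0002\tSEQ:P00002",
    "DB:GENE0001\tSEQ:P00003",
]
duplicates = duplicated_entries(example_lines)
```

Any gene ID reported in `duplicates` would need to be reduced to a single sequence entry.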
Software development discussed during Ref Genome meeting:
1. reference genome tracking tool to replace google spreadsheets
2. graphical display to look at annotation data
3. integration of ref genome genes into AmiGO for next release.
3. Ontology Development
3.1. Report on accomplishments (Midori)
1. Sensu – not used any more in GO terms
2. Disjoint violations corrected
3. Cardio & muscle content meeting work – this work has gone live
4. Electron transport terms reworked
5. SourceForge statistics: 500 opened; 478 closed!
Have about 200 items constantly open. At the end of this meeting will close all open SourceForge items that are more than a year old.
Sue: request that emails be sent out warning you that the items are going to be closed – prompt submitters to reread their items.
6. Regulates relationships – gone live
3.2. Implementing the new regulates relationships (Tanya)
If 'regulation of X' was part_of X, then all that needed to be changed was to convert the relationship to regulates. BP terms reviewed, some added as part_of relations (many new definitions and synonyms also added). Reasoner – used to identify inconsistencies between regulation terms and BP terms, and missing terms. QC reports generated.
QC reports:
- missing links between terms
- internal consistency within the DAG
- relationship problems (regulate terms that are is_a instead of part_of)
- multiple part_of parentage (if you regulate a process, do you regulate its parent? Doesn’t work in terms with multiple part_of relationships.)
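The conversion and one of the QC checks above can be illustrated with a small sketch. It is not the reasoner-based process actually used: it matches terms purely by name ("regulation of X"), and the term IDs and edge representation are hypothetical.

```python
def convert_regulation_links(edges, names):
    """Turn part_of links from 'regulation of X' to X into regulates links.

    edges: list of (child_id, relation, parent_id); names: id -> term name.
    """
    converted = []
    for child, relation, parent in edges:
        expected = "regulation of " + names[parent]
        if relation == "part_of" and names[child] == expected:
            relation = "regulates"
        converted.append((child, relation, parent))
    return converted


def is_a_regulation_links(edges, names):
    """QC check: 'regulation of X' terms linked to X by is_a."""
    return [
        (child, parent)
        for child, relation, parent in edges
        if relation == "is_a"
        and names[child] == "regulation of " + names[parent]
    ]


# Hypothetical IDs and a tiny example graph.
names = {
    "T:1": "transcription",
    "T:2": "regulation of transcription",
    "T:3": "cell cycle",
    "T:4": "regulation of cell cycle",
    "T:5": "mitosis",
}
edges = [
    ("T:2", "part_of", "T:1"),
    ("T:4", "is_a", "T:3"),
    ("T:5", "part_of", "T:3"),
]

converted = convert_regulation_links(edges, names)
flagged = is_a_regulation_links(edges, names)
```

Ordinary part_of links (such as the mitosis example) are left untouched, and the is_a case is reported rather than silently rewritten.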
March 25: regulates relationships go live!
1. BP ontology improved – new terms, definitions
2. New relationships portray the biology correctly
3. New logical definitions allow automated creation of error reports as ontology is changed.
1. intersection tags: cross products between the GO ontologies
2. intra- and inter-ontology links for regulates relationships: occurrents can regulate occurrents
MF can regulate MF, BP can regulate BP BUT ALSO MF can regulate BP (and vice versa). Will change MF from being a strict is_a ontology.
These are ready to go now but will put out notifications first.
MFs that always occur in a single type of BP are part_of that BP. Reasoning: a BP is a collection of MFs, therefore the relationship should be part_of.
=> would appreciate feedback about what people think about this
Comment: when to annotate to a regulation term?
- need to decide based on author’s argument
- annotation to regulation of BP will also still be annotation to BP parent term
3. BP ontology improvements:
Metabolism – intermediate regulation parents filled in manually in one section; moving towards computational analysis to find these terms for larger areas of the ontology followed by biocurator review before adding new terms.
Signal transduction – can say that signal transduction is_a regulation of cell communication. Will be starting a push to revamp signal transduction: need definitions with a start point and an end point. Beginning of signal transduction is ligand-receptor binding then the rest is a cellular process.
Finding general BPs that signal transduction can fit into as a regulation. E.g.: biological objective of BMP signaling pathways is to regulate transcription. Where it sits now, it isn’t connected to transcription.
4. QC: continue reviewing QC reports that will be run regularly
3.3. Function-Process Links (Harold)
1. Considerations on Glycolysis and TCA Cycle
First thing is to define start and end points for glycolysis. Made stop point at pyruvate so that didn’t have to consider aerobic/anaerobic processes - same process of defining start and end points for the Krebs cycle.
Examine terms & definitions within the defined start & end of the process - find discrepancies, possibly missing terms. Looking at definitions and refining/adding. Looking at the comparable Reactome data.
Peter: reaction can be slightly different depending on the outcome. Should we look at the pathway as the common element for all processes or should we look at these as separate processes that have separate purposes (with overlap). The latter will become enormously complex as we start considering diverse species.
David & Midori: Historically, BPs are defined by their objective, therefore it makes sense to take the latter view. Will require parent terms to collect these processes. If different gene products are involved, it should be defined as a different process.
3.4. Electron Transport Cross-products (Jen)
Looking at electron transport region and representing BP & MF cross-products.
Developed lists of MFs, their BPs and the taxonomic group that each applies to.
Used this to make has_part relationships: BP can’t exist without MF in this taxon
Put has_part relationship to put BP term under MF term (at the moment there are difficulties visualizing this).
Taxon issue – how do we represent that these relationships should be qualified by taxon - possible to have a general parent term that has several child terms to represent the different taxa.
In order to create links between BP and MF, we are going to need the new relationship, has_part. At the moment, need to come up with a way of adding this relationship into the graphs (violates expectations – makes syntactical sense but is harder to understand for biologists).
Judy: Should more of these projects be undertaken at this point?
Jen, David, Harold: These projects should continue, improve ontology.
Need feedback from users – Chris can make a separate version of AmiGO that includes these relationships.
Michael: need careful documentation for users.
David: needs to be presented at conferences to get it out there for comment, get users used to it.
ACTION ITEM: Jen and Harold to continue with this process, making improvements to ontology as they do.
ACTION ITEM: Everyone else to look at these relationships & give Chris feedback. Files available at OBO-Edit scratch page.
3.5. Cross Products between GO BP & CC (Chris)
OBO 2 has ways of looking at cross products – intersection editor; files for CC-BP are available in the scratch file.
To capture what is happening with the BP, need a number of new relationships.
Found inconsistencies between ontologies - Idea is to use the intersection editor to keep track of consistency issues between BP & CC
Cross Product timeline:
1. Move cross-product files into OBO-Edit.
2. Then move cross-products into main ontology file – dependent upon new release of OBO-Edit.
3. Integration of Cell Line (CL) cross-products into edit cycle – no plan at this time to integrate into the main ontology file.
Working on how to display cross-products in AmiGO. Example: AmiGO with GO & Mouse anatomy; heart development.
ACTION ITEM: Need feedback from users on AmiGO display.
4. Outstanding Issues, session 1 (Midori)
Proteases – are there any objections to a reorganization of MF relating to protease activity? Proposal on wiki for review.
NOTE: If we do it, many protease terms will become obsolete on the basis that they are gene products rather than functions.
Peter: Proteases can be distinguished generically by (1) active sites (4 different types) and (2) where in the target peptide they cut (3 types). This would give us 12 MF terms. Would this cover all proteases?
Need cases from the group to see if lumping the annotations into one of these 12 terms would lose information.
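The count of 12 is simply the 4 active-site types crossed with the 3 cut-position types. A small sketch of that enumeration, plus a check for existing terms that would be lumped together, is below. The class labels and term names are placeholders, since the minutes do not name the actual types.

```python
from collections import defaultdict
from itertools import product

# Placeholder labels; the 4 active-site types and 3 cut types are not named here.
active_site_types = [f"active_site_type_{i}" for i in range(1, 5)]
cut_position_types = [f"cut_position_type_{i}" for i in range(1, 4)]

# Every (active site, cut position) combination is one candidate MF term.
candidate_classes = list(product(active_site_types, cut_position_types))


def lumped_groups(term_to_class):
    """Return the classes that would absorb more than one existing term."""
    groups = defaultdict(list)
    for term, cls in term_to_class.items():
        groups[cls].append(term)
    return {cls: sorted(terms) for cls, terms in groups.items() if len(terms) > 1}


# Made-up mapping of existing protease terms to the proposed classes.
existing_terms = {
    "protease term A": ("active_site_type_1", "cut_position_type_1"),
    "protease term B": ("active_site_type_1", "cut_position_type_1"),
    "protease term C": ("active_site_type_2", "cut_position_type_3"),
}
lumped = lumped_groups(existing_terms)
```

Classes listed in `lumped` are the places to look for cases where merging annotations might lose information.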
Judy: when do MF terms stop representing classes and start representing individual genes? Specific gene products should not be represented.
Midori: there are some MF terms that are clearly differentiated from other MF terms. These terms should be used.
ACTION ITEM: continue to investigate the protease terms to remove concerns about terms that reflect activity of single gene products.
5. Advocacy/Outreach/Collaborations (Jen/Jane)
5.1. Help Desk (stats report) and newsletter
Most queries to GO help desk answered within 24h.
Newsletter released quarterly.
Web presence working group (formerly AmiGO WG) – advocacy group will determine features then the AmiGO WG to implement.
AmiGO future features:
Ref genome display
Wiki style user annotation – GONUTS
AmiGO web services
How to prioritize development and implementation of AmiGO features under discussion.
5.2. Annotation Outreach
Not cold calling dbs any more – all have been contacted.
TAIR have made agreement with journal to accept annotation by submitting authors & discussion at PAG 2008
Reactome have developed a ga file?
Muscle annotation wiki developed
Sol Genomics ga file submitted
6. AmiGO, GO Database
6.1. Software group progress report (Seth)
GO Term enrichment
- new advanced search page & improved search functions
- search is extended if no result is found; may need to add a limit to this search function
- results sorted by relevance
- pie charts have been replaced with bar graphs
- regulates relationship icons have been added
- now have GOOSE links back into AmiGO
Next release of AmiGO (1.6):
- ref genome support is main focus
- display Mary’s graphs with pan & zoom function, more interactive (highlight direct vs indirect, species, ISS only, etc.)
- display of homolsets on AmiGO
- display of intersections between GO terms (bi-axial viewer)
- links to and from GONUTS to make it easier for people to leave comments & for biocurators to see these comments
6.2. Reference Genes - DB management extensions
Populating the database with NCBI taxonomy trees – use to filter queries
Added support for curation of ref genome curation sets
ACTION ITEM: Ben & Mike – get isoforms into GO database
Annotation Cross Products:
There is now a wiki page for cross-product annotation; case studies will be added for the next GO meeting. Using column 16 to capture cross-product information?
Question: will we need another qualifier to capture this refining information or do we use a separate column (column 16)?
Column 16 is used for refining the information that is captured, e.g. a particular CL for a BP. The relationship is between whatever the GO term is and the refining object.
7. SO (Karen E)
Since Princeton have been working on SO tracker items – 16 items closed.
Working to sort out sequence attributes of SO: looking at annotator consistency of SO attributes. Go through list of sequence attributes and divide into BFO classes: quality, disposition, function, role (from Barry’s definitions). This is done individually, with comments on why the decision was made. Then results pooled and analyzed statistically to determine biocurator agreement. Iterative process.
Doing this to make SO attribute classes that reflect BFO.
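The minutes do not say which statistic is used for biocurator agreement. As one simple possibility, the sketch below computes raw pairwise agreement and lists the attributes on which curators differ. The curator and attribute names are invented; only the four BFO classes come from the text above.

```python
from itertools import combinations

BFO_CLASSES = {"quality", "disposition", "function", "role"}


def pairwise_agreement(assignments):
    """Fraction of (curator pair, attribute) comparisons that agree.

    assignments: curator -> {attribute: BFO class}
    """
    agree = total = 0
    for a, b in combinations(sorted(assignments), 2):
        shared = assignments[a].keys() & assignments[b].keys()
        for attr in shared:
            total += 1
            agree += assignments[a][attr] == assignments[b][attr]
    return agree / total if total else 0.0


def disputed_attributes(assignments):
    """Attributes that received more than one distinct class."""
    classes = {}
    for per_curator in assignments.values():
        for attr, cls in per_curator.items():
            classes.setdefault(attr, set()).add(cls)
    return sorted(attr for attr, seen in classes.items() if len(seen) > 1)


# Invented example: three curators classify two attributes.
assignments = {
    "curator_1": {"attribute_1": "quality", "attribute_2": "role"},
    "curator_2": {"attribute_1": "quality", "attribute_2": "function"},
    "curator_3": {"attribute_1": "quality", "attribute_2": "role"},
}

score = pairwise_agreement(assignments)
disputed = disputed_attributes(assignments)
```

Raw agreement does not correct for chance; a chance-corrected measure could be substituted without changing the overall iterative process.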
8. Progress with OBO-Edit (Nomi)
OBO-Edit 2 improvements:
Documentation and installer
I/O & command-line options
GUI look and feel
Priorities for release of OE2:
restore all functionality that was present in OE1
bug fixes in new components, e.g. graph editor
new feature requests
Michael Schroeder's plug-in – link GOPubmed to OBO Editor?
9. Overall Project Management
9.1 Structure of Management Groups
What is working (and What is not)?
Wiki is too Kafka-esque: many pages are not linked properly and are hard to find. Could be using wiki categories more.
Too much work time taken up in meetings, not enough time to biocurate. Has the project reached a level of complexity where it is not possible to know everything going on? Should work be summarized so that people can be updated more efficiently? The opportune time to be updated is to review the reports posted by other groups; progress reports are now by WG rather than by MOD.
GO Top needs to review draft GOC and Ref Genome agendas as a group.
Judging by progress, things are working pretty well but need to balance overhead by biocurators.
GO Manager calls: not enough time to resolve issues brought up. On some calls, one issue has been discussed for most of the call and we haven’t moved on to other issues. Is it necessary to get the update from each group every 2 weeks? Calls should de-emphasize progress reports and focus on the things stopping progress.
9.2 Identify communication bottlenecks and time-sinks, knowing who is responsible and time estimates
Does everyone know who to contact to make progress on a road block? Yes, know who to go to but not sure if there will be a response.
Could we collect Skype addresses & have them available (but hidden from the public)? Would be good to solve problems in real time.
Wiki, Webex & Skype are helping solve problems in real time. This is really helping.
One roadblock is to post a SourceForge item and then not hear back from anyone. The biggest problem with closing items is being unable to make progress because of a lack of information. Suggestion is to remove the SF item if there is no response after a designated time.
10. Outstanding issues (continued)
10.1 Discuss Establishment and Maintenance Terms
ACTION ITEM: change to maintenance of localization
10.2 Discuss IMP and the with column.
Guidelines for what goes there - phenotype ID, genotype ID
ACTION ITEM: Chris will be talking to individual groups about how they use the with column for IMP. Each MOD group needs to respond to this for Chris.
10.3 Discuss how we are going to handle 'response to drug'
SF 1242405 and 1494526, and 'response to toxin' SF 1658374. Are they normal biological processes? Response to X (drug, toxin); Response to Chemical (see also Use of Response To Terms in Annotation, for a related issue). (David and Tanya)
ACTION ITEM: Implement Michelle’s proposal to ???
10.4 Chemical derivatives and metabolism terms:
How should we handle derivative compounds in the metabolism parts of GO--should we continue to include metabolism of X derivatives as is_a descendants of X metabolism? It can be confusing. For example, SFs 1885151 and 1847808 note that 'gamma-aminobutyric acid metabolic process' is a descendant of 'fatty acid metabolic process' because GABA is a derivative of butyric acid. (Midori and Val)
ACTION ITEM: Midori and David will deal with looking at these terms.
11. Collaboration with other projects
Reports on new/ongoing collaborations with these groups:
PRO: Protein Ontology (Harold) - PRO is being made to relate proteins to splice forms, variants, etc. Should be able to attach GO information to these forms.
Reactome - GAF done and ready to be submitted
GOA will be picking this file up. Reminder to check with Mike and make sure that this data is being picked up.
Panther: Will be running Ref Genome targets through Panther.
12. Discussion Points
1. Discuss: At one of our meetings Chris suggested we might keep an archive of gene sets used in publications that were analyzed using GO. Should we do this? (David)
This seems to be a research project.
Should the AmiGO WG provide a way for people to provide datasets?