Oregon GO Consortium Meeting
March 30, 31
University of Oregon, Eugene, Oregon
Erb Memorial Union (EMU), Fir Room.
Map of campus, http://map.uoregon.edu/
Please add yourself to the list of registrants on the Meeting Logistics Page
ALERT: GO Top needs to sign off on agenda prior to these meetings. If there is no action item, or discussion point proposed in advance, the meeting is not the forum for first initiating a discussion.
MONDAY March 30
(4:30- 5:00pm BST)
(5:00- 6:30pm BST)
OBO-Edit2.0 Tutorial and Demo (Chris, Amina, OBO-Edit Working Group)
- Working with Cross-products in OBO-Edit (Chris)
- Finding xp terms, searching, recognizing, constructing and manipulating xps
- Using the xp panel in the Text Editor
- Demo using the "internal" xps to be added to the live GO
- OBO-Edit Working Group Meeting
(6:30- 6:45pm BST)
(6:45- 8:00pm BST)
Working sessions: 1) David Hill, Tanya and Amina 2) OBO-Edit Working Group Members
Lunch - Take a walk...Get food in the EMU or local restaurants
Reference Genomes Update
Brief report of Ref Genome Progress (Pascale)
Watching remote people: Tanya Berardini
GO Clinics Update (Jane -remote-)
Watching remote people: Tanya Berardini
- Clinics will be booked when there are interested parties to deal with SF items
- Only SF with parties interested in resolving the SF items go to clinics.
- Determine how many SF items have no comments.
- Old items that are no longer of interest can be closed.
Metrics (Judy)
Watching remote people: Tanya Berardini
- Action Item: What is appropriate metric for comprehensive annotation?
We need to be able to define project points for funding purposes, to document progress, and to justify priorities.
That means keeping appropriate metrics of our progress and defining how we set priorities. Some possible measures across the board that we could include in a monthly table:
- Number of Protein Coding Genes
- Number of Functional Genes
- Total number of Genes
- Number of Genes with Comprehensive Annotations
- Number of Genes with only IEA in all domains
- Number of Genes with Experimental Data for BP
- Number of Genes with Experimental Data in MF
- Number of Genes with Experimental Data in CC
- Date of last update of annotations
- Annotation Priorities
- Annotation of Reference Genome Genes
- Annotation of Genes with uncurated experimental literature
- Ontology Development
- Keeping track of dependencies (need new terms to do annotations, etc)
- Knowing where we need to devote ontology development efforts
- Measuring progress...when have we saturated a branch, at least for the time being
All of these are for reporting purposes.
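Several of the per-gene counts listed above could be derived directly from annotation files. The following is a minimal sketch, assuming annotations have already been reduced to (gene, aspect, evidence code) tuples; the function name, the sample data, and the choice of which evidence codes count as experimental are assumptions made for illustration, not an agreed definition.

```python
from collections import defaultdict

# Evidence codes treated as experimental in this sketch (an assumption).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

def summarize(annotations):
    """Summarize (gene_id, aspect, evidence_code) tuples.

    aspect is 'P' (BP), 'F' (MF) or 'C' (CC).
    """
    codes_by_gene = defaultdict(set)  # gene -> {(aspect, code)}
    for gene, aspect, code in annotations:
        codes_by_gene[gene].add((aspect, code))

    summary = {"annotated_genes": len(codes_by_gene), "iea_only": 0,
               "exp_P": 0, "exp_F": 0, "exp_C": 0}
    for pairs in codes_by_gene.values():
        # Gene whose annotations are all IEA, whatever the aspect.
        if all(code == "IEA" for _, code in pairs):
            summary["iea_only"] += 1
        # Gene with at least one experimental annotation in each aspect.
        for aspect in "PFC":
            if any(a == aspect and c in EXPERIMENTAL for a, c in pairs):
                summary["exp_" + aspect] += 1
    return summary

# Invented sample data, for illustration only.
sample = [
    ("geneA", "P", "IMP"), ("geneA", "F", "IDA"),
    ("geneB", "P", "IEA"), ("geneB", "C", "IEA"),
    ("geneC", "C", "IDA"), ("geneC", "F", "ISS"),
]
print(summarize(sample))
```

Totals such as the number of protein-coding genes would have to come from each group's genome data rather than from the annotations themselves.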
David: from a practical point of view, we're done when no one wants new terms. 'How many new terms relative to the number of annotations' is one way to measure whether we are done.
Judy: Funding sources need us to define what our tasks are, how we will work on them, and how we will know when we have completed the scoped out work.
David: The number of annotations vs. the number of new terms requested. When new term requests go down while new annotations keep being added, ontology development is nearing completion.
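David's suggestion could be tracked as a simple ratio per reporting period. This is only a sketch: the function name and the numbers are invented for illustration, and no threshold for "nearing completion" was agreed at the meeting.

```python
def term_requests_per_annotation(periods):
    """periods: list of (label, new_terms_requested, new_annotations).

    Returns (label, ratio) pairs, skipping periods with no new annotations.
    """
    return [(label, new_terms / new_annotations)
            for label, new_terms, new_annotations in periods
            if new_annotations > 0]

# Hypothetical figures for one branch of the ontology.
history = [("period 1", 400, 2000), ("period 2", 100, 2000)]
ratios = term_requests_per_annotation(history)
print(ratios)

# A falling ratio suggests the branch is approaching saturation.
falling = all(a[1] > b[1] for a, b in zip(ratios, ratios[1:]))
print(falling)
```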
- Progress reports link in left navigation column on wiki
- Sometimes work waits while other work is in progress...need to track work dependencies better.
- Pass SF item to responsible person and mark as 'pending' to track dependencies?
- [all groups] update progress reports at meeting times and grant renewals
- [Judy?] template for desired reports can be found on wiki
- [Chris] made a plea for better use of wiki categories. Perhaps a WebEx session could be used to briefly teach us about that?
Watching remote people: Stan Laulederkind
- Binding term documentation (from Debby)
- using 'binding' for annotation
- should it be used in cross product annotations (Jim) (see Annotation_Cross_Products#binding_example)
- transfer of 'binding' term annotations via ISS/ISO??
- [Debbie] documentation is confusing on the proper use of binding
- [Val] need to be able to annotate to 'ATP binding'
- [Peter?] Binding of X that results in an allosteric change to the thing doing the binding is different from binding that results in the catalytic change of the bound molecule to a different product: chemically transformative binding vs. non-transformative binding. It is the latter that we should be capturing.
- [David] Should GO be capturing binding of enzymatic substrates? Thinks not.
- How will limiting 'binding' annotations to non-catalytic interactions affect queries for genes involved in 'ATP binding' for example...researchers might reasonably expect to get back kinases by such a query?
- [Mike] Enzymes work in both directions..if a kinase binds ATP, does it also necessarily bind ADP?
- [Peter (lead), Ruth, Debbie, Jim] Form a workgroup to examine the issues raised in the discussion. Should GO capture catalytic binding?
Column 16 (Chris)
Watching remote people: Amina Abdulla
Can we have a progress report and a target date to work towards? Column 16 will contain cross references to other ontologies that can be used to qualify the particular annotation. Harold's draft user doc for column 16/17 usage: File:Columns 16 and 17.doc
Official documentation: Annotation_Cross_Products
- [Chris] If the underlying biology (mechanism or genes involved) is different, then new precomposed terms are preferred. Discussion ensued on how far to take that. What if a process is executed in a different location in one species vs. another but is mechanistically the same? What if the genes involved are different in one organelle vs. another in the same cell?
- [Judy] We should have the terms needed to describe the biology.
- [David] if there is a need to put 2 IDs in column 16 from the same ontology, then that ontology has a problem that needs to be addressed
- [Chris] Use of 'NOT' is currently banned from column 16 to avoid possible further confusion. Can be revisited once in use more.
- [All] By the end of April send Chris GAF files with column 16 data populated. Use GAF format 2.0 in the header of GAF files with column 16 or 17 data in them. Once Chris has examples a workgroup will assemble to examine the usage and flesh out inconsistencies, misuses, etc.
Column 17 (Chris)
Watching remote people: Stan Laulederkind
Can we arrange a date to release this? Column 17 is designed to allow GO annotations to specific isoform variants that may be encoded by a specific gene due to differential splicing or alternative translational starts. Harold's draft user doc for column 16/17 usage: File:Columns 16 and 17.doc Official documentation: GAF_Col17_GeneProducts
- Using 'gene_product' rather than 'gene' in column 12 is a radical change (GAF 2.0 format). Could Mike's GAF checking script convert 'gene_product' to 'gene' while we make the transition?
- [developers] Define a constrained set of strings derived from SO that can be used in GAF column 12
- [Mike] Update GAF filtering script to change 'gene_product' to 'gene' in column 12 until final change date is set.
- [All] By the end of April, send test GAF data to Chris with column 17 populated. Use GAF format 2.0 in the GAF header for such a file.
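The action items for columns 16 and 17 can be sketched as a small reader for test files. This is an illustration under stated assumptions, not Mike's actual filtering script: the function name, the `rewrite_gene_product` flag, and every value in the sample row (database, identifiers, reference, column 16 and 17 contents) are invented, and the exact content allowed in column 16 was still under discussion.

```python
GAF2_COLUMNS = 17  # GAF 2.0 adds columns 16 and 17 to the 15 earlier columns

def read_gaf2(lines, rewrite_gene_product=True):
    """Yield rows (lists of 17 strings) from GAF 2.0 text lines."""
    lines = iter(lines)
    header = next(lines, "")
    if header.strip() != "!gaf-version: 2.0":
        raise ValueError("expected '!gaf-version: 2.0' as the first line")
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != GAF2_COLUMNS:
            raise ValueError(f"expected {GAF2_COLUMNS} columns, got {len(cols)}")
        # Column 12 (index 11): map 'gene_product' back to 'gene'
        # until a final change date is set.
        if rewrite_gene_product and cols[11] == "gene_product":
            cols[11] = "gene"
        yield cols

# Invented sample row; all identifiers are placeholders.
sample_row = "\t".join([
    "XDB", "XDB:0001", "abc1", "", "GO:0005524", "XDB_REF:0000001", "IDA",
    "", "F", "", "", "gene_product", "taxon:9606", "20090331", "XDB",
    "XONT:0000001",  # column 16: cross reference qualifying the annotation
    "XDB:0001-2",    # column 17: specific isoform of the gene product
])
sample = ["!gaf-version: 2.0\n", sample_row + "\n"]

for row in read_gaf2(sample):
    print(row[11], row[15], row[16])
```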
Watching remote people: Donghui Li
Progress report on Sequence Ontology (Karen E) [slides]
- Nice progress since Montreal meeting
- [Suzi, Paul] Remove use of ND from annotation of ancestral nodes
Dinner at Mekalas
TUESDAY March 31
Review Monday's Notes and Action Items
Ontology Content Development
1. GO_Timeline (10 mins, Midori, Ontology development team) 1.1
- started publishing a file called go_ext (GO-extended) (contains the MF-MF intraontology links and the BP-MF interontology links)
- put in place process for conversion of files
- new pipeline:
go_write --cp--> go_ext --filter P/F, filter F/F --> obof1.2 --> obo2obo --> obof1.0
In other words, the new file (go_ext) will be filtered to remove Function/Process links, and a file in OBO format 1.0 will be generated.
- Jane: thought that we'd switch to the new file. Won't people ignore it?
- Chris: we can't force people to all change their algorithms
- David: hopefully people will come to us and use the data we provide, and the tools we provide in order for them to use those correctly.
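The inter-ontology filtering step in the pipeline above can be sketched as follows. This assumes terms have already been parsed into a dictionary; the data structure, function name, and toy term IDs are hypothetical, and the F/F filter is not modelled because the notes do not say which MF-MF links it removes.

```python
def filter_links(terms, drop):
    """Return a copy of `terms` without links whose
    (source aspect, target aspect) pair is listed in `drop`.

    terms: {term_id: {"aspect": "F"|"P"|"C",
                      "links": [(relation, target_id), ...]}}
    """
    filtered = {}
    for term_id, term in terms.items():
        kept = [(rel, tgt) for rel, tgt in term["links"]
                if (term["aspect"], terms[tgt]["aspect"]) not in drop]
        filtered[term_id] = {"aspect": term["aspect"], "links": kept}
    return filtered

# Toy ontology with placeholder IDs.
terms = {
    "T:1": {"aspect": "F", "links": []},
    "T:2": {"aspect": "P", "links": []},
    "T:3": {"aspect": "P", "links": [("is_a", "T:2"), ("regulates", "T:1")]},
}

# Drop links that cross between Process and Function, in either direction.
INTER_ONTOLOGY = {("P", "F"), ("F", "P")}
filtered = filter_links(terms, INTER_ONTOLOGY)
print(filtered["T:3"]["links"])
```

The original dictionary is left untouched, so the extended and filtered versions can be published side by side.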
1.2 Release of inter-ontology 'regulates' links
1.3 SF status update (5 mins, Midori) see progress report
Work in progress
Watching remote people: Susan Tweedie
- Ongoing quality control using reasoner-generated reports. (10 mins, Chris, David, Tanya) slides
- We're continuing to use the ontology QC procedures we reported in October. These include:
- Fixed tissue, organ, to match the definitions of the anatomical dictionaries
- Consistency checks for new regulation terms and regulates relationships;
- OBO-Edit 2.0 allows missing links to be fixed automatically (or errors in the ontology to be fixed)
- David and Tanya looked at what types of errors occurred: they examined 52 errors and found that they fit into 6 categories,
such as forgetting to add parents or relationships; in 5 cases the ontology was wrong and was corrected.
- Note that this error rate is pretty low.
- A report on terms with multiple part_of parents;
- Cross-product reports that can be used for QC; also see Progress_2008.
Progress is summarized on the Ontology_QC_Metrics page.
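One of the reports mentioned above, terms with multiple part_of parents, is simple to express. The sketch below is not the reasoner-based procedure used by the ontology group; it only counts asserted links, and the function name and toy IDs are invented for illustration.

```python
from collections import defaultdict

def multiple_part_of_parents(links):
    """links: iterable of (child, relation, parent) triples.

    Returns {child: sorted list of parents} for every child that has
    more than one asserted part_of parent.
    """
    parents = defaultdict(set)
    for child, relation, parent in links:
        if relation == "part_of":
            parents[child].add(parent)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}

# Toy links with placeholder IDs.
links = [
    ("T:10", "part_of", "T:1"),
    ("T:10", "part_of", "T:2"),
    ("T:10", "is_a", "T:3"),
    ("T:11", "part_of", "T:1"),
]
print(multiple_part_of_parents(links))
```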
- Lung development (15 mins, David) slides
- A content meeting on lung development was held in December 2007.
- Met with experts in Boston, who did not want anatomical terms for lung development in the ontology. So David made two files, one with and one without anatomical structures.
- Until last week, lung development only had one child, and Dimitry (MGI) has been requesting new terms; interestingly those include anatomical terms
- This also leads to the expansion of the 'morphogenesis of a branching structure' branch of the GO
- Chris asks whether the lung development is being analyzed from the mouse/mammalian point of view: David says, for lung development, yes, but for branching morphogenesis he is trying to capture different organs and species
- Alex: makes the point that this is analogous to the mitochondrial/cytoplasmic translation issue: all those branching types of development differ at some level, so we might need to add many more terms to describe this?
- David: we've asked this question since the beginning of GO
- Judy: we need to provide the information that researchers need
- Biological Signaling (5 mins, Jen -remote-) slides
Watching remote people: Michelle Gwinn-Giglio
- Organization and Biogenesis (15 mins, David, Jane -remote-, Midori -remote-, Tanya) slides
- postponed till after break
- Internal cross-product implementation (10 mins, Chris, David, Jane-remote-, Jen-remote-, Midori-remote-, Tanya)
- David & Tanya are doing regulation xps
- Jen is doing BP x BP xps
- Jane is doing regulation of multi-organism process
- Jane is doing CC x CC xps
- Proposal to add process-specific function terms so that part_of links can be made between function and process (30 mins, Chris, David, Tanya) slides
- Rationale: each MF is a part of at least some BP (a MF has_part in a BP)
- Problem is that for a given MF, you cannot be sure which of several BP it is involved in
- An option is to use column 16
- Proposal from Chris, Tanya and David: use part of: start with the easy things:
- kinase activity -> phosphorylation; transporters -> transport
- a more complicated example is 'arginosuccinate synthase activity' involved in urea cycle and polyamine biosynthesis
- Pascale: what if the paper doesn't show the process as well? Tanya: you use the parent
- Michelle: thought we had discussed this and decided against it, first because of term explosion, but also because of the huge amount of variation in metabolic pathways
- David: no, that was using has_part, which means that it always has the part. Part_of does not require that
- (back to Tanya)
- Example 1: Stan (RGD) requested 'regulation of Neu/ErbB-2 receptor activity' on SF: term = 'coregulator activity involved in epidermal growth factor receptor signaling pathway' (and its regulation child); the term requested by Stan would be a synonym
- Example 2: caspase activity (which was obsoleted) could be called 'cysteine-type endopeptidase activity involved in apoptosis'
- Example 3: transcription factor activity: could be 'DNA binding involved in transcriptional regulation'
- Example 4: chaperone activity: could be 'protein binding involved in protein folding'
- Advantages of this approach:
- 'protein binding involved in protein folding' is_a 'protein binding' (F) and part_of 'protein folding' (P): makes it easier to count annotations and get inferences, and makes it easier for annotators to capture all information
- Having those more granular functions (involving processes) allows one to test what potential processes a function might be involved in
- Rama: what would the user see?
- Does this mean you don't make the process (or component) annotations specifically?
- David: no, you don't.
- Jen: can you still mine those automatically with the other pathway resources?
- David, Chris: there is a problem with visualization, but that has been the case for a while
- Michelle: are those terms F or P? Answer: F, because they have an is_a parent in Function, not Process
- Michelle: why would we make column 16 annotations (using two GO terms)? David: to save time; if those combinations are used a lot we can create a new term
- Emily: similar point about column 16; also wondering about the evidence that could actually provide enough confidence for the annotation to one of those terms
- Ruth: this will not get rid of the need to IC
- Alex: if it's not reported in the paper, you don't annotate it
- Ranjana: what happens to data miners?
- Chris: if people do not use the graph, then it's too bad (Judy, David, others agree)
- Paul: you have to wonder if this is a hack? Why don't we use external resources that already have this information (i.e., the different pathway tools)?
- Chris: not quite sure how Paul's proposal is practically implemented; also this approach makes it more simple to integrate pathways
- David: if GO wanted to represent pathways, then he agrees it's a hack; but we don't. We're trying to annotate what gene products do.
- Peter: we need to treat pathways databases and other external resources as we treat publications (ie, a reference for the link)
- Tanya: To answer Rama and Ranjana: people need to use information from the graph anyway
- Michelle: The 3 ontologies have been separated for the past 10 years; we need to think about what users know and expect. This is now very different conceptually
- Judy: WRT Process parentage in MF graph: it's the part_of that goes across graphs. We have always understood that MF and BP were part of the same. We need to provide ways for users to get to all this information, and this is a step in this direction. This is just how it evolves.
- Chris: agrees with Michelle, we need to educate people
- Kimberly: likes the links, but one concern from the annotators' point of view is that what seem like errors of omission can be due to confusion about when you have demonstrated function and when process. We need to come up with some guidelines on how to use those new terms
- Jen: There are users (biologists) who feel that it's weird that we don't have the links.
- Rama: the issue is not to make the annotation
- Susan: agrees that users should use the graphs; all our web pages show the 3 aspects. Karen C: agrees; she also looks at that information when checking that annotations are complete
- Chris: the Software group can write a tool that will filter out the redundant annotations and add back those Process annotations if required
- Jen: to address the point of users being ready: we should discuss plans to educate, etc
- David: WRT annotation: we shouldn't be slowing progress by thinking about what evidence code to use
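The inference that the proposal relies on, and the kind of tool Chris mentions for adding Process annotations back, can be sketched as a walk up the is_a and part_of links from a process-specific function term. This illustrates only the graph traversal, not an agreed annotation practice; the function name and the toy graph (which uses term names as IDs) are assumptions.

```python
def implied_process_terms(term, parents, aspect):
    """Collect Process terms reachable from `term` via is_a/part_of links.

    parents: {term: [(relation, parent_term), ...]}
    aspect:  {term: "F" | "P" | "C"}
    """
    seen, stack, found = set(), [term], set()
    while stack:
        current = stack.pop()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
                if aspect[parent] == "P":
                    found.add(parent)
    return found

# Toy graph based on the chaperone example discussed above.
parents = {
    "protein binding involved in protein folding": [
        ("is_a", "protein binding"),
        ("part_of", "protein folding"),
    ],
}
aspect = {
    "protein binding involved in protein folding": "F",
    "protein binding": "F",
    "protein folding": "P",
}

print(implied_process_terms(
    "protein binding involved in protein folding", parents, aspect))
```

A gene product annotated only to the Function term would then surface under 'protein folding' for users who query the Process ontology through the graph.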
(6.30pm-7pm BST) Break
The wrap-up on Tanya and David's proposal on making process-specific function terms is that it was agreed we can go ahead and make the terms/links, since there seems to be a clear utility in having these types of terms with respect to things like making cross references with pathways from Reactome and other groups. However, there was NOT agreement about how to annotate to this type of term. The proposal suggested that annotation to one of these terms would be sufficient for Process annotation, even though the term is actually in the Function ontology, i.e. that it would NOT be necessary to also make an annotation to a term in the Process ontology. There was concern from a number of groups that allowing an annotation to a Function term to replace any annotation in the Process ontology will really throw users who expect to only need to look within the Process ontology for Process annotations.
- people send sample papers to David with examples for where it might be appropriate to use one of these process-specific function terms, e.g. reference genome jam papers that were discussed earlier (everyone)
- come up with a proposal for annotation practice for process-specific function terms, particularly with respect to whether it is or is not needed to also make an annotation in the process ontology (David)
The work affected terms related to cellular component in process (organization & biogenesis terms). These types of terms were split into two separate terms, cellular component organization or cellular component biogenesis. Most of this work is already done. Remaining to do: cell projection terms, better parents for cellularization, conidium formation, and platelet formation
Judy Blake mentioned that we have had some contact with a group of people interested in modelling cellular processes (going from 1 to 2 cells). They were concerned about some missing parent terms, but we have let them know that we're willing to work with them, so we may have another group to work with on improving this area.
Ontology development discussion topics
- Addition of localization specific process terms? (This was discussed on the GO list after it was raised in the last reference genome jamboree but hasn't been resolved.) (David will introduce topic) Watching remote people: Stan Laulederkind
- Chaperones: revisited many years later (Midori will introduce topic, Rama will moderate discussion) - see introductory slides; Chaperones wiki page; SF 2560932 Watching remote people: Lakshmi Pillai
- 'Response to stimulus' - does the definition need to be revised? Clarify when these terms should and should not be used for annotation; see introductory slides (also relevant to Reference Genomes) and SF 2094943 Watching remote people: Amina Abdulla
- postponed till after lunch
- The possibility of linking terms in the OBO file to discussions in the GO wiki. This could be done to ensure increased awareness of ontology/annotation discussion topics for all curators (and interested external users). This item follows on from the 'document communication' discussion at the last GO Consortium meeting  (Emily will introduce topic) slides
Watching remote people: Lakshmi Pillai
NOTES on Addition of localization specific process terms?
This was discussed on the GO list after it was raised in the last reference genome jamboree but hasn't been resolved. Karen gave the background. At the last Reference Genomes Annotation jamboree, one of the genes was a protease specifically located in mitochondrion. We now know that there are actually two specific localizations of different proteases within the mitochondrion, matrix and inner membrane. At the annotation jamboree, people were not sure about whether there was a clear statement about the recommended practice for when to add, or not add, these types of terms.
David is in favor of adding these terms: embrace the explosion, a la John Richter. While a group expressed concern about such large-scale expansion of terms in the email discussion, at the meeting there was not really any concern about expansion of terms being a problem. Thus the consensus was that these types of terms are fine.
In terms of clarifying the guidelines for when such terms are appropriate, the idea was that if it is a different set of genes with different regulatory implications, then it is appropriate to request a localization-specific process term. When the same set of genes does basically the same thing in multiple compartments, then there doesn't seem to be any need for such terms. In addition, when unsure, go ahead and make an item on the Curator Requests SF tracker so that the possible term can be discussed.
NOTES on Chaperones: revisited many years later
The history is that we used to have a number of terms in GO for chaperone, but they were made obsolete for a number of reasons:
- represented a gene product class name
- used three different ways
- binding to unfolded proteins
- binding to and refolding unfolded proteins
- used inconsistently in annotations
The group discussing this has stalled with respect to coming up with a proposal for how to add some terms back to represent "chaperone activity" without regenerating the problems listed above. One proposal had been to have terms called protein folding chaperone activity and transport chaperone; however, there was concern that these names might not be sufficient for users and annotators to understand correct usage. Based on the process-specific function term discussion, there was a suggestion to create a term protein binding involved in protein folding.
NOTES on The possibility of linking terms in the OBO file to discussions in the GO wiki
Emily presented her slides on the issue, giving the case for making links in the comments from GO terms so that curators/annotators would be alerted to the fact that an area of the ontology is currently under revision and could follow a link to the discussion. One idea she suggested is to link terms in the OBO file to discussions in the wiki in order to help annotators see and find when there is an ongoing discussion or helpful comments. She also asked whether we should have a type of comment that is propagated to all descendant terms.
Chris commented that he's a little hesitant about putting actual links in the OBO file, as it may be hard to keep URLs up-to-date. He suggested that, alternatively, perhaps we could come up with methods utilizing the GONUTS API to show these comments on AmiGO and link out to discussion pages.
Alex commented that GONUTS is clearly separate, external, and perhaps more accessible to the community; perhaps put community annotation pages into GONUTS.
- Karen will submit a SF request for the appropriate terms for mitochondrion specific proteolysis.
- The chaperone group will proceed based on the suggestion to create a term protein binding involved in protein folding
- Amelia and documentation group will work on making comments and links to ongoing work more accessible to curators, possibly via AmiGO.
Next GOC meeting - Fall 2009
- GOC September 23-24
- SAB September 25
- Both at Jesus College, Cambridge, UK
- Jesus College is already booked for next GOC meeting, more details will be forthcoming at an appropriate time.
Lunch - Catered at meeting room
Watching remote people: Peter D'Eustachio
Web presence, Outreach and User Advocacy
Outreach and Advocacy
- Short report from Emily Dimmer of GOA on the progress in introducing Swiss Institute of Bioinformatics curators to GO annotation. (5 min)
- Report from Jane on Help Desk stats, and other user advocacy progress since the last consortium meeting including demo of new news site by Seth. (20 min) slides
Resource Usage statistics (Mike)
Amigo update and report (25 min). See AmiGO 1.6 wiki.
~20 minute overview of GONUTS (a Gene Ontology Normal Usage Tracking System.) Some things have changed in the last year, and many improvements have been made to facilitate using GO.
General Annotation Issues
Watching remote people: Lakshmi Pillai
- Should we IC from ISS annotations?
- documentation currently says IC from ISS ok
- this is not allowed in the requirements for PAINT (but we can change that if needed)
GO Papers, Publications and Presentations
- review for Molecular Reproduction & Development about using GO to study development (David, Doug, Kimberly, Tanya)