GOC Meeting Minutes September 2009
- 1 Weds. AM
- 1.1 Review grant status (Judy)
- 1.2 Community Interactions and Outreach
- 1.3 Reference Genomes Progress Report (Pascale)
- 1.4 PAINT demo (Suzi)
Review grant status (Judy)
- A 2-year project to fund CL development, focusing on immunology and neurons, has been funded!
- A supplemental grant for Protein Ontology development, to represent complexes, has been funded.
Discussion of PRO/GO and their relation to each other
- (Judy)PRO is a very specific representation (mouse proteins in a mouse complex); the GO component represents the class (i.e. it is not species-specific).
- Each PRO complex should have an xref to the GO complex.
- Complexes should be represented by their function, not their protein members.
- (David)Some complexes are defined by the proteins they contain, some by what the complex does.
- (Judy)function of a complex may be cell type specific.
- (Chris)GO should do complexes, PRO should do proteins which are then pointed to by GO components.
- (Judy)This is not how the process was described in the PRO grant.
- (Ben)complexes defined by their function may introduce a naming issue if the same complex has different functions in different places or cell types.
- (Suzi)GO and PRO need to discuss how best to represent these complexes and relate PRO to GO.
ACTION: Primary stakeholders in GO and PRO need to agree on how GO components and PRO will relate to each other
- Funding obtained for protein set specification (PANTHER group)
- (Judy)Gene targets may need to be skewed towards genes known to be involved in human disease. This is important for the grant and for ongoing funding.
- (Rex)It is important to balance these with genes of unknown function. Emerging new information is important too, so balance is needed in choosing targets.
- (Ruth)There are human genes which we know a lot about that are not annotated. People want the data and are surprised not to find it.
- (Judy)Increasing concern that we are sorely lacking in annotation depth in areas we know a lot about. Need more resources.
- (Rex)How do we define relevance to human biology?
- (Suzi)The original Reference Genomes proposal goes for annotation breadth AND depth. Need ways to prioritize targets, but this does not preclude inclusion of genes outside of a selected target gene set.
- (Judy)We can look at systems, diseases, processes
Species to consider outreach to
- Daphnia (Indiana) has GO annotations (Michael Lynch)
- Sea Urchin
Facilitation of annotation for new genomes
- (Brenley)GONUTS can support annotation of any alternate species.
- (Rama)We need an annotation camp. It’s been several years.
- We need more ongoing training
- (Judy/Paul)PAINT needs a mechanism of annotation for new genomes, have GO jamboree, bring these new groups in.
- (Emily)Electronic annotation pipelines lack development and improvement efforts which could help new groups.
- (Emily)More effort needed in expanding electronic curation efforts. More sources of electronic annotation are out there.
- (David)Automated annotations are too general, result in clustering at high level (like developmental process) which is not that valuable for new groups.
- (Judy)Need continuing discussion here to focus on utility of GO. What do users want/need? How do we focus our effort here for the next year? Need to address to get continued funding.
- (Rex)This was 'THE REASON' for the Reference Genomes Project. How can its role be expanded? Why isn’t the project working for non-reference-genome species?
- (Michael)EMO: emerging model organisms
- (Paul)82 species have been chosen to be close to as many species as possible for ortholog prediction and functional annotation to related species.
- SP/UNIPROT group are increasing manual annotation efforts primarily focused on Human proteins
- (Kara)Consider outreach to ourselves as a priority for the next year. It’s not clear there is “MOD buy in”. How is our own work being incorporated into our OWN databases? Peter Good wants to know “how does this work advance biomedical hypothesis generation?”
- (David)How do we get non-NIH funding from sources interested in general biology?
- Perhaps there is a misperception of the state of things. PAINT annotation is still under development, so the resulting ISS annotations are not yet being fully propagated to other species; that is coming.
- (Susan)Should we highlight Reference Genomes annotations? Groups with one annotator get behind in curation, this should not be confused with lack of buy in or desire to participate
ACTION: MODs to make an effort to highlight Ref Genome annotations
- (David)Data in Reference Genomes paper highlights progress and effect of ref genome effort on annotations.
- (Mike)do MODs announce or advertise their participation in Reference Genomes project?
- (Ruth)Following up with ISS annotations is difficult.
- (Judy)We are poised to make the next step of propagating annotations. This should highlight the annotations generated from the Reference Genomes project.
- (Rex)For grant progress report and Reference Genomes website, we should provide metrics like the number of Reference Genome annotations and the percent of databases that have those annotations propagated.
- Real value will become apparent from propagation of annotations to other species
- (Judy)If we can provide a set of annotations for one of the non-Reference Genomes species in the next year this will be good demonstration of the power of the effort. This may be a good focus point for the next year.
- (David)It is easy but not obvious how to get a version of the GO. Needs to be easier if we are asking users to do this.
- (Midori)Users are not asking for how to find ontology version numbers
- (David) I see requests for how to find out what version was used in past studies
Community Interactions and Outreach
- (Val)pombe community annotation: http://www.sanger.ac.uk/Projects/S_pombe/community_curation.shtml
- How did Val get 19/20 respondents?
ACTION: Val will make available the form used to elicit responses from authors
- (Val)Estimates that curating a paper takes about 1/3 of the time when authors fill in this form. The form is sent to the last author and the results are processed by Val. The results generate follow-up questions, so considerable work follows the responses, but the information comes directly from authors.
- However, the annotation is not traced to the person who made it.
- facilitates user understanding of the existing annotations.
- TAIR's Plant Phys submission form can be seen here: http://www.aspb.org/publications/tairsubmission.cfm
- (Tanya)TAIR Plant Phys submission form was improved with autocomplete to help submitters pick GO terms. Also started similar process with the journal Plant Journal using a template spreadsheet. PJ handles this as a supplemental data submission included with the paper when published.
- (Jane)Recommendations from the GO review
- (Judy)What is a function, what is a process? We should define these..what do we mean by these in GO? We need to be clear about what we are doing.
- Seth can set up access to Google Analytics if you send him your gmail account.
- (Mike)Machines were moved at Stanford, disrupting Google Analytics reporting, but usage stats have been very constant over past 12-24 months..what does that mean?
- (Jane)GO News site(http://go.berkeleybop.org/news4go/), twitter, and RSS feed are available off gene ontology web page.
- (Chris)AmiGO status: IEAs are included, but GOA files are not, so some species are consequently excluded; IEAs for MODs are included.
- (Judy)What are the questions that wet bench biologists want to ask of GO...and are we making that possible for them?
- (Ruth)Labs use Ingenuity because Ingenuity has a large number of annotators and users get more complete results. They use their own interface, GO plus their own ontologies, and their own annotations. How many groups pay for access to Ingenuity? Is that money well spent?
- (David)Examine Ingenuity: why are users using it? What can we learn from that?
- (Jen/Varsha)Incorrect annotation is off-putting, and so is incomplete annotation.
- (David)Users may perceive an annotation as incorrect, but often that is not the case.
- What is the issue at hand?
- interface (à la Ingenuity; Ingenuity provides a different service from GO)
- do we need to distinguish GO from Ingenuity for users, and explain when to use which?
- lack of completeness of annotation or ontology?
- incorrect annotation or ontology structure?
- user perception or understanding of the GO? (dopamine is not always a neurotransmitter for example)
- why are biologists not using GO? If not using GO what alternative is being offered?
- Users/developers of third party GO tools may not keep their ontologies up to date which makes GO seem less useful.
- (Mike)We need to be creative about what GO provides.
- (Judy)We might want to lower the visibility of GO Tools and increase the visibility of using GO from the GO site and point from MODS to GO site.
- (Paul)If we are too far removed from end users it is hard to value our contribution and hard to get feedback from users directly
- (Tanya)Target surveys to MOD user lists, but also to meeting registrants, who may or may not be MOD users. Ask if they use GO. Provide a web page for the survey so all survey respondents can fill out the same form no matter which meeting they are at?
ACTION: Build a survey for grant update (Val, Jane, Jen), determine where to send it. (Mike has subscription to SurveyMonkey)
- (Midori)EBI targets biologists...ask them how they do it!
- (Eva)Send reminder emails to survey recipients to try to reduce self selection of survey responders.
ACTION: Provide two survey URLs and compare responses from a targeted list, like the submitters to the GO help list, with responses from a random list of biologists
Reference Genomes Progress Report (Pascale)
- (David)Do we have a gold standard set of genes we use to test annotation metrics? Not currently..but we should find a set for this purpose. This will standardize testing of annotation progress.
- (Brenley)Should GONUTS use UniProt IDs? -Yes
- (Brenley)Should GONUTS use UniProt protein names? maybe..but may not be that useful
- (Judy)There is a proposal for Human/Mouse/Rat isoform nomenclature
- (Chris)In GAF 2.0, for annotations to genes, col. 17 will contain the specific isoform reference, typically UniProt.
- Java 1.5 is needed to support the drag-and-drop function critical to the PAINT annotation process.
- PAINT uses the GOlite DB, updated ~weekly
- Update GAF files no less frequently than once per month!
- Annotations can be transferred only when manual annotation is complete by all groups for a given gene (QUESTION: How does this get reconciled with the proposal put forth by Pascale later in the day to do away with the Google spreadsheet?)
ACTION: Locate code for making font size changes in PAINT(Paul)
- Name column in PAINT table view should say “symbol” perhaps?
- UniProt IDs should be indicated as “reviewed” or “unreviewed” (UniProt vs. TrEMBL)?
- (Judy)Make lines in table for genes that have manual annotation more prominent somehow..bold is not visible enough, Michael assures it is plain as day on the computer screen...
- (David)Would like to see filter/highlight by species to examine annotation of paralogs
- (David)We should be sure to have a common mechanism for feeding GAF files back to MODs, could be used for PAINT annotations, GONUTS GAF dumps, and inferred annotations from Function/process cross references.
- Positive annotations should not be propagated to clades that have NOT qualified annotations to related terms..Suzi assured us that won't happen.
- As PAINT tree annotation proceeds, propagation of annotations that can be automated will be. Helps to streamline PAINT curation process.
- (Pascale)GAF files generated from PAINT are already available from wiki and Ref Genome page for MODs to load into their own data.
- (Paul)Propagation of annotations depends upon our ability to interpret annotations and the context in which each annotation was made. Some of those details lie in col. 16, like process X occurs in cell CL:#### at stage Y.
- (Suzi)If PAINT illustrates problems with protein/gene records (example: gene merge needed), how can that be fed back to sources to get the problem adjusted?
- (Paul)Consider infrastructure to make more conjunctive statements, for example: function X occurs in component A of cell Y.
- Consider choosing target genes based on functional systems
- (Rex/Judy)All time taken away from literature curation takes away some experimental annotations needed to support inferential annotations.
- (Paul)The PAINT tree curation process is not highly scalable. Re-reading of papers is sometimes needed to facilitate proper tree curation...can that be mitigated in some way?
- (Judy)Efficiency of tree curation is important, propagated annotations will be examined by MOD curators before they are added to our databases. This will then address some of the complexity that Paul was bringing up.
- (Rex)bottom line...we (all of us) don’t have enough funding to curate all the necessary literature.
- (Judy)We already set the bar lower from “complete” annotation to “comprehensive” to help address the volume issues.
- (Judy)We have a lot of named genes that users reasonably expect to be annotated. Proper experiments to demonstrate the functions won’t ever be done in mouse, for example, if they have been done in S. cerevisiae. These annotations need to be transferred; users expect them to be there for basic functionality like the spliceosome.
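Several points above depend on specific GAF columns: the NOT qualifier (col. 4) that should block propagation of positive annotations, the contextual details in col. 16, and the isoform reference in col. 17. As a rough illustration only, here is a minimal Python sketch of reading those fields from a GAF 2.0 line. The names `Gaf2Record` and `parse_gaf2_line` are hypothetical and are not part of PAINT or any GO tool; col. 16 is kept as an unparsed string.

```python
from typing import NamedTuple, Optional


class Gaf2Record(NamedTuple):
    """Hypothetical container for the GAF 2.0 fields discussed in these minutes."""
    gene: str                 # cols. 1-2 joined as "DB:ObjectID"
    go_id: str                # col. 5
    evidence: str             # col. 7
    negated: bool             # True when col. 4 contains the NOT qualifier
    extension: Optional[str]  # col. 16, kept as a raw string
    isoform: Optional[str]    # col. 17, e.g. a UniProtKB isoform identifier


def parse_gaf2_line(line):
    """Parse one tab-separated GAF 2.0 line; return None for comments and blanks."""
    if not line.strip() or line.startswith("!"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError("expected at least 15 tab-separated columns")
    cols += [""] * (17 - len(cols))  # cols. 16 and 17 may be absent
    qualifiers = set(cols[3].split("|")) if cols[3] else set()
    return Gaf2Record(
        gene=f"{cols[0]}:{cols[1]}",
        go_id=cols[4],
        evidence=cols[6],
        negated="NOT" in qualifiers,
        extension=cols[15] or None,
        isoform=cols[16] or None,
    )
```

A propagation step could then skip any record whose `negated` flag is set. Deciding which terms count as "related" would additionally need the ontology structure, which this sketch does not attempt.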
Proposed method to get away from Google Spreadsheets
- Pascale’s proposed method to stop using the Google spreadsheets for Reference Genomes annotation tracking was generally accepted as worth further exploration.
- Compare relative dates of PAINT tree annotation with the most recent date for experimental annotation from all GAFs.
- (Rex)How will status of ‘comprehensive’ annotation be captured with Pascale’s proposal? Earlier it was stated that 'comprehensive' annotation status is needed before PAINT curation can occur.
- (Pascale)May not need concept of ‘comprehensive’..why not just go with ‘here is what could be propagated as of this date’, and then keep track of which genes need new PAINT tree curation due to newly added experimental annotation?
- (Pascale)In PAINT, the presence of experimental annotations should be emphasized from ANY source..not just Ref. genomes.
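The date comparison proposed above could be sketched roughly as follows. This is a minimal Python illustration, not an agreed implementation. It assumes GAF 2.0 input (evidence code in col. 7, date in col. 14 as YYYYMMDD) and a hypothetical `paint_dates` mapping from gene identifier to the date of its last PAINT tree curation. The function names are invented for this example.

```python
from datetime import date, datetime

# Experimental evidence codes used by GO.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def latest_experimental_dates(gaf_lines):
    """Map "DB:ObjectID" to the most recent experimental annotation date."""
    latest = {}
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation row
        if cols[6] not in EXPERIMENTAL_CODES:  # col. 7: evidence code
            continue
        gene = f"{cols[0]}:{cols[1]}"  # cols. 1-2: DB and DB Object ID
        annotated = datetime.strptime(cols[13], "%Y%m%d").date()  # col. 14
        if gene not in latest or annotated > latest[gene]:
            latest[gene] = annotated
    return latest


def genes_needing_recuration(gaf_lines, paint_dates):
    """Return genes with experimental annotation newer than their PAINT curation.

    paint_dates is a hypothetical dict: gene identifier -> datetime.date of the
    last PAINT tree curation covering that gene.
    """
    latest = latest_experimental_dates(gaf_lines)
    return sorted(
        gene for gene, curated in paint_dates.items()
        if gene in latest and latest[gene] > curated
    )
```

Because every GAF row is considered regardless of who assigned it, experimental annotations from any source count, not only those from Reference Genomes groups.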