GOC Meeting Minutes September 2009
Review of grant status (Judy)
- 2-year project to fund CL development, focusing on immunology and neurons, funded!
- Protein Ontology (PRO) development supplemental grant funded to represent complexes.
Discussion of PRO/GO and their relation to each other
- (Judy)PRO uses a very specific representation (e.g., mouse proteins in a mouse complex); the GO component represents the class (i.e., it is not species-specific).
- each PRO complex should have an xref to the GO complex.
- complexes should be represented by their function, not their protein members
- (David)some complexes are defined by the proteins they contain, some by what the complex does.
- (Judy)function of a complex may be cell type specific.
- (Chris)GO should do complexes, PRO should do proteins which are then pointed to by GO components.
- (Judy)this is not how the process was described in the PRO grant.
- (Ben)complexes defined by their function may introduce a naming issue if the same complex has different functions in different places or cell types.
- (Suzi)GO and PRO need to discuss how best to represent these complexes and relate PRO to GO.
ACTION: primary players in GO and PRO need to agree on how GO components and PRO will relate to each other
- Funding obtained for protein set specification (PANTHER group)
- (Judy)Gene targets may need to be skewed towards genes known to be involved in human disease. This is important for the grant and for ongoing funding.
- (Rex)It is important to balance this with genes of unknown function; emerging new information is important too, so balance in choosing targets is needed.
- (Ruth)There are human genes which we know a lot about that are not annotated. People want the data and are surprised not to find it.
- (Judy)Increasing concern that we are sorely lacking in annotation depth in areas we know a lot about. Need more resources.
- (Rex)How do we define relevance to human biology?
- (Suzi)The original Reference Genomes proposal goes for annotation breadth AND depth. Need ways to prioritize targets, but this does not preclude inclusion of genes outside of a selected target gene set.
- (Judy)We can look at systems, diseases, processes
Species to consider outreach to
- Daphnia (Indiana) has GO annotations (Michael Lynch)
- Sea Urchin
Facilitation of annotation for new genomes
- (Brenley)GONUTS can support annotation of any alternate species.
- (Rama)We need an annotation camp. It’s been several years.
- We need more ongoing training
- (Judy/Paul)PAINT needs a mechanism for annotating new genomes; hold a GO jamboree to bring these new groups in.
- (Emily)Electronic annotation pipelines lack development and improvement efforts which could help new groups.
- (Emily)More effort needed in expanding electronic curation efforts. More sources of electronic annotation are out there.
- (David)Automated annotations are too general, result in clustering at high level (like developmental process) which is not that valuable for new groups.
- (Judy)Need continuing discussion here to focus on utility of GO. What do users want/need? How do we focus our effort here for the next year? Need to address to get continued funding.
- (Rex)This was 'THE REASON' for the Reference Genomes Project. How can its role be expanded? Why isn't the project working for non-reference-genome species?
- (Michael)EMO: emerging model organisms
- (Paul)82 species have been chosen to be close to as many species as possible for ortholog prediction and for transfer of functional annotation to related species.
- The SP/UniProt group is increasing manual annotation efforts, primarily focused on human proteins.
- (Kara)Consider outreach to ourselves as a priority for the next year. It's not clear there is "MOD buy-in". How is our own work being incorporated into our OWN databases? Peter Good wants to know "how does this work advance biomedical hypothesis generation?"
- (David)How do we get non-NIH funding from sources interested in general biology?
- Perhaps there is a misperception of the state of things. PAINT annotations are still under development, so the resulting ISS annotations are not yet being fully spread to other species; that is coming.
- (Susan)Should we highlight Reference Genomes annotations? Groups with one annotator fall behind in curation; this should not be confused with a lack of buy-in or of desire to participate.
- ACTION ITEM: MODs should make an effort to highlight Reference Genome annotations.
- (David)Data in Reference Genomes paper highlights progress and effect of ref genome effort on annotations.
- (Mike)do MODs announce or advertise their participation in Reference Genomes project?
- (Ruth)Following up with ISS annotations is difficult.
- (Judy)We are poised to make the next step of propagating annotations. This should highlight the annotations generated from the Reference Genomes project.
- (Rex)For grant progress report and Reference Genomes website, we should provide metrics like the number of Reference Genome annotations and the percent of databases that have those annotations propagated.
- Real value will become apparent from propagation of annotations to other species
- (Judy)If we can provide a set of annotations for one of the non-Reference Genomes species in the next year this will be good demonstration of the power of the effort. This may be a good focus point for the next year.
-It is easy, but not obvious, how to get a given version of the GO. This needs to be easier if we are asking users to do it.
-Midori: Users are not asking how to find the ontology version number.
-David: Sees requests for how to find out which version was used in past studies.
11AM: Community interactions and outreach
-Val: Pombe community annotation: search for Pombe in Google -> community curation -> see papers.
-How did they get 19/20 respondents?
-Val: Will send out the form used to elicit responses; estimates it takes one third of the time to complete the curation. The form is sent to the last author, and the results are processed by Val. Results generate questions, etc., so much work follows on the responses, but it does get information directly from authors.
-The annotation is not traced to who did it, though.
-Facilitates user understanding of the existing annotations.
-See submission form here: http://www.aspb.org/publications/tairsubmission.cfm
Tanya: The Plant Physiology submission form was improved with autocomplete to help submitters pick GO terms. Started with a second journal (The Plant Journal) using a template spreadsheet. PJ handles this as a supplemental data submission included with the paper when it is published.
Jane: recommendations from the GO review
-Judy: What is a function, and what is a process? We should define what we mean by these in GO. We need to be clear about what we are doing.
-Web stats: Seth can set up access to Google Analytics if you send him your Gmail account.
-Mike: Machines were moved at Stanford; usage stats have been very constant over the past 12-24 months. What does that mean?
Jane: The GO News site is available from the Gene Ontology web page.
Chris: AmiGO status: IEAs are included for the MODs, but GOA files are not, so some species are consequently excluded.
Judy-What questions do wet-bench biologists want to ask of GO, and are we making that possible for them?
Ruth-Labs use Ingenuity because it has a large number of annotators, so they get more complete results. Ingenuity uses its own interface, GO plus its own ontologies, and its own annotations. How many groups pay for access to Ingenuity? Is that money well spent?
David-Examine Ingenuity: why are users using it? What can we learn from that?
Jen/Varsha-Incorrect annotation is off-putting; so is incomplete annotation.
What is the issue?
--Interface (à la Ingenuity; Ingenuity provides a different service from GO)
--Do we need to distinguish GO from Ingenuity for users, i.e., explain when to use which?
--Lack of completeness of annotation or ontology?
--Incorrect annotation or ontology structure
--User perception or understanding of the GO (for example, dopamine is not always a neurotransmitter)
--Why are biologists not using GO? If they are not using GO, what alternative are they using?
--Users of GO tools may not keep their ontologies up to date, which makes GO seem less useful.
-Mike: We need to be creative about what GO provides.
-Judy: We might want to lower the visibility of GO Tools, increase the visibility of using GO from the GO site, and point MODs to the GO site.
-Paul: If we are too far removed from end users, it is hard to value our contribution and hard to get feedback directly from users.
Tanya-Target surveys to MOD user lists, but also to meeting registrants who may or may not be MOD users. Ask if they use GO. Provide a web page for the survey so all users can fill out the same form no matter which meeting they attend?
ACTION ITEM: Build survey for grant update (Val was nominated, with Jane and Jen), and determine where to send it. (Mike has a subscription to SurveyMonkey.)
-Midori: EBI targets biologists...ask them how they do it!
-Send reminder emails to survey recipients to try to reduce self-selection of survey responders.
ACTION ITEM: Provide two survey URLs and compare responses from the GO help list with those from a random list of biologists.
Pascale: progress report
-David: Do we have a gold standard set of genes we use to test annotation metrics? Not currently, but we should find a set for this purpose; it would standardize testing of annotation progress.
Brenley:
-GONUTS should use UniProt IDs.
-Using UniProt protein names may not be that useful.
-Judy: There is a proposal for Human/Mouse/Rat isoform nomenclature.
-Chris: Annotate to the gene; column 17 will contain the specific isoform reference.
Suzi: PAINT demo
-Java 1.5 is needed to support drag and drop.
-PAINT uses the GOlite DB, updated ~weekly.
-Update GAF files no less frequently than once per month.
-Annotations can only be transferred once manual annotation is complete by all groups for a given gene.
-ACTION (Paul): Locate code for making font size changes in PAINT.
-The name column in the table view should perhaps say "symbol"?
-UniProt IDs should be indicated as "reviewed" or "unreviewed" (Swiss-Prot vs. TrEMBL).
-Judy: Make lines in the table for genes that have manual annotation more prominent.
-David: Would like to see paralogs, i.e., filter/highlight by species.
-David: We should be sure to have a common mechanism for feeding GAF files back to MODs; it could be used for PAINT annotations, GONUTS, and inferred annotations from function/process cross-references.
-Positive annotations should not be propagated to clades that have NOT-qualified annotations to related terms.
-As PAINT annotation proceeds, propagations that can be automated will be automated within PAINT to streamline the PAINT curation process.
-GAF files from PAINT are available from the wiki and the Ref Genome page.
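The NOT-blocking rule above can be sketched as a recursive walk over a gene-family tree. This is a minimal, hypothetical illustration, not PAINT's actual logic: the `Node` class and `propagate` function are invented here, "related terms" is simplified to the exact same GO ID, and a NOT annotation on a node blocks that node and its whole subtree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a hypothetical gene-family tree; leaves are genes."""
    name: str
    children: list[Node] = field(default_factory=list)
    # GO IDs this node carries NOT-qualified annotations to
    not_terms: set[str] = field(default_factory=set)


def propagate(node: Node, go_id: str) -> list[str]:
    """Return the leaf genes that inherit a positive annotation to go_id
    made at `node`. A NOT annotation to the same GO ID on any node blocks
    that node and its whole subtree."""
    if go_id in node.not_terms:
        return []  # this clade is explicitly NOT annotated to the term
    if not node.children:
        return [node.name]  # a leaf (gene) receives the annotation
    leaves: list[str] = []
    for child in node.children:
        leaves.extend(propagate(child, go_id))
    return leaves


# Hypothetical tree: cladeB has a NOT annotation to the example term.
tree = Node("root", children=[
    Node("cladeA", children=[Node("geneA1"), Node("geneA2")]),
    Node("cladeB", children=[Node("geneB1")], not_terms={"GO:0005681"}),
    Node("geneC"),
])
print(propagate(tree, "GO:0005681"))  # ['geneA1', 'geneA2', 'geneC']
```

A fuller version would also block on NOT annotations to parent or child terms, which requires walking the ontology graph.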
Paul: Propagation of annotations depends upon our ability to interpret annotations and the context in which each annotation was made. Some of those details lie in column 16, e.g., process X occurs in cell CL:#### at stage Y.
Suzi-If PAINT illustrates problems with protein/gene records, such as a likely gene merge being needed, how can that be fed back to the sources to get the problem fixed?
-Consider infrastructure to make more conjunctive statements, for example: function X occurs in component A of cell Y.
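For reading that context out of annotation files, here is a minimal sketch assuming the GAF 2.0 layout, where column 16 holds annotation extensions (`relation(ID)` pairs, with `,` joining pairs that hold together and `|` separating alternatives) and column 17 holds the gene product form ID, such as an isoform. The function names and the example record are hypothetical.

```python
from __future__ import annotations


def parse_extensions(col16: str) -> list[list[tuple[str, str]]]:
    """Parse a GAF 2.0 annotation-extension field (column 16).

    '|' separates alternative extension sets; ',' separates the
    relation(ID) pairs that must all hold together (a conjunction).
    """
    if not col16.strip():
        return []
    result = []
    for alternative in col16.split("|"):
        pairs = []
        for item in alternative.split(","):
            relation, _, rest = item.strip().partition("(")
            pairs.append((relation, rest.rstrip(")")))
        result.append(pairs)
    return result


def read_gaf_line(line: str) -> dict[str, object]:
    """Pull a few fields (0-based indexes) out of one tab-separated GAF row."""
    cols = line.rstrip("\n").split("\t")
    cols += [""] * (17 - len(cols))  # pad rows that stop before column 17
    return {
        "db_object_id": cols[1],
        "go_id": cols[4],
        "evidence": cols[6],
        "extensions": parse_extensions(cols[15]),  # column 16
        "gene_product_form": cols[16],  # column 17, e.g. an isoform ID
    }


# Hypothetical record: process occurs in a CL cell type, annotated to an isoform.
example = "\t".join([
    "UniProtKB", "Q00000", "geneX", "", "GO:0000000", "PMID:0000000", "IDA",
    "", "P", "", "", "protein", "taxon:10090", "20090901", "MGI",
    "occurs_in(CL:0000540)", "UniProtKB:Q00000-2",
])
record = read_gaf_line(example)
print(record["extensions"], record["gene_product_form"])
```

The comma-joined pairs in column 16 are one way to express the conjunctive statements mentioned below.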
-Consider choosing target genes based on functional systems
-Rex/Judy: All time taken away from literature curation costs some of the experimental annotations needed to support inferential annotations.
Paul: The PAINT inference curation process is not highly scalable. Re-reading of papers is sometimes needed for proper tree curation; can that be mitigated in some way?
Judy: Efficiency of tree curation is important, propagated annotations will be examined by MOD curators which will then address some of the complexity that Paul was bringing up.
Rex: bottom line...we (all of us) don’t have enough funding to curate all the necessary literature.
Judy: We already set the bar lower from “complete” annotation to “comprehensive” to help address the volume issues.
Judy: We have a lot of named genes that users expect to be annotated. Proper experiments to demonstrate the functions will never be done in mouse, for example, if they have already been done in S. cerevisiae. These annotations need to be transferred; users expect them to be there, e.g., for basic functionality like the spliceosome.
Pascale's proposed way to stop using the Google spreadsheets for Ref Gen. annotation tracking was generally accepted as worth further exploration: compare the dates of ISS annotations propagated from PAINT with the most recent date of experimental annotation from the GAFs.
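The date comparison in Pascale's proposal could be sketched as follows. This is one possible interpretation under stated assumptions: `paint_iss` (gene ID to date of its latest PAINT ISS annotation) and `gaf_rows` (gene ID, evidence code, date triples taken from GAF columns 2, 7 and 14) are hypothetical inputs. The experimental evidence set is the standard GO one: EXP, IDA, IPI, IMP, IGI, IEP.

```python
from __future__ import annotations

from datetime import date

# Standard GO experimental evidence codes
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def parse_gaf_date(yyyymmdd: str) -> date:
    """Convert a GAF column-14 date string (YYYYMMDD) to a date."""
    return date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), int(yyyymmdd[6:8]))


def genes_needing_review(
    paint_iss: dict[str, str], gaf_rows: list[tuple[str, str, str]]
) -> list[str]:
    """Return genes whose newest experimental annotation postdates
    their latest PAINT ISS annotation."""
    latest_exp: dict[str, date] = {}
    for gene, evidence, when in gaf_rows:
        if evidence in EXPERIMENTAL:  # ignore IEA, ISS, etc.
            d = parse_gaf_date(when)
            if gene not in latest_exp or d > latest_exp[gene]:
                latest_exp[gene] = d
    return sorted(
        gene
        for gene, when in paint_iss.items()
        if gene in latest_exp and latest_exp[gene] > parse_gaf_date(when)
    )


paint_iss = {"geneA": "20090601", "geneB": "20090601"}
gaf_rows = [
    ("geneA", "IDA", "20090815"),  # newer experimental data than PAINT
    ("geneB", "IMP", "20090101"),  # older than PAINT
    ("geneB", "IEA", "20090901"),  # not experimental, ignored
]
print(genes_needing_review(paint_iss, gaf_rows))  # ['geneA']
```

Genes flagged this way would be candidates for re-examining their trees, with no need to track a separate "comprehensive" status.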
Rex: how will status of ‘comprehensive’ annotation be captured with Pascale’s proposal?
Pascale: We may not need the concept of 'comprehensive'. Why not just go with 'here is what could be propagated as of this date'?
Pascale: In PAINT, experimental annotations from ANY source should be bolded, not just those from Ref. Genomes.