GOC Meeting Minutes September 2009

From GO Wiki
Revision as of 05:49, 24 September 2009 by Dhowe (talk | contribs)

Up to Cambridge_GO_Consortium_Meeting or Consortium_Meetings

Weds. AM

review grant status (Judy)

Aim I

  • 2yr project to fund CL development focusing on immunology and neurons funded!
  • Protein ontology development supplemental grant funded to represent complexes.

Discussion of PRO/GO and their relation to each other

  • (Judy)PRO is a very specific representation, e.g. mouse proteins in a mouse complex. The GO component represents the class (i.e. not species specific).
  • each pro complex should have a xref to the GO complex.
  • complexes should be represented by their function, not their protein members
  • (David)Some complexes are defined by the proteins they contain, some by what the complex does.
  • (Judy)function of a complex may be cell type specific.
  • (Chris)GO should do complexes, PRO should do proteins which are then pointed to by GO components.
  • (Judy)This is not how the process was described in the PRO grant.
  • (Ben)complexes defined by their function may introduce a naming issue if the same complex has different functions in different places or cell types.
  • (Suzi)GO and PRO need to discuss how best to represent these complexes and relate PRO to GO.

ACTION: Primary stake holders in GO and PRO need to agree on how GO components and PRO will relate to each other

Aim 2

  • Funding obtained for protein set specification (panther group)
  • (Judy)Gene targets may need to be skewed towards genes known to be involved in human disease. Important for the grant and ongoing funding.
  • (Rex)It is important to balance these with genes of unknown function. Emerging new information is important too, so balance is needed in choosing targets.
  • (Ruth)There are human genes which we know a lot about that are not annotated. People want the data and are surprised not to find it.
  • (Judy)Increasing concern that we are sorely lacking in annotation depth in areas we know a lot about. Need more resources.
  • (Rex)How do we define relevance to human biology?
  • (Suzi)The original Reference Genomes proposal goes for annotation breadth AND depth. Need ways to prioritize targets, but this does not preclude inclusion of genes outside of a selected target gene set.
  • (Judy)We can look at systems, diseases, processes

ACTION: Annotation groups need to get ISS annotations from PAINT into their databases this year.

ACTION: Participating groups need to provide a file of their comprehensively annotated genes

Aim 3

Species to consider outreach to

  1. Daphnia (Indiana) has GO annotations (Michael Lynch)
  2. Xenopus
  3. Sea Urchin
  4. Plasmodium

Facilitation of annotation for new genomes

  • (Brenley)GONUTS can support annotation of any alternate species.
  • (Rama)We need an annotation camp. It’s been several years.
  • We need more ongoing training
  • (Judy/Paul)PAINT needs a mechanism of annotation for new genomes, have GO jamboree, bring these new groups in.
  • (Emily)Electronic annotation pipelines lack development and improvement efforts which could help new groups.
  • (Emily)More effort needed in expanding electronic curation efforts. More sources of electronic annotation are out there.
  • (David)Automated annotations are too general, result in clustering at high level (like developmental process) which is not that valuable for new groups.
  • (Judy)Need continuing discussion here to focus on utility of GO. What do users want/need? How do we focus our effort here for the next year? Need to address to get continued funding.
  • (Rex)This was 'THE REASON' for the Reference Genomes Project. How can its role be expanded? Why isn’t the project working for non-reference-genome species?
  • (Michael)EMO: emerging model organisms
  • (Paul)82 species have been chosen to be close to as many species as possible for ortholog prediction and functional annotation to related species.
  • SP/UNIPROT group are increasing manual annotation efforts primarily focused on Human proteins
  • (Kara)Consider outreach to ourselves as a priority for the next year. It’s not clear there is “MOD buy in”. How is our own work being incorporated into our OWN databases? Peter Good wants to know “how does this work advance biomedical hypothesis generation?”
  • (David)How do we get non-NIH funding from sources interested in general biology?
  • Perhaps there is a misperception of the state of things. PAINT annotation is still under development, so the resulting ISS annotations are not yet fully propagated to other species; that is coming.
  • (Susan)Should we highlight Reference Genomes annotations? Groups with one annotator get behind in curation, this should not be confused with lack of buy in or desire to participate

ACTION: MODs-make an effort to highlight Ref Genome annotation

  • (David)Data in Reference Genomes paper highlights progress and effect of ref genome effort on annotations.
  • (Mike)do MODs announce or advertise their participation in Reference Genomes project?
  • (Ruth)Following up with ISS annotations is difficult.
  • (Judy)We are poised to make the next step of propagating annotations. This should highlight the annotations generated from the Reference Genomes project.
  • (Rex)For grant progress report and Reference Genomes website, we should provide metrics like the number of Reference Genome annotations and the percent of databases that have those annotations propagated.
  • Real value will become apparent from propagation of annotations to other species
  • (Judy)If we can provide a set of annotations for one of the non-Reference Genomes species in the next year this will be good demonstration of the power of the effort. This may be a good focus point for the next year.

Aim 4

  • (David)It is easy but not obvious how to get a version of the GO. Needs to be easier if we are asking users to do this.
  • (Midori)Users are not asking for how to find ontology version numbers
  • (David) I see requests for how to find out what version was used in past studies

Community Interactions and Outreach

ACTION: Val will make available the form used to elicit responses from authors

  • (Val)Estimates that curating a paper takes 1/3 the time when the authors fill in this form. The form is sent to the last author and the results are processed by Val. Responses generate follow-up questions, so much work follows on from them, but the information comes directly from the authors.
  • not traced to who did the annotation though.
  • facilitates user understanding of the existing annotations.
  • TAIR's Plant Phys submission form can be seen here: http://www.aspb.org/publications/tairsubmission.cfm
  • (Tanya)TAIR Plant Phys submission form was improved with autocomplete to help submitters pick GO terms. Also started similar process with the journal Plant Journal using a template spreadsheet. PJ handles this as a supplemental data submission included with the paper when published.
  • (Jane)Recommendations from the GO review
  • (Judy)What is a function, what is a process? We should define these..what do we mean by these in GO? We need to be clear about what we are doing.

Web Stats (Seth)

  • Seth can set up access to Google Analytics if you send him your gmail account. (File:New-analytics.pdf)
  • (Mike) Machines were moved at Stanford, disrupting Google Analytics reporting, but usage stats have been very constant over past 12-24 months..what does that mean?
  • (Jane) GO News site(http://go.berkeleybop.org/news4go/), twitter, and RSS feed are available off gene ontology web page.

AmiGO status (Chris)

  • (Chris)AmiGO status: IEAs are included, but not GOA files, so some species are consequently excluded. IEAs for MODs are included.
  • (Judy)What are the questions that wet bench biologists want to ask of GO...and are we making that possible for them?
  • (Ruth)Labs use Ingenuity because Ingenuity has a large number of annotators and users get more complete results. They use their own interface, GO plus their own ontologies, and their own annotations. How many groups pay for access to Ingenuity? Is that money well spent?
  • (David)Examine Ingenuity. Why are users using it? What can we learn from that?
  • (Jen/Varsha)Incorrect annotation is offputting, so is incomplete annotation
  • (David)Users may perceive an annotation as incorrect, but often that is not the case.
  • What is the issue at hand?
  1. interface (à la Ingenuity; Ingenuity provides a different service from GO)
  2. do we need to distinguish GO from Ingenuity for users, and explain when to use which?
  3. lack of completeness of annotation or ontology?
  4. incorrect annotation or ontology structure?
  5. user perception or understanding of the GO? (dopamine is not always a neurotransmitter for example)
  6. why are biologists not using GO? If not using GO what alternative is being offered?
  • Users/developers of third party GO tools may not keep their ontologies up to date which makes GO seem less useful.
  • (Mike)We need to be creative about what GO provides.
  • (Judy)We might want to lower the visibility of GO Tools and increase the visibility of using GO from the GO site and point from MODS to GO site.
  • (Paul)If we are too far removed from end users it is hard to value our contribution and hard to get feedback from users directly
  • (Tanya)Target surveys to MOD user lists, but also to meeting registrants who may or may not be MOD users. Ask if they use GO. Provide a web page for the survey so all survey respondents can fill out the same form no matter which meeting they are at?

ACTION: Build a survey for grant update (Val, Jane, Jen), determine where to send it. (Mike has subscription to SurveyMonkey)

  • (Midori)EBI targets biologists...ask them how they do it!
  • (Eva)Send reminder emails to survey recipients to try to reduce self selection of survey responders.

ACTION: provide two survey urls and compare responses from a targeted list like the submitters to GO help list with responses from a random list of biologists

LUNCH

Reference Genomes Progress Report (Pascale)

  • (David)Do we have a gold standard set of genes we use to test annotation metrics? Not currently..but we should find a set for this purpose. This will standardize testing of annotation progress.
  • (Brenley)Should GONUTS use UniProt IDs? -Yes
  • (Brenley)Should GONUTS use UniProt protein names? maybe..but may not be that useful
  • (Judy)There is a proposal for Human/Mouse/Rat isoform nomenclature
  • (Chris)In GAF 2.0 annotations to genes, col. 17 will contain the specific isoform reference, typically a UniProt ID.
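As an illustration of the col. 17 point above, here is a minimal Python sketch that reads the isoform reference from a GAF 2.0 row. It relies on GAF 2.0 rows having 17 tab-separated columns, with the Gene Product Form ID in column 17. The function name and the example row are hypothetical, and every identifier in the row is a placeholder rather than real data.

```python
GAF2_COLUMN_COUNT = 17  # GAF 2.0 rows have 17 tab-separated columns


def gene_product_form_id(line):
    """Return column 17 (Gene Product Form ID) of a GAF 2.0 row, or None."""
    if line.startswith("!"):  # header and comment lines carry no annotation
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < GAF2_COLUMN_COUNT:  # e.g. a 15-column GAF 1.0 row
        return None
    return fields[16].strip() or None


# Hypothetical row: every identifier below is a placeholder, not real data.
example_row = "\t".join([
    "MGI", "MGI:0000000", "Abc1", "", "GO:0000000", "PMID:0000000", "IDA",
    "", "F", "", "", "gene", "taxon:10090", "20090924", "MGI",
    "", "UniProtKB:P00000-2",
])
print(gene_product_form_id(example_row))
```

A row annotated to the gene as a whole would leave column 17 empty, in which case the function returns None.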

PAINT demo(Suzi)

  • Need Java 1.5 to support the drag-and-drop function, which is critical to the PAINT annotation process.
  • PAINT uses the GOlite DB, updated ~weekly
  • Update GAF files no less frequently than once per month!
  • Annotations can only be transferred when manual annotation is complete by all groups for a given gene (QUESTION: How does this get reconciled with the proposal put forth by Pascale later in the day to do away with the Google spreadsheet?)

ACTION: Locate code for making font size changes in PAINT(Paul)

  • Name column in PAINT table view should say “symbol” perhaps?
  • UniProt IDs should be indicated as “reviewed” or “unreviewed” (UniProt vs. TrEMBL)?
  • (Judy)Make lines in table for genes that have manual annotation more prominent somehow..bold is not visible enough, Michael assures it is plain as day on the computer screen...
  • (David)Would like to see filter/highlight by species to examine annotation of paralogs
  • (David)We should be sure to have a common mechanism for feeding GAF files back to MODs, could be used for PAINT annotations, GONUTS GAF dumps, and inferred annotations from Function/process cross references.
  • Positive annotations should not be propagated to clades that have NOT-qualified annotations to related terms. Suzi assured us that won't happen.
  • As PAINT tree annotation proceeds, propagation of annotations that can be automated will be. Helps to streamline PAINT curation process.
  • (Pascale)GAF files generated from PAINT are already available from wiki and Ref Genome page for MODs to load into their own data.
  • (Paul)Propagation of annotations depends upon our ability to interpret annotations and the context in which each annotation was made. Some of those details lie in col. 16, e.g. process X occurs in cell CL:#### at stage Y.
  • (Suzi)If PAINT illustrates problems with protein/gene records (example: gene merge needed), how can that be fed back to sources to get the problem adjusted?
  • (Paul)Consider infrastructure to make more conjunctive statements-function X occurs in component A of cell Y for example.
  • Consider choosing target genes based on functional systems
  • (Rex/Judy)All time taken away from literature curation takes away some experimental annotations needed to support inferential annotations.
  • (Paul)The PAINT tree curation process is not highly scalable. Re-reading of papers is sometimes needed to facilitate proper tree curation...can that be mitigated in some way?
  • (Judy)Efficiency of tree curation is important, propagated annotations will be examined by MOD curators before they are added to our databases. This will then address some of the complexity that Paul was bringing up.
  • (Rex)bottom line...we (all of us) don’t have enough funding to curate all the necessary literature.
  • (Judy)We already set the bar lower from “complete” annotation to “comprehensive” to help address the volume issues.
  • (Judy)We have a lot of named genes that users reasonably expect to be annotated. Proper experiments to demonstrate the functions won’t ever be done in mouse, for example, if they have already been done in S. cerevisiae. These annotations need to be transferred. Users expect them to be there, e.g. for basic functionality like the spliceosome.

Proposed method to get away from Google Spreadsheets

  • Pascale’s proposed method to stop using the Google spreadsheets for Reference Genomes annotation tracking was generally accepted as worth further exploration.
  • Compare relative dates of PAINT tree annotation with the most recent date for experimental annotation from all GAFs.
  • (Rex)How will status of ‘comprehensive’ annotation be captured with Pascale’s proposal? Earlier it was stated that 'comprehensive' annotation status is needed before PAINT curation can occur.
  • (Pascale)May not need concept of ‘comprehensive’..why not just go with ‘here is what could be propagated as of this date’, and then keep track of which genes need new PAINT tree curation due to newly added experimental annotation?
  • (Pascale)In PAINT, the presence of experimental annotations should be emphasized from ANY source..not just Ref. genomes.
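The date comparison discussed above could look roughly like the following Python sketch. It is not the method Pascale presented. It assumes that the PAINT curation date for each gene is available as a simple mapping (a hypothetical input), that GAF column 14 holds the annotation date as YYYYMMDD, and that "experimental" means the evidence codes EXP, IDA, IPI, IMP, IGI and IEP. All function and variable names are illustrative.

```python
from datetime import datetime

# Evidence codes treated as experimental in this sketch.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def latest_experimental_dates(gaf_lines):
    """Map 'DB:ID' to the most recent experimental annotation date."""
    latest = {}
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header, comment, and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 15 or fields[6] not in EXPERIMENTAL_CODES:
            continue  # malformed row or non-experimental evidence
        gene = f"{fields[0]}:{fields[1]}"
        annotated_on = datetime.strptime(fields[13], "%Y%m%d").date()
        if gene not in latest or annotated_on > latest[gene]:
            latest[gene] = annotated_on
    return latest


def genes_needing_recuration(paint_dates, latest_experimental):
    """List genes with experimental annotation newer than PAINT curation.

    paint_dates is a hypothetical mapping of 'DB:ID' to the date on which
    the gene's tree was last curated in PAINT.
    """
    return sorted(
        gene
        for gene, annotated_on in latest_experimental.items()
        if gene in paint_dates and annotated_on > paint_dates[gene]
    )
```

Run over the GAFs from all sources, this gives a "what could be propagated as of this date" view without a separate 'comprehensive' flag: any gene returned by genes_needing_recuration has gained experimental annotation since its tree was last curated.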

THURSDAY

  • Review Weds. Minutes-results of discussion logged in wiki minutes

Infrastructure (Mike/Chris)

ACTION: OBO format 1.3 tags (creation date, etc.) will be added to gene ontology ext

  • Everyone agreed to this

ACTION: OBO version number added to all OBO files

  • Everyone agreed to this

GAF publishing pipelines

  • GAF 2.0 files currently get converted to GAF1.0 during filtering process, 2.0 format file is in submissions directory
  • WHEN should switch be made to pure GAF 2.0 format? Need sufficient lead time.
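To make the cost of the current down-conversion concrete, the sketch below shows one way a GAF 2.0 file could be reduced to GAF 1.0. It is not the actual filtering code, which is not described in these minutes. It assumes only that GAF 1.0 has 15 columns and GAF 2.0 adds columns 16 (annotation extension) and 17 (gene product form ID), and that the file declares its version in a `!gaf-version:` header line. The function name is hypothetical.

```python
GAF1_COLUMN_COUNT = 15  # GAF 1.0 lacks columns 16 and 17 of GAF 2.0


def gaf2_to_gaf1(lines):
    """Return GAF 1.0 lines built from GAF 2.0 lines (lossy)."""
    converted = []
    for line in lines:
        text = line.rstrip("\n")
        if text.startswith("!gaf-version:"):
            converted.append("!gaf-version: 1.0")  # rewrite version header
        elif text.startswith("!") or not text:
            converted.append(text)  # keep other comments and blank lines
        else:
            fields = text.split("\t")
            # Dropping columns 16 and 17 discards any annotation extension
            # and isoform reference carried by the 2.0 row.
            converted.append("\t".join(fields[:GAF1_COLUMN_COUNT]))
    return converted
```

The conversion is one-way: once columns 16 and 17 are dropped, the published file no longer carries the annotation extension or isoform information, which is one reason a switch to pure GAF 2.0 matters for downstream users.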

ACTION: GAF files from PAINT will need to be GAF 2.0 format

ACTION: Switch to GAF 2.0 in the publishing pipeline, giving substantial (3 months) public notice of this change.

ACTION: Add notices of changes like this switch to GAF 2.0 to the GO News feeds

Unannotated genes and GAFs

  • The problem: we want all genes represented in AmiGO
  • Currently not happening b/c GAF files only contain annotated genes
  • Uncertainty still exists regarding proper format for gp2protein file

ACTION: post a clear spec of the desired format for gp2protein files for all MODs-include 1 row for every gene? 1 protein ID per row? No protein ID is OK if no protein is available?

ACTION: keep GAF file as is. Provide a new file (gpfile?) to describe gene products. Provide a detailed spec of the contents of this new file. This file may subsume gp2protein

Function->Process inferred annotations by IC

ACTION: MODS please look at the F->P IC annotations proposed for your species in http://geneontology.org/scratch/gaf-inference. If you have issues get back to Chris in the next 2 weeks.

ACTION: MODS will load these F->P IC annotations to their database

ACTION: Taxon constraint checks: the report should be sent back to the submitting group, but the flagged annotations should not be filtered out; still load them.

ACTION: Chris to check that inter-ontology inferred annotations are limited to inference from experimental annotations only.
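One possible form of the check in the last action is sketched below in Python. It rests on an assumption that the minutes do not confirm: that each inferred IC annotation records the function term(s) it was derived from in the With/From column (col. 8), pipe-separated. An inferred row is flagged when none of its source terms has an experimental annotation for the same gene product. Rows are lists of GAF fields, and all names are illustrative.

```python
# Evidence codes treated as experimental in this sketch.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def unsupported_inferences(source_rows, inferred_rows):
    """Return inferred IC rows with no experimental source annotation.

    Each row is a list of GAF fields: index 0 = DB, 1 = object ID,
    4 = GO ID, 6 = evidence code, 7 = With/From.
    """
    experimentally_supported = {
        (row[0], row[1], row[4])
        for row in source_rows
        if row[6] in EXPERIMENTAL_CODES
    }
    flagged = []
    for row in inferred_rows:
        if row[6] != "IC":
            continue  # only the inferred F->P annotations are checked
        # Assumption: With/From lists the source function term(s).
        source_terms = [t.strip() for t in row[7].split("|") if t.strip()]
        if not any(
            (row[0], row[1], term) in experimentally_supported
            for term in source_terms
        ):
            flagged.append(row)
    return flagged
```

An empty result would mean every proposed F->P annotation traces back to at least one experimental function annotation. Anything returned could be reported to the submitting group for review.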


Changes to GO Database management practices

ACTION: use a more relaxed schedule for building GO Full (quarterly perhaps?)

ACTION: Reduce loading of GO Lite from 3 times to 1 time per week.

  • Both of these actions agreed to by all.

XP term request template

  • Automation of term requests for regulates terms
  • Curator could use ID immediately
  • If term is not allowed by ontology editors, it could be obsoleted
  • Similar mechanism for cellular component cross products
  • If is_a parent can’t be found for some reason, don’t allow submission
  • Should probably be behind a login or security mechanism to know who did the submission
  • This mechanism will hide new term requests from other annotators which is undesirable-find a way to alert everyone to new term requests.
  • SF requests provide a history for new term requests which may be desirable for these as well.
  • Curators like to be able to determine how many term requests were made..can do this through SF. Would like to retain that functionality.

ACTION: implement a version like shown for curators to try before next GO meeting

Ontology Quality Control Reports(David)

  • Now using OBOEdit to provide implied links via regulates.
  • Many QC reports have greatly improved ontology quality

Cross-Products in GO (Midori)

  • proposal to move cross-product definitions from multiple files into gene_ontology_write.ob
  • start with Regulates, then BPxBP using part_of, then CCxCC, CCxMF or BP
  • New relations needed to implement CCxBP and CCxMF cross products
  • Are there any consequences for software rendering graphs?

Function-Process Links

  • BP->MF now has regulates relationship
  • MF->BP starting to get part_of relationship
  • Barry concerned about use of part_of between F and P
  • Another relation that allows propagation of annotations across the ontologies would be good
  • The Function ontology represents activities, which can be related to processes since both are occurrents.
  • Use systematic names for these terms
  • (Barry)consider renaming Molecular Function to Molecular Activity
  • Often enzyme assay shows function, mutant shows process..but the two are not combined in the experimental results. Curators will need to be careful of annotating with P->F compound terms.

Alignment of GO with Pathways Databases (David)

  • pilot to take Reactome pathway and coordinate MF in GO with catalytic activities in Reactome as test case.

Annotation Relationships (Chris/Jane)

  • There is an implicit relation between an annotation and the gene to which it applies
  • explicit relation between annotation and gene would be better
  • relations may be like ‘extrinsic to’ or ‘acted on during’ to relate annotation to gene
  • host-symbiont processes have more than one participant
  • this is represented currently in ontology structure via host processes and symbiont processes
  • Chris suggested to have a relation between the gene product and the annotation
  • This is a long term solution, currently seeking comments

ACTION: Group agreed this makes some sense. Work up a more concrete specification of how this would work.

UniPathway (Anne Morgat)

  • Linking from UniPathway to GO via EC won’t be a good long-term solution. EC numbers are not represented one-to-one in GO (1 EC -> 1 GO).
  • gene-specific GO terms were recently removed, this project will re-instate such terms.

ACTION: Pascale, Anne, Harold, Chris and Peter D’Eustachio will work together to provide cross products between Molecular Function and ChEBI, and report back to the group at the Spring meeting


