GO 18th Consortium Meeting Minutes Day 2

Please note that these meeting minutes are now being edited for production of a final version (02-11-2007).

Therefore please do not add any more information here but contact Val or Emily.

Monday 24th September 2007, Princeton University, NJ

Broad Agenda

Plans for immediate future (SL)
1. regulation
2. cross products
Database report
1. schema changes
2. production
GA files (MC)
OBO-Edit (JDR)
Evidence codes (MA)

Overview of cross products - Chris Mungall

David H: Everyone should look at table 3 in the wiki: http://gocwiki.geneontology.org/index.php/Regulation_cross-products People should comment on this so we can implement these cross-products. As soon as they are implemented, Chris will be able to run the reasoner.

Chris M: In future do we continue to run the reasoner periodically or should we put the adding of cross-products into the curation process?

David H: From the disjoint experience in the biological process ontology, we have problems if we don't get GO editors to put the information in directly. This is far better than going back and cleaning up these links. There are 6-8 ontology editors.

Chris M: Midori H will need to get this group together and teach them how to add in cross-product information, via Webex.

External Cross Products wiki:

http://gocwiki.geneontology.org/index.php/Ontology_Structure

Cross-product guide:

http://gocwiki.geneontology.org/index.php/Cross_Product_Guide

Judy B: There is a concern that using SO exposes GO to risk, as SO changes over time. Is it a problem for GO to be reliant on SO?

Chris M: Not a problem. We can choose to use SO as it is; it does not force us to change anything as SO changes. In addition, cross-product information in the ontology can be ignored if you don't care about the relationship with an external term. That's fine, but if you do add cross-product information, the GO will be of higher quality.

CP results are in the scratch directory. They will be evaluated and moved from scratch to one of the main ontology directories.

David H: how would these cross-product ontologies be edited?

Chris M: editors would in future need to load three files - gene_ontology_edit.obo, xxx, biological_process_xp_cell.obo and use the cross-product interface

Midori H: It's good to always load these three files, as there are concerns about the consequences of different levels of changes in each of the three files. Consensus: every editor should load all three files.

Chris M: Cannot merge the three files, as each is at a different level of maturity; also the file would get very large. Will start with the regulates relationships and then the cell ontology cross-products.

ACTION: David Hill to organise Webex meeting to ensure all editors understand what they need to do when inputting cross-product information.

CM: wait until OBO-Edit 2.0?

John D-R: start the regulation work now? Start with the cell ontology once OBO-Edit 2.0 is ready?

Judy B, Midori H: Yes, this would give editors a chance to start and get used to adding in this extra information and identify any issues that came up.

Discussion: the gene_ontology_edit.obo file and the original file are being confused.

John D-R. Add OBO version number to the filename. Then use original name as release version of file.

Michael A. Mike C. Change ‘edit’ in curators version to ‘pre_release’ to better describe its use. Original file updated nightly by Stanford.

Chris M. We need to take versioning a little more seriously – impossible to replicate analyses. How do we cite what version of the GO we use?

Michelle G-G: Do we hide pre release file?

Chris M: No, culture of using the latest file.

DB: not straightforward to find when people took data

wiki: http://gocwiki.geneontology.org/index.php/Vershionning_Proposal

SO and Chromosomal Location – Chris M and Karen E discuss offline.

Term Lifecycle - John Day-Richter

GET SLIDES FROM JDR

Term requests to instantiation in the ontology has been reported to be a bottleneck. Users request terms, then need to wait for implementation to use them.

Proposed solution. Give users a temporary ID to work with when they need a new term. Create mini ontology file they can update all their annotations with the new term id.

However, many terms are rejected because the requests are inappropriate, so we need to feed back the outcome of the term request to the user. How? Everyone has some way of dealing with term obsoletion, therefore we can use obsoletion mechanisms to feed back to the user. When a request is closed, use ‘consider’ or ‘replaced_by’ tags to point to the correct term. The term is obsoleted in the user's private ontology.
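A minimal sketch of how a submitting group might read the outcome of a request from its mini ontology, assuming the standard OBO tags `is_obsolete`, `replaced_by` and `consider`. The `TEMP:` prefix, the GO IDs and the function names are hypothetical placeholders for illustration, not part of the proposal as presented.

```python
from typing import Dict, List, Tuple

# Hypothetical mini ontology; the TEMP: ids and GO ids are placeholders.
MINI_ONTOLOGY = """\
[Term]
id: TEMP:0000001
name: first hypothetical request
is_obsolete: true
replaced_by: GO:0000001

[Term]
id: TEMP:0000002
name: second hypothetical request
is_obsolete: true
consider: GO:0000002
consider: GO:0000003

[Term]
id: TEMP:0000003
name: request still open
"""


def parse_obo_terms(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse [Term] stanzas from OBO text into {id: {tag: [values]}}."""
    terms: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(" ! ")[0].strip()  # drop trailing OBO comments
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        tag, value = tag.strip(), value.strip()
        current.setdefault(tag, []).append(value)
        if tag == "id":
            terms[value] = current
    return terms


def resolve(term_id: str, terms: Dict[str, Dict[str, List[str]]]) -> Tuple[str, List[str]]:
    """Return (status, ids) where status is current, replaced, consider or unknown."""
    term = terms.get(term_id)
    if term is None:
        return "unknown", []
    if term.get("is_obsolete") != ["true"]:
        return "current", [term_id]
    if term.get("replaced_by"):
        return "replaced", term["replaced_by"]
    return "consider", term.get("consider", [])


TERMS = parse_obo_terms(MINI_ONTOLOGY)
```

With this layout, an annotation to a temporary ID whose status is `replaced` can be updated automatically, while one whose status is `consider` needs a curator to choose among the suggested terms.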

Discussion:

Pascale G, Rama B: New groups might not be able to handle this. How many new groups are requesting terms and need them straight-away? Alex D: if new curators request a large number of inappropriate terms and then annotate to the temporary ids, this would end up irritating the curators. A peripheral group may not be able to handle this well.

Mike C: automation not a human friendly approach, this is not user support per se.

Midori H: how much more burden to track down terms to suggest and consider a replaced_by?

John D-R: most term requests a lot of work, might be easier to phone the person and do the request on the spot.

Suzi L: but this is an extra to personal communication. Shouldn't stop a group from requesting new terms where there is a need for a fast turn-around, do not want to discourage new groups.

Michael A: How does this fit in with the SourceForge tracker, should we release this? Or keep as an option for groups submitting large numbers of requests that we trust.

John D-R: the obsolete temporary terms would die after 3 months in your mini ontology, therefore if the user did not carry out the required updates they would lose their temporary id as well - this creates a large incentive to properly implement obsoletion handling.

John D-R: Seth has already done this – but sounds like we shouldn’t release this publicly.

Judy B/John D-R: not much support for this; put on the back burner until we find a good project for this to be used on. Possibly use it for Reactome requests, but Midori pointed out that ALL Reactome requests (ONLY 17) had been dealt with very promptly (12 within 2 days; only one took 3 weeks to resolve).

ORB (Ontology Request Broker) - Seth Carbon

Seth has implemented a SourceForge request tracker wrapper. Will be available in AmiGO where users can use it to directly make a SourceForge request. Both HTML and programmatic interface for SourceForge, available for any tool.

Seth: DEMO of this wrapper in AmiGO

Users unable to find a term are given the option to go to a page which allows users to provide details and request a new term.
A form is provided to add term name, definition, additional details, and (optionally) a SF ID. This is submitted to the SF tracker, which gives a success ID, and the term is added into the tracker. Users can choose to put in a user name; if not, the request goes into a general id bucket. The user can then retrieve their terms with orb_default ids in OBO format.
Could not track submitters by IP address, as there are problems with firewalls and DHCP.

Could make a batch request - and an interface could be written so that an individual SourceForge ID would be returned for each request.

David H: want a way of identifying users, temporary ids are worrying. ED: within the AmiGO form, it should be essential for a submitter to add their e-mail address, would ensure they are involved in the term-request process and reduce number of lazy submits by users/spam. Michael A: Attribution section of the AmiGO form should be changed to 'E-mail Address', this should be required.

ACTION: Midori to work on the GO stanza specification required.

Michelle G-G: provide link to new term best practise documentation

John D-R: Use one batch tracker id?

Seth C: Perhaps generate SF ids using another system?

ACTION ITEM (David Midori Seth) Deploy the part that created SF items based on a friendly webform, and would like to see an OBO format in the SF item.

ACTION ITEM Link to documentation on how to make a perfect GO term

Schema changes - Chris Mungall

SWUG:Database changes 2007

Support for multi species annotation files

(PAMGO to task releases files next month. Trudy to send examples to Chris to test this facility)

Support for new properties column. Test data from MGI received (they use structured notes field, others should also send examples)
Support in schema for taxon based queries, species, kingdom etc.
GOOSE new interface to MySQL DB. Aimed at intermediate to advanced users. EBI mirror: >5000 hits so far.

GOOSE

SQL query interface for intermediate to advanced user.

http://www.berkeleybop.org/goose

Provides example SQL queries, for example: stale ISS assignments

There is a wiki page of example SQL queries - can use to experiment and alter to needs - results in html or tab-delimited. GOOSE does not use the production database in Stanford, and a kill-limit is in place for problematic queries. Full version of the database is queried, including IEA data. Mirrors are updated daily, but annotation source updated once a month.

Q ST: web services?

A Chris M: There is already a GO Perl API; web services will be provided, and SPARQL is already ready

New architecture road map on AmiGO. More interactive components on front end.

wiki: http://gocwiki.geneontology.org/index.php/SWUG:AmiGO_Architecture_Roadmap

Seth and Amelia have been refactoring the server based code. Roadmap to transition AmiGO from Perl to Java. Re-use existing OBO-Edit code, mature and robust. Therefore saving development time in future.

Renovated GO database info page.

ACTION ITEM Amelia link GOOSE from front page - DONE

Gene Association Files - Mike Cherry

SGD wants to have 2 files – one manual, one IEA.

SGD would like to provide IEA annotation predictions. However concerns over ignorant users analysing GO data and using existing IEA annotations to create circular annotations. What should SGD call this file?

Chris M: need consistency if SGD do it - then we all should do it.

DB: want to make it clear to researchers that they should use correct GO annotations.

Michael A: propose that groups submit files as normal, but that a filter is installed for all GA files which will partition them into IEA and non-IEA files. On the annotations download website and ftp sites there will be two files.
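The proposed filter could look something like the sketch below. It assumes the tab-delimited gene association layout in which the evidence code is the seventh column and comment lines start with `!`. The function name and sample rows are made up for illustration, and this is not the script Mike Cherry was asked to write.

```python
from typing import Iterable, List, Tuple

EVIDENCE_COLUMN = 6  # 0-based index of the evidence code column in a GA file


def partition_by_iea(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split GA file lines into (IEA lines, non-IEA lines).

    Comment lines are copied into both outputs so each file keeps its header.
    Blank lines are skipped.
    """
    iea: List[str] = []
    non_iea: List[str] = []
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("!"):
            iea.append(line)
            non_iea.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= EVIDENCE_COLUMN:
            raise ValueError(f"too few columns in annotation line: {line!r}")
        if fields[EVIDENCE_COLUMN] == "IEA":
            iea.append(line)
        else:
            non_iea.append(line)
    return iea, non_iea


# Made-up sample rows (identifiers are placeholders, not real annotations).
HEADER = "! hypothetical header comment\n"
ROW_IDA = "SGD\tS000000001\tGENE1\t\tGO:0000001\tPMID:0000001\tIDA\t\tP\n"
ROW_IEA = "SGD\tS000000002\tGENE2\t\tGO:0000002\tGO_REF:0000001\tIEA\t\tF\n"
IEA_LINES, NON_IEA_LINES = partition_by_iea([HEADER, ROW_IDA, "\n", ROW_IEA])
```

Running the same filter centrally on every submitted file would keep the split consistent across groups, which was the concern Chris M raised.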

Emily D: concern over proliferation of files. DB: Need something that will tailor files to user requirements (taxon, NOT, non-IEA), advanced interface to do this on-the-fly

JD: help education of IEA. Important that GOC members who review papers are aware of this problem - place this information onto wiki. JH: this reviewing information should be made public, should be a page both for those writing and those reviewing papers. SR: information on evidence codes is hidden in the GOC site, should highlight location of this information on the GOC front page.

ACTION ITEM Mike Cherry write GA file filter script

ACTION ITEM Chris Mungall: More advanced interface to download custom files by versioning

OBO Edit Working Group - John Day-Richter

About to release 2.000 beta-14
89 bugs fixed.
OBO Edit toolkit now used in Phenote.
Reasoner much faster. Edit in real-time with reasoner on.

DEMO of OBO EDIT new features

Auto-complete
advanced searching for power users, Boolean querying
advanced sub query feature
docking panels to personalise interface.
Graph based editing updated automatically
Wrench icon for every panel to set up personal preferences, filtering, view options etc.
Create new terms and relationships in graph editor by drawing
Graph overview preview
Graph DAG Viewer
Spell checking

external contributions from..... CJM

File:OBOEditWorkingGroup GOC PU 2007.ppt

File:Term Requests GOC PU 2007.ppt

Discussion about availability of predictions - David Botstein

As more information about genes becomes available, the hope has been that this would in turn evolve high-throughput methods to find out more about genes and interactions and pathways and that this data would be determined and incorporated into GO. While this has happened to some extent, it has not been at the rate hoped for. There is no iterative process.

The ideal would be a list where curators would be prompted to check the validity of an annotation prediction. In reality, because of the vagaries of IEA data it has been difficult to validate data.

When people have looked for statistical links between genes to look for possible associations, most predictions turned out to be good. They found evidence for this that wasn’t currently included in GO (i.e. unannotated but information present in literature).

Would like the GOC to start using the suggestions coming from the community.

Judy B: this is a priority issue. MGI has experience with Ken Pagan's group regarding the mouse genome. It does take time to have an interactive relationship. There are genes which only have IEA annotations; these must be a priority set for annotation, they are key for curation and need to be prioritized.

David H: did this exercise with Fritz Roth's dataset. Predictions fell into 3 categories:
1. correct, annotation should clearly be made
2. enough circumstantial stuff to make this annotation, but not tested
3. from outer space

David H, Emily D: this exercise takes a lot of curation work Mike C: FRIZ, OLGA RC: have a grant to look at BioMediator - using expert GO annotation to validate predictions DB: could be something to build into tools.

??? Need more curators

Sue R: Users group to focus on predictions?

David Botstein: Some are one offs, others are systems which should be a semi automation. Does anything arise from the algorithm which isn’t obvious from reading the paper. Use the best of the methods routinely

??? Reports from people who have done these types of collaborations

Suzi L: something we build in to the long term. If GO becomes responsible for running SW/ limiting.

David H suggestion, run on reference genome gene of the month

Need to leverage the groups who are doing these things. JH: Suggested making a repository for predictions (POSSIBLE ACTION ITEM ???) Set up a place where people can dump their results, and we will look at them.

TOUR OF LEWIS SIGLER INSTITUTE

Group Photo

LUNCH

Annotation Evidence Codes - Mike Cherry

Need to finalize the proposed evidence code documentation. There has been a huge number of e-mails on this subject.

Rama B: there is a draft web page ready, with the majority of text and examples agreed upon. However there are some sections in red, where input from the GO Consortium is needed.

Chris M: Assume that what is not in red is okay, and let's move on and get it onto the web site.

Pascale G: Evidence codes documentation is too long and complicated. Would it be possible to also come up with something simpler for users - and leave the long documentation for curators.

Chris M: evidence code documentation should be for curators, but users do read it as well. Eurie H: we should have an abbreviated version and move the detail onto the GOC wiki, as this level of detail swamps users. Suzi L: seconds Eurie's motion

ACTION: Rama to make changes to Evidence Code documentation

Decision tree pdf for evidence codes has appeared. No one has seen before - from Karen? Looks good. GOC needs to look through this.

Discussion: Revisiting the question “What is the purpose of evidence codes?”

How are evidence codes used by curators, biological users, informaticians

Users get an idea where the GO annotation came from

Val W: Curators can use to evaluate conflicting evidence from other species to make the best ISS inferences based on the available data. Might not want to make an ISS annotation based on IMP evidence.

Sue R: for functional inference, manual curation is used as a gold standard for benchmarking. For instance, if they are making inferences based on expression, they should remove IEPs. When inferring from homology they need to exclude annotations made from homology. Important to stop ISS and IEA annotations becoming circular. More confidence with varied evidences rather than one type of assay.

ER: Users are using evidence codes as a confidence level rating.

DB: The future of computational analysis depends more and more on the evidence codes, and on the fact that some evidence is stronger than other evidence.
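As a rough illustration of the benchmarking point, the sketch below drops annotations whose evidence code would make a gold standard circular for a given prediction method. The mapping from method to excluded codes is an assumption based on this discussion (in particular, treating both ISS and IEA as homology-derived is a simplification), and the type and function names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Annotation(NamedTuple):
    gene: str
    go_id: str
    evidence: str


# Assumed mapping: evidence codes to leave out of the gold standard
# when benchmarking a prediction method of the given kind.
CIRCULAR_CODES: Dict[str, Set[str]] = {
    "expression": {"IEP"},
    "homology": {"ISS", "IEA"},
}


def gold_standard(annotations: Iterable[Annotation], method: str) -> List[Annotation]:
    """Keep only annotations that are independent of the prediction method."""
    excluded = CIRCULAR_CODES[method]
    return [a for a in annotations if a.evidence not in excluded]


# Made-up annotations; gene names and GO ids are placeholders.
ANNOTATIONS = [
    Annotation("geneA", "GO:0000001", "IDA"),
    Annotation("geneB", "GO:0000002", "IEP"),
    Annotation("geneC", "GO:0000003", "ISS"),
    Annotation("geneD", "GO:0000004", "IEA"),
]
```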

David H: to quote documentation that there is no quality measure in evidence codes is patently false - all users use evidence codes in this respect.

Eurie H: old documentation said evidence codes can be used to “Evaluate the reliability of an annotations” this is the original intent of the evidence codes. There is a natural hierarchy.

Judy B. Experimental codes have been working well. Debate is mainly outside of the experimental evidence codes.

Michelle G-G: Many organisms don’t have the literature to draw on; all meaningful annotation is by sequence-based methods. For many, 99% of genes have no literature.

Rex C: RG goal is to use data based on experiments. Importance of evidence codes is paramount. Philosophical reason: provides as broad a base as we can, from the groups that have experimental data, for the groups that don’t.

Michelle G-G: The vast majority of organisms do not have literature resources to draw on. Therefore if we want annotation we need to draw on other methods. There is a vast prokaryote population with no annotation.

There are important distinctions to be made in ISS. There is a huge spectrum of quality in ISS, i.e. if someone only takes the top blast hit, that's a low quality ISS. There are orthology based methods ISS, SnoRNA predictors, signalP, TMHMM, tRNA scan. Purely sequence analysis should be ISS

Mike C: for experimental annotations there is also the problem of a large difference in the quality of published experimental investigations

DB: when talk about inference, you need to ask what is this data being inferred from? The source of that data is all important, need to make a trail from ISS statements to the source of that idea.

Suzi L: important to ensure that in the 'with' column we have information on what the ISS data originated from, as need traceability.

Judy B: Declaration of orthology for mammalian groups provides a basis. Concerned about extension of ISS

Rama B: RCA code was initiated by SGD for annotation from papers which use a combination of methods.

RC: For non-mammalian organisms, where there is not much experimental data, it's important that a curator can see similarity without having to define it as an ortholog. Curators should be able to make a reasonable inference.

Ben H: Question comes down to which term, don’t need strict orthology to infer protein kinase.

Judy B: Yes, but the overriding theme is that we used one ontology applied by many different groups. These annotating groups must now agree standards - we don't want generic sequence similarity. Ortholog is really important. When there is credible evidence that there is an ortholog this should have a separate ISS code. And if a gene has a functional annotation to transfer as there is a well known active site then this is ISS.

Michael A: If ortholog tables could be trusted, ortholog evidence code can be computed

Emily D: In GOA ortholog sets provide basis for an IEA prediction method. Curators can then manually infer functional equivalence by looking closer and assigning manual ISS annotations both for orthologs, paralogs etc.

Rex C: Orthologs are a more complex characterization of a sequence alignment. Should be able to put a sequence in the with column. Sometimes ISS is unclear: which is the ortholog? If you can put something in the with column, use ISS, otherwise RCA. If the method is computational, requires building of model, whole bunch of approaches, computational analysis

Sue R: TAIR uses granular children of main evidence codes. MC: against proliferation of evidence codes. Could have orthologs as IEA.

Judy B: orthologs not just based on domain structure, but whole protein.

Harold D: appears to be two flavours of ISS, but the distinction is what goes in the 'with' field. How much further away is the result of a tRNAscan than a measure of sequence similarity? It is misleading to have different flavours of the ISS code both for users and curators.

Mike C: we rejected many evidence codes, so there has to be 'collections' Emily D: isn't the finer grain detail provided by the GO reference collection?

Jim Hu, agreed with Rex, ISS based on orthology, overall partial/paralogs/families TMHMM fundamentally not evolutionary arguments

Val W: tRNAscan could be ISS or RCA. The tRNA and snoRNA predictors use multiple methods and could be RCA, as computational analysis by a combination of methods.

RL: ortholog data will change from time to time, IEA data will change.

RC: does not want to be restricted in ISS by the lack of evolutionary relationships. If you cannot put a sequence in the 'with' and the analysis is computational that requires a model involving a bunch of processes then we cannot put something in the 'with' as a sequence. This could be a good operational approach.

SR: Not adding more evidence codes is unrealistic. And adding evidence codes one at a time is primitive and its time people think about the whole picture.

Suzi L, John D-R: Agreed. Evidence code ontology should be looked at. This does not mean increased complexity for users, users can choose which evidence level they look at data.

ISA - overall sequence alignment (OSS) [seq:ID]
ISO - orthology data [seq:ID]
ISM - model, HMM or SCFG [with is optional, only used when identifiers to model]

DB: but slimming will not help naive users.

RL: moving ahead, a working group needs to make a decision and sort out something which works.

EXP as a higher node

People can do this without changing the way they annotate Would allow people to download data with the relevant evidence codes

Settled on the following proposal:

ISS

      ISA requires sequence ID in with field
      ISO requires sequence ID in with field
      ISM

EXP (new grouping term for experimental evidence codes)

 IMP
  IGI
  IPI
 IDA
  IPI

RCA a more complicated method
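The proposal can be sketched as a small lookup structure, so that a query for a grouping code also returns the codes beneath it and the 'with' requirement for ISA and ISO can be checked. This is a minimal illustration under stated assumptions: the codes under EXP are flattened from the list as recorded in these minutes, and the names `EVIDENCE_GROUPS`, `expand` and `with_field_ok` are hypothetical.

```python
from typing import Dict, List, Set

# Grouping codes and their children, as recorded in the proposal.
# The nesting under EXP is flattened here; that is an assumption.
EVIDENCE_GROUPS: Dict[str, List[str]] = {
    "ISS": ["ISA", "ISO", "ISM"],
    "EXP": ["IMP", "IGI", "IPI", "IDA"],
}

# Codes for which the proposal requires a sequence ID in the 'with' field.
WITH_REQUIRED: Set[str] = {"ISA", "ISO"}


def expand(code: str) -> List[str]:
    """Return the code followed by every code grouped beneath it."""
    result = [code]
    for child in EVIDENCE_GROUPS.get(code, []):
        result.extend(expand(child))
    return result


def with_field_ok(code: str, with_value: str) -> bool:
    """Check the 'with' field rule: ISA and ISO need a non-empty value."""
    return code not in WITH_REQUIRED or bool(with_value.strip())
```

Because existing annotations keep their current codes, a download tool could call something like `expand("EXP")` to select all experimental annotations without curators changing how they annotate.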

Proposal ECWG to make new evidence code hierarchy. Implement richer number of evidence codes. Query communities about evidence codes. What would benefit them?

Michael A bequeathed the evidence code ontology to Sue Rhee

ACTION ITEM (Sue, Michelle, Rama) Put evidence code proposal in the context of what we discussed today

ACTION ITEM (Evidence code committee). ‘Separate’ documentation for users and curators.

ACTION ITEM (Evidence code committee). Revise evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)

ACTION ITEM (Curators) Check whether you have used IGI in this way and update annotations

ACTION ITEM "with" column optional for NAS to stay (as previously agreed)

ACTION ITEM (Evidence code committee) Only ND allowed for root nodes clarify documentation. Represents a status item

Summary of Action points from Day 2

(David Midori Seth) Deploy the part that created SF items based on a friendly webform, and would like to see an OBO format in the SF item (PENDING 1.6/1.7?)
Seth, ORB: Make link to how to make a perfect GO term from the term request tool PENDING (1.6/1.7?)
Amelia link GOOSE from front page (DONE)
DH: Cross products: need to have webex meeting so everyone understands what to do.
OBO file renaming. JB: add a link to Wiki: http://gocwiki.geneontology.org/index.php/Versioning_Proposal On the best practises page: http://gocwiki.geneontology.org/index.php/Best_Practises
Midori etc to work on specification needed for new Amigo features.
Gene Association files: to work on a more advanced interface to download custom files (Chris)
Gene Association files: to filter files as they come in. (Chris)
Judy: Predictive Activities. Collaborations with external groups. Reports into next GOC meeting as to these kinds of activities.
Jim: Suggested Making a repository for predictions POSSIBLE ACTION ITEM?
Finalizing proposed evidence code documentation – abbreviated version on web pages and more detailed on GOC Wiki (Rama)
Eurie: querying communities on awareness of evidence codes – do you know what it is, what do you use it for? Also proposal of expanding, then get a feel for what would benefit them? So that we have a large audience.
Sue, Michelle, Rama put evidence code proposal in the context of what we discussed today
Evidence code committee. Documentation for users and curators.
Evidence code Revise evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)
(Curators) Check whether you have used IGI in this way and update annotations
(Curators) 'with' column optional for NAS - document
Update evidence code decision tree in response to today's discussion on evidence code usage (Jen and EV Code WG)
(Curators) only ND allowed to root nodes - clarify this in the documentation (Rama)
Karen E and Chris M will work on GO-SO cross products
