GO 18th Consortium Meeting Minutes Day 2

From GO Wiki

(Day 1 Minutes)

Monday 24th September 2007, Princeton University, NJ

Broad Agenda

  1. Plans for immediate future (SL)
    1. regulation
    2. cross products
  2. Database report
    1. schema changes
    2. production
  3. GA files (MC)
  4. OBO-Edit (JDR)
  5. Evidence codes (MA)

Overview of cross products by CJM

DH: everyone should look at table 3 in the wiki: http://gocwiki.geneontology.org/index.php/Regulation_cross-products

People should comment on this so we can implement these cross-products. As soon as they are implemented, Chris will be able to run the reasoner.

CM: in future, do we continue to run the reasoner periodically, or should we make adding cross-products part of the curation process? DH: From the disjoint experience in the biological process ontology, we have problems if we don't get GO editors to put the information in directly; this is far better than going back and cleaning up these links.

There are 6-8 ontology editors. CM, MH will need to get this group together and teach them how to add in cross-product information, via Webex.

External Cross Products wiki:

http://gocwiki.geneontology.org/index.php/Ontology_Structure

Cross-product guide:

http://gocwiki.geneontology.org/index.php/Cross_Product_Guide


JB: A concern that using SO is exposing GO to risks: SO changes over time, so is it a problem for GO to be reliant on SO? CM: not a problem, we can choose to use SO as it is; it does not force us to change anything as SO changes. In addition, cross-product information added to the ontology can be ignored if you don't care about the relationship with an external term. That's fine, but if you do add cross-product information, the GO will be of higher quality.

CP results are in the scratch directory. They will be evaluated and moved from scratch to one of the main ontology directories.

DH: how would these cross-product ontologies be edited? CM: editors would in future need to load three files (gene_ontology_edit.obo, xxx, biological_process_xp_cell.obo) and use the cross-product interface. MH: it's good to always load these three files, as there are concerns about the consequences of different levels of changes in each of the three files. Consensus: every editor should load all three files. CM: cannot merge the three files, as each is at a different level of maturity; the file would also get very large. CM: will start with the regulates relationships and then the cell ontology cross-products.

ACTION: David Hill to organise Webex meeting to ensure all editors understand what they need to do when inputting cross-product information. CM: wait until OBO-Edit 2.0? JDR: start the regulation work now? Start with the cell ontology once OBO-Edit 2.0 is ready? JB, MH: yes, this would give editors a chance to start, get used to adding in this extra information, and identify any issues that come up.


Discussion: the gene_ontology_edit.obo file and the original file are being confused.

JDR – add obo version number to the filename. Then use original name as release version of file.

MA/Mike – change ‘edit’ in curators' version to ‘pre_release’ to better describe its use. Orig file updated nightly by Stanford.

CJM: need to take versioning a little more seriously – impossible to replicate analyses. How do we cite what version of the GO we use?

Michelle – do we hide pre release file?

CJM: no, culture of using the latest file.

Michelle: However, orig file updated daily anyway.

DB: not straightforward to find when people took data

wiki: http://gocwiki.geneontology.org/index.php/Vershionning_Proposal
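A minimal sketch of the file-naming idea raised by JDR and CJM above, assuming a simple `base.version.obo` pattern. The pattern, the function names and the version string are illustrative assumptions, not the scheme in the versioning proposal.

```python
from datetime import date


def versioned_filename(base: str, version: str, ext: str = "obo") -> str:
    """Build a file name that carries an explicit version (hypothetical pattern)."""
    return f"{base}.{version}.{ext}"


def citation(filename: str, retrieved: date) -> str:
    """Return a short string recording which file was used and when it was taken."""
    return f"{filename} (retrieved {retrieved.isoformat()})"


name = versioned_filename("gene_ontology", "1.2")
print(citation(name, date(2007, 9, 24)))
```

Recording both the version and the retrieval date is one way to address DB's point that it is not straightforward to find when people took data.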


SO and Chromosomal Location – CJM and KE discuss offline.

John Day-Richter - Term Lifecycle

GET SLIDES FROM JDR

The path from term request to instantiation in the ontology has been reported to be a bottleneck. Users request terms, then need to wait for implementation before they can use them.

Proposed solution: give users a temporary ID to work with when they need a new term. Create a mini ontology file so they can update all their annotations with the new term ID.

However, many terms are rejected because the requests are inappropriate, so we need to feed back the outcome of the term request to users. How? Everyone has some way of dealing with term obsoletion, so we can use obsoletion mechanisms to feed back to the user. When a request is closed, use ‘consider’ or ‘replaced_by’ tags to point to the correct term. The term is obsoleted in the user's private ontology.
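The feedback mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not JDR's implementation: the `TEMP:` ID prefix, the in-memory `mini_ontology` dictionary and the function names are hypothetical, and the GO IDs are used only as sample targets.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical private mini ontology: obsoleted temporary ID -> tags on its stanza.
mini_ontology: Dict[str, Dict[str, List[str]]] = {
    # Request accepted: the temporary term was replaced by a real term.
    "TEMP:0000001": {"replaced_by": ["GO:0006915"]},
    # Request rejected: only suggestions are given, so a curator must choose.
    "TEMP:0000002": {"consider": ["GO:0008150", "GO:0009987"]},
}


def resolve(term_id: str) -> Optional[str]:
    """Return the ID an annotation should use now, or None if a human must decide."""
    tags = mini_ontology.get(term_id)
    if tags is None:
        return term_id  # not an obsoleted temporary term; keep unchanged
    replacements = tags.get("replaced_by", [])
    if len(replacements) == 1:
        return replacements[0]  # unambiguous automatic replacement
    return None  # only 'consider' suggestions (or ambiguous): needs review


def update_annotations(
    annotations: List[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Split (gene, term) pairs into auto-updated pairs and pairs needing review."""
    updated, needs_review = [], []
    for gene, term in annotations:
        new_term = resolve(term)
        if new_term is None:
            needs_review.append((gene, term))
        else:
            updated.append((gene, new_term))
    return updated, needs_review
```

The point of the design is that a group which already handles obsoletes needs no new machinery: an accepted request looks like an obsolete with a single replacement, and a rejected one looks like an obsolete with suggestions.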

Discussion:

PG, RB: new groups might not be able to handle this. How many new groups are requesting terms and need them straight away? AD: if new curators request a large number of inappropriate terms and then annotate to the temporary ids, this would end up irritating the curators. A peripheral group may not be able to handle this well.

MC: automation not a human friendly approach, this is not user support per se.

MH: how much more burden is it to track down terms to suggest and consider a replaced_by?

JD: most term requests are a lot of work; it might be easier to phone the person and do the request on the spot.

SL: but this is an extra to personal communication. Shouldn't stop a group from requesting new terms where there is a need for a fast turn-around; do not want to discourage new groups.

MA: How does this fit in with the SourceForge tracker? Should we release this, or keep it as an option for trusted groups submitting large numbers of requests?

JDR: the obsolete temporary terms would die after 3 months in your mini ontology, therefore if the user did not carry out the required updates they would lose their temporary id as well - this creates a large incentive to properly implement obsoletion handling.

JDR: Seth has already done this – but sounds like we shouldn’t release this publicly.

JB/JDR: not much support for this; put on the backburner until we find a good project to use it on. Possibly use it for Reactome requests, but Midori pointed out that ALL Reactome requests (ONLY 17) had been dealt with very promptly (12 within 2 days; only one took 3 weeks to resolve).

Seth ORB (Ontology Request Broker)

Seth has implemented a SourceForge request tracker wrapper. Will be available in AmiGO where users can use it to directly make a SourceForge request. Both HTML and programmatic interface for SourceForge, available for any tool.

Seth: DEMO of this wrapper in AmiGO

- Users unable to find a term are redirected to a page which allows them to enter details for a new term request. A form is provided to add term name, definition and additional details, and they get an SF ID. This is submitted to the SF tracker, which gives a success ID, and the term is added into the tracker. Users can choose to put in a user name; if not, the request goes into a general ID bucket. The user can then retrieve their terms with orb_default ids in OBO format.
- Could not track submitters by IP address, as there are problems with firewalls. Could make a batch request, and an interface could be written so that an individual SourceForge ID would be returned for each request.
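As an illustration of what "terms in OBO format" from the request form might look like, here is a small sketch that turns the form fields into a `[Term]` stanza. It is not Seth's ORB code: the function name, the ID layout `orb_default:0000001` and the choice to record the submitter in a `comment` line are assumptions.

```python
from typing import Optional


def request_to_stanza(
    temp_id: str,
    name: str,
    definition: str,
    submitter: Optional[str] = None,
    details: Optional[str] = None,
) -> str:
    """Render a term request as an OBO [Term] stanza (illustrative layout)."""
    # Double quotes inside the definition are backslash-escaped in OBO text.
    escaped = definition.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "[Term]",
        f"id: {temp_id}",
        f"name: {name}",
        f'def: "{escaped}" []',  # empty dbxref list: no source given on the form
    ]
    notes = []
    if submitter:
        notes.append(f"Requested by {submitter}.")
    if details:
        notes.append(details)
    if notes:
        lines.append("comment: " + " ".join(notes))
    return "\n".join(lines) + "\n"


print(request_to_stanza("orb_default:0000001", "example process",
                        "A process used only as an example.", submitter="jdoe"))
```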

DH: want a way of identifying users, temporary ids are worrying. ED: within the AmiGO form, it should be essential for a submitter to add their e-mail address, would ensure they are involved in the term-request process and reduce number of lazy submits by users/spam. MA: attribution section of the AmiGO form should be changed to 'E-mail Address', this should be required.

ACTION: Midori to work on the GO stanza specification required.

Michelle: provide link to new term best practise documentation

JDR: Use one batch tracker id?

SC: generate SF ids using another system?


ACTION ITEM (David, Midori, Seth) Deploy the part that created SF items based on a friendly webform, and would like to see an OBO format in the SF item.

ACTION ITEM Link to documentation on how to make a perfect GO term

Schema changes - Chris Mungall

SWUG:Database changes 2007

  • Support for multi species annotation files

(PAMGO to task releases files next month. Trudy to send examples to Chris to test this facility)

  • Support for new properties column. Test data from MGI received (they use structured notes field, others should also send examples)
  • Support in schema for taxon based queries, species, kingdom etc.
  • GOOSE: new interface to MySQL DB. Aimed at intermediate to advanced users. EBI mirror: >5000 hits so far.

GOOSE

SQL query interface for intermediate to advanced users.

http://www.berkeleybop.org/goose

Provides example SQL queries, for example: stale ISS assignments.

There is a wiki page of example SQL queries, which can be used to experiment and altered to your needs; results are in HTML or tab-delimited. Does not touch the production database at Stanford, and a kill-limit is in place for problematic queries. The full version of the database is queried, including IEA data. Mirrors are updated daily, but the annotation source is updated once a month.


Q (ST): web services? A (CJM): there is already a GO Perl API, but web services will be provided, and SPARQL is already ready.

New architecture road map on AmiGO. More interactive components on front end.

wiki: http://gocwiki.geneontology.org/index.php/SWUGAmiGO_Architecture_Roadmap

Seth and Amelia have been refactoring the server-based code. Transitioning from Perl to AmiGO Java. Re-using existing OBO-Edit code, which is mature and robust, therefore saving development time in future.

Renovated GO database info page.

ACTION ITEM Amelia link GOOSE from front page - DONE

Mike Cherry – Gene Association Files

SGD wants to have 2 files – one manual, one IEA.

SGD would like to provide IEA annotation predictions. However, there are concerns over ignorant users analysing GO data and using existing IEA annotations to create circular annotations. What should SGD call this file?

CM: need consistency if SGD do it - then we all should do it.

DB: want to make it clear to researchers that they should use correct GO annotations.

MA: propose that groups submit files as normal, but that a filter is installed for all GA files which will partition them into IEA and non-IEA files. On the annotations download website and ftp sites there will be two files.
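A minimal sketch of the proposed partitioning filter, assuming the tab-delimited gene association layout in which the evidence code is the seventh column and comment lines start with `!`. The function name is hypothetical and this is not the script assigned in the action item.

```python
from typing import Iterable, List, Tuple

EVIDENCE_COL = 6  # 0-based index of column 7 (Evidence Code)


def partition_by_iea(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split gene association lines into (IEA, non-IEA) lists.

    Comment lines (starting with '!') are copied to both outputs so that each
    file keeps its header; blank lines are dropped.
    """
    iea: List[str] = []
    non_iea: List[str] = []
    for line in lines:
        if not line.strip():
            continue
        if line.startswith("!"):
            iea.append(line)
            non_iea.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= EVIDENCE_COL:
            raise ValueError(f"too few columns in annotation line: {line!r}")
        if cols[EVIDENCE_COL] == "IEA":
            iea.append(line)
        else:
            non_iea.append(line)
    return iea, non_iea
```

Usage would be to read a submitted file once and write the two returned lists to two separate output files.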

ED: concern over proliferation of files. DB: Need something that will tailor files to user requirements (taxon, NOT, non-IEA), advanced interface to do this on-the-fly
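DB's alternative, tailoring one file on the fly instead of publishing many, could look roughly like the generator below. Column positions (qualifier in column 4, evidence code in column 7, taxon in column 13) follow the gene association file layout; the function name and parameters are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional

QUALIFIER_COL = 3   # column 4, may hold pipe-separated values such as NOT
EVIDENCE_COL = 6    # column 7
TAXON_COL = 12      # column 13, e.g. taxon:4932, possibly pipe-separated


def filter_annotations(
    lines: Iterable[str],
    taxon: Optional[str] = None,
    exclude_not: bool = False,
    exclude_iea: bool = False,
) -> Iterator[str]:
    """Yield only the annotation lines matching the requested options."""
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TAXON_COL:
            continue  # malformed row; a real tool would report it
        if exclude_iea and cols[EVIDENCE_COL] == "IEA":
            continue
        if exclude_not and "NOT" in cols[QUALIFIER_COL].split("|"):
            continue
        if taxon is not None and taxon not in cols[TAXON_COL].split("|"):
            continue
        yield line
```

Because it is a generator, the same function can sit behind a download interface and stream results without creating any extra files on the ftp site.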

JD: help education of IEA. Important that GOC members who review papers are aware of this problem - place this information onto wiki. JH: this reviewing information should be made public, should be a page both for those writing and those reviewing papers. SR: information on evidence codes is hidden in the GOC site, should highlight location of this information on the GOC front page.

ACTION ITEM Mike Cherry write GA file filter script

ACTION ITEM Chris Mungall: More advanced interface to download custom files by versioning

OBO Edit Working Group - JDR

About to release 2.000 beta-14.

89 bugs fixed.

OBO Edit toolkit now used in Phenote.

Reasoner much faster. Edit in real-time with reasoner on.

JOHN DEMO OBO EDIT NEW FEATURES

  • Auto-complete
  • advanced searching for power users, Boolean querying
  • advanced sub query feature
  • docking panels to personalise interface.
  • Graph based editing updated automatically
  • Wrench icon for every panel to set up personal preferences, filtering, view options etc.
  • Create new terms and relationships in graph editor by drawing
  • Graph overview preview
  • Graph DAG Viewer
  • Spell checking
  • external contributions from..... CJM

File:OBOEditWorkingGroup GOC PU 2007.ppt

File:Term Requests GOC PU 2007.ppt

David Botstein Discussion about availability of predictions

As more information about genes becomes available, the hope has been that this would in turn evolve high-throughput methods to find out more about genes and interactions and pathways and that this data would be determined and incorporated into GO. While this has happened to some extent, it has not been at the rate hoped for. There is no iterative process.

The ideal would be a list where curators would be prompted to check the validity of an annotation prediction. In reality, because of the vagaries of IEA data, it has been difficult to validate data.

When people have looked for statistical links between genes to look for possible associations, most predictions turned out to be good. They found evidence for this that wasn’t currently included in GO (i.e. unannotated but information present in literature).

Would like the GOC to start using the suggestions coming from the community.

Judy: this is a priority issue. MGI has experience with Ken Pagan's group regarding mouse genome. It does take time to have an interactive relationship. There are genes which only have IEA annotations; these must be a priority set for annotation, as they are key for curation and need to be prioritized.

DH: did this exercise with Fritz Roth's dataset. Predictions fell into 3 categories:

  1. correct, annotation should clearly be made
  2. enough circumstantial stuff to make this annotation, but not tested
  3. from outer space

DH, ED: this exercise takes a lot of curation work. MC: FRIZ, OLGA. RC: have a grant to look at BioMediator, using expert GO annotation to validate predictions. DB: could be something to build into tools.

??? Need more curators

Sue Rhee: Users group to focus on predictions?

David Botstein: Some are one-offs, others are systems which should be semi-automated. Does anything arise from the algorithm which isn’t obvious from reading the paper? Use the best of the methods routinely.

??? Reports from people who have done these types of collaborations

Suzi: something we build in to the long term. If GO becomes responsible for running SW/ limiting.

DAVID suggestion, run on reference genome gene of the month

Need to leverage the groups who are doing these things JH: Suggested Making a repository for predictions (POSSIBLE ACTION ITEM ???) Set up a thing where people can dump their results, and we will look at them.


TOUR OF LEWIS SIGLER INSTITUTE

Group Photo

LUNCH

Annotation Evidence Codes - Mike Cherry

EVIDENCE CODES

Discussion: Pascal: Evidence codes documentation is too long and complicated

Decision tree pdf (add something about this, where is it?)

Revisiting the question “What is the purpose of evidence codes?” How are evidence codes used by curators, biological users, informaticians

Users get an idea where the GO annotation came from

Val: Curators can use them to evaluate conflicting evidence from other species to make the best ISS inferences based on the available data

Sue Rhee: for functional inference, manual curation is used as a gold standard for benchmarking. For instance, if they are making inferences based on expression, they should remove IEPs. When inferring from homology, they need to exclude annotations made from homology.
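Sue Rhee's point about avoiding circularity can be sketched as a filter that builds the benchmark set per prediction method. The mapping below is an assumption for illustration: IEP is named in the discussion, ISS is used here to stand for "annotations made from homology", and dropping IEA reflects the "manual curation" wording.

```python
from typing import Dict, List, Set, Tuple

Annotation = Tuple[str, str, str]  # (gene, GO term, evidence code)

# Not manually curated, so never part of a manual gold standard (assumption).
ALWAYS_EXCLUDED: Set[str] = {"IEA"}

# Evidence codes that would make the benchmark circular for a given method.
EXCLUDED_BY_METHOD: Dict[str, Set[str]] = {
    "expression": {"IEP"},
    "homology": {"ISS"},
}


def gold_standard(annotations: List[Annotation], method: str) -> List[Annotation]:
    """Keep only annotations that are safe to benchmark `method` against."""
    excluded = ALWAYS_EXCLUDED | EXCLUDED_BY_METHOD[method]
    return [a for a in annotations if a[2] not in excluded]
```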

???: More confidence with varied evidences rather than one type of assay.

Users are using evidence codes in an increasing way, but not as much as they should.

DB. Future depends more and more on the evidence codes.

Eurie: old documentation said to “Evaluate the reliability of an annotation”

Judy: Experimental codes have been working well. Debate is mainly outside of the experimental evidence codes.

Michelle: Many organisms don’t have the literature to draw on; all meaningful annotation is by sequence-based methods. For many, 99% of genes have no literature.

Rex: RG goal is to use data based on experiments. Importance of evidence codes is paramount. Philosophical reason: provide as broad a base as we can, with the groups that have experimental data, for the groups that don’t

Michelle: Orthology based methods ISS, SnoRNA predictors, signalP, TMHMM, tRNA scan. Purely sequence analysis should be ISS

Judy: Declaration of orthology, mammalian groups approach. Definition of orthology provides the basis. This is ALL they have done. Can be extended to HMMs with clear methodology; was OK with that. Not OK with general extension to all other methodologies.

Rama: RCA was to make annotation where a combination of methods (2 hybrid/mass spec/Bayesian network analysis) is used in combination with a probability value, statistical value

Ben: Question comes down to which term, don’t need strict orthology to infer protein kinase.

Michael: If ortholog tables could be trusted, ortholog evidence code can be computed

Could allow users to see only the ortholog subset from a table.

Rex: Orthologs are a more complex characterization of a sequence alignment. Should be able to put a sequence in the with column. Sometimes with ISS it is unclear which is the ortholog. If you can put something in the with column, use ISS, otherwise RCA. If the method is computational, requires building of a model, a whole bunch of approaches: computational analysis.
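Rex's rule of thumb can be written down as a one-line check. This only restates the suggestion made in the discussion, for computationally derived annotations; it is not agreed policy, and the function name is hypothetical.

```python
def suggest_evidence_code(with_value: str) -> str:
    """Suggest ISS when something can be put in the 'with' column, otherwise RCA."""
    return "ISS" if with_value.strip() else "RCA"
```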

Jim: agreed with Rex. ISS based on orthology, overall partial/paralogs/families. TMHMM fundamentally not evolutionary arguments


Sue: moving ahead (seconded by Sue and Chris). There are going to be new evidence codes in the future. Adding at a time; should start thinking about this in a more serious or robust way. EXP as a higher node. People can do this without changing the way they annotate. Would allow people to download data with the relevant evidence codes.

Settled on the following proposal

New proposed hierarchy

ISS

      ISA requires sequence ID in with field
      ISO requires sequence ID in with field
      ISM

EXP (new grouping term for experimental evidence codes)

 IMP
  IGI
  IPI
 IDA
  IPI

RCA a more complicated method
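One way to use the proposed hierarchy when downloading data is to expand a grouping code into itself plus its children. In this sketch the four experimental codes listed above are treated as direct children of EXP, which is an assumption about the indentation in these notes; the names `HIERARCHY`, `expand` and `with_field_ok` are hypothetical.

```python
from typing import Dict, List, Set

# Parent code -> child codes, as read from the proposal above (flat under EXP).
HIERARCHY: Dict[str, List[str]] = {
    "ISS": ["ISA", "ISO", "ISM"],
    "EXP": ["IMP", "IGI", "IPI", "IDA"],
}

# Codes the proposal says need a sequence ID in the 'with' field.
REQUIRES_WITH: Set[str] = {"ISA", "ISO"}


def expand(code: str) -> Set[str]:
    """Return the code together with all codes below it in the hierarchy."""
    codes = {code}
    for child in HIERARCHY.get(code, []):
        codes |= expand(child)
    return codes


def with_field_ok(code: str, with_value: str) -> bool:
    """Check the 'with' requirement for codes that have one."""
    return code not in REQUIRES_WITH or bool(with_value.strip())
```

Existing annotations keep their current codes, so selecting `expand("ISS")` still returns plain ISS annotations alongside any made with the more specific codes.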

Proposal: ECWG to make new evidence code hierarchy. Implement richer number of evidence codes. Query communities about evidence codes. What would benefit them?

MA bequeathed the evidence code ontology to Sue Rhee

ACTION ITEM Sue, Michelle, Rama put evidence code proposal in the context of what we discussed today

ACTION ITEM Evidence code committee. ‘Separate’ documentation for users and curators.

ACTION ITEM Evidence code: Revise evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)

ACTION ITEM (Curators) Check whether you have used IGI in this way and update annotations

ACTION ITEM with column optional for NAS

ACTION ITEM only ND allowed for root nodes clarify documentation. Represents a status item

Summary of Action points from Day 2

  1. (David Midori Seth) Deploy the part that created SF items based on a friendly webform, and would like to see an OBO format in the SF item.
  2. Seth, ORB: Make link to how to make a perfect GO term from the term request tool NO LONGER NEEDED?
  3. Amelia link GOOSE from front page
  4. DH: Cross products: need to have a Webex meeting so everyone understands what to do.
  5. OBO file renaming. JB: add a link to Wiki: http://gocwiki.geneontology.org/index.php/Versionning_Proposal On the best practises page: http://gocwiki.geneontology.org/index.php/Best_Practises
  6. Midori etc to work on specification needed for new AmiGO features.
  7. Gene Association files: to work on a more advanced interface to download custom files (Chris)
  8. Gene Association files: to filter files as they come in. (Chris)
  9. Judy: Predictive Activities. Collaborations with external groups. Reports into next GOC meeting as to these kinds of activities.
  10. Jim: Suggested Making a repository for predictions POSSIBLE ACTION ITEM?
  11. Finalizing proposed evidence code documentation – abbreviated version on web pages and more detailed on GOC Wiki (Rama)
  12. Eurie: querying communities on awareness of evidence codes – do you know what it is, what do you use it for? Also proposal of expanding, then get a feel for what would benefit them? So that we have a large audience.
  13. Sue, Michelle, Rama put evidence code proposal in the context of what we discussed today
  14. Evidence code committee. Documentation for users and curators.
  15. Evidence code Revise evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)
  16. (Curators) Check whether you have used IGI in this way and update annotations
  17. (Curators) 'with' column optional for NAS - document
  18. Update evidence code decision tree in response to today's discussion on evidence code usage (Jen and EV Code WG)
  19. (Curators) only ND allowed to root nodes - clarify this in the documentation (Rama)