GO 18th Consortium Meeting Minutes Day 2

Revision as of 08:25, 8 October 2007

(Day 1 Minutes)

Monday 24th September 2007, Princeton University, NJ

Broad Agenda

  1. Plans for immediate future (SL)
    1. regulation
    2. cross products
  2. Database report
    1. schema changes
    2. production
  3. GA files (MC)
  4. OBO-Edit (JDR)
  5. Evidence codes (MA)

Overview of cross products by CJM

DH: everyone should look at table 3 in the wiki: http://gocwiki.geneontology.org/index.php/Regulation_cross-products

People should comment on this so we can implement these cross-products. As soon as implemented then Chris will be able to run the reasoner.

CM: In future, do we continue to run the reasoner periodically, or should adding cross-products become part of the curation process?

DH: From the disjoint experience in the biological process ontology, we have problems if GO editors don't put the information in directly. Having editors put it in directly is far better than going back and cleaning up these links.

There are 6-8 ontology editors. CM, MH will need to get this group together and teach them how to add in cross-product information, via Webex.

External Cross Products wiki:

http://gocwiki.geneontology.org/index.php/Ontology_Structure

Cross-product guide:

http://gocwiki.geneontology.org/index.php/Cross_Product_Guide


JB: A concern is that using SO exposes GO to risk, since SO changes over time. Is it a problem for GO to be reliant on SO?

CM: Not a problem. We can choose to use SO as it is; changes in SO do not force us to change anything. In addition, cross-product information in the ontology can be ignored if you don't care about the relationship with an external term. That's fine, but if cross-product information is added, the GO will be higher quality.

CP results are in the scratch directory. They will be evaluated and moved from scratch to one of the main ontology directories.

DH: How would these cross-product ontologies be edited?

CM: Editors would in future need to load three files (gene_ontology_edit.obo, xxx, biological_process_xp_cell.obo) and use the cross-product interface.

MH: It's good to always load these three files, because of concerns about the consequences of different levels of changes in each of the three files.

Consensus: every editor should load all three files.

CM: Cannot merge the three files, as each is at a different level of maturity; also, the file would get very large.

CM: Will start with the regulates relationships and then the cell ontology cross-products.

ACTION: David Hill to organise a Webex meeting to ensure all editors understand what they need to do when inputting cross-product information.

CM: Wait until OBO-Edit 2.0?

JDR: Start the regulation work now? Start with the cell ontology once OBO-Edit 2.0 is ready?

JB, MH: Yes, this would give editors a chance to start, get used to adding this extra information, and identify any issues that come up.


Discussion: gene_ontology_edit.obo file vs. original file causes confusion.

JDR – add obo version number to the filename. Then use original name as release version of file.

MA/Mike – change ‘edit’ in the curators' version to ‘pre_release’ to better describe its use. Orig file updated nightly by Stanford.

CJM: need to take versioning a little more seriously – impossible to replicate analyses. How do we cite what version of the GO we use?

Michelle – do we hide pre release file?

CJM: no, culture of using the latest file.

Michelle: However, orig file updated daily anyway.

DB: not straightforward to find when people took data
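JDR's suggestion of putting a version into the filename could be sketched roughly as follows. This is only an illustration: the choice of the OBO header `date` tag as the version stamp, the naming pattern, and the function names `read_obo_header` and `versioned_filename` are all assumptions, not something agreed at the meeting.

```python
import re


def read_obo_header(text):
    """Collect 'tag: value' pairs from the header (everything before the first stanza)."""
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):  # first stanza, e.g. [Term], ends the header
            break
        if not line or line.startswith("!"):  # skip blanks and comment lines
            continue
        tag, sep, value = line.partition(":")
        if sep:
            header[tag.strip()] = value.strip()
    return header


def versioned_filename(base_name, header, tag="date"):
    """Insert the value of a header tag into the filename (hypothetical naming scheme)."""
    version = header.get(tag)
    if version is None:
        raise KeyError(f"header has no '{tag}' tag")
    # Keep only characters that are safe in filenames.
    safe = re.sub(r"[^A-Za-z0-9.]+", "-", version).strip("-")
    stem, dot, ext = base_name.rpartition(".")
    return f"{stem}.{safe}.{ext}" if dot else f"{base_name}.{safe}"


example = "format-version: 1.2\ndate: 24:09:2007 10:00\n\n[Term]\nid: GO:0000001\n"
print(versioned_filename("gene_ontology.obo", read_obo_header(example)))
```

A stamp like this in the filename would give users something concrete to cite when reporting which version of the GO an analysis used.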


SO and Chromosomal Location – CJM and KE discuss offline.

John Day-Richter - Term Lifecycle

GET SLIDES FROM JDR

Getting from term request to instantiation in the ontology has been reported to be a bottleneck. Users request terms, then need to wait for implementation before they can use them.

Proposed solution: give users a temporary ID to work with when they need a new term. Create a mini ontology file so that they can update all their annotations with the new term ID.

However, many terms are rejected because the requests are inappropriate, so the outcome of a term request needs to be fed back to the requester. How? Everyone has some way of dealing with term obsoletion, so we can use obsoletion mechanisms to feed back to the user. When a request is closed, use ‘consider’ or ‘replaced_by’ tags to point to the correct term. The term is obsoleted in the user's private ontology.
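As a rough illustration of the feedback mechanism described above, the sketch below writes the OBO stanza that would mark a temporary term as obsolete in the requester's private ontology. It uses the OBO 1.2 tag names `is_obsolete`, `replaced_by` and `consider`; the function name `obsolete_stanza`, the `TEMP:` ID prefix and the `GO:9999999` target are placeholders invented for the example, not part of the proposal.

```python
def obsolete_stanza(temp_id, name, replaced_by=None, consider=()):
    """Build an OBO [Term] stanza that obsoletes a temporary term.

    replaced_by: the single term that should be used instead, if there is one.
    consider:    alternative terms a curator should look at when there is no
                 direct replacement.
    """
    lines = ["[Term]", f"id: {temp_id}", f"name: {name}", "is_obsolete: true"]
    if replaced_by:
        lines.append(f"replaced_by: {replaced_by}")
    lines.extend(f"consider: {term_id}" for term_id in consider)
    return "\n".join(lines) + "\n"


# Placeholder IDs only: a closed request whose temporary term maps to one real term.
print(obsolete_stanza("TEMP:0000001", "example process", replaced_by="GO:9999999"))
```

A rejected request with no suitable replacement would simply produce a stanza with `is_obsolete: true` and, where possible, one or more `consider` lines.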

Discussion:

PG: new groups might not be able to handle this.

MC: automation not a human friendly approach, this is not user support per se.

MH: how much more burden to track down terms to suggest and consider a replace_by?

JD: most term requests a lot of work, might be easier to phone the person and do the request on the spot.

SL: but this is an extra to personal communication.

JDR: Seth has already done this – but sounds like we shouldn’t release this publicly.

JB/JDR: not much support for this – put on backburner until we find a good project for this to be used on. Possibly to use for Reactome requests, but Midori pointed out that ALL Reactome requests (ONLY 15) had been dealt with very promptly (12 within 2 days and all within 3 weeks).

Seth ORB (Ontology Request Broker)

Will be linked to AmiGO. If a user searches for a term in AmiGO but gets no results, they can follow a link to ‘add new term’. A form is provided to add the term name, definition and additional details, and they get an SF ID. The user can then retrieve their terms with orb_default IDs in OBO format.

SourceForge username becomes part of the ID to help tracking.

Discussion follows:

DH: nice that users need to get SF login. Don’t like temporary IDs though.

MA: produces stanza

ED: how do you handle spam? Need to enter email address?

MA: yes contact should be requirement

Michelle: provide link to new term best practise documentation

JDR: Use one batch tracker id?

SC: generate SF ids using another system?


ACTION ITEM (David Midori Seth) Deploy the part that creates SF items based on a friendly webform; would like to see an OBO format in the SF item.

ACTION ITEM Link to how to make a perfect GO term

Schema changes - Chris Mungall

SWUG:Database changes 2007

  • Support for multi species annotation files
  • Support for new properties column. Test data from MGI received (they use structured notes field)
  • Support in schema for taxon based queries, species, kingdom etc.
  • GOOSE: new interface to the MySQL DB. Aimed at intermediate to advanced users. EBI mirror: >5000 hits so far.

GOOSE

SQL query interface for intermediate to advanced users.

http://www.berkeleybop.org/goose

Provides example SQL queries, for example: stale ISS assignments.


Q (ST): web services? A (CJM): yes, SPARQL already ready.

New architecture road map on AmiGO. More interactive components on front end.

Seth and Amelia have been refactoring the server-based code. Transitioning from Perl to Java. Re-using existing OBO-Edit code, which is mature and robust, saves development time in future.

Renovated GO database info page.

ACTION ITEM Amelia link GOOSE from front page - DONE

Mike Cherry – Gene Association Files

SGD wants to have 2 files – one manual, one IEA.

Discussion follows….

CM: need consistency if SGD do it - then we all should do it.

DB: want it clear to researchers that they are using correct experimental data.

MA: propose we do this at download time for the user.

JD: help education of IEA.

ACTION ITEM Write GA file filter script ???

ACTION ITEM More advanced interface to download custom files by versioning
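A minimal sketch of what the GA file filter script might look like, assuming the standard tab-delimited gene association layout in which the evidence code is the seventh column and comment lines start with "!". The function name `split_by_iea` and the sample rows are made up for illustration.

```python
EVIDENCE_COLUMN = 6  # 0-based index of the evidence code (7th column of a GA file)


def split_by_iea(lines):
    """Split gene association lines into header, manual and IEA (electronic) groups."""
    header, manual, electronic = [], [], []
    for line in lines:
        if line.startswith("!"):  # comment/header lines are kept for both outputs
            header.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= EVIDENCE_COLUMN:
            continue  # skip blank or malformed rows
        if fields[EVIDENCE_COLUMN] == "IEA":
            electronic.append(line)
        else:
            manual.append(line)
    return header, manual, electronic


# Made-up rows, for illustration only.
rows = [
    "!example header\n",
    "SGD\tS000000001\tABC1\t\tGO:0000001\tPMID:1\tIDA\t\tP\n",
    "SGD\tS000000002\tABC2\t\tGO:0000002\tGO_REF:1\tIEA\t\tF\n",
]
header, manual, electronic = split_by_iea(rows)
print(len(manual), "manual,", len(electronic), "IEA")
```

Running the split at download time, as MA proposed, would let every group keep submitting a single file while users still get the manual and IEA sets separately.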

OBO Edit Working Group - JDR

About to release 2.000 beta-14.

89 bugs fixed.

OBO Edit toolkit now used in Phenote.

Reasoner much faster. Edit in real-time with reasoner on.

JOHN DEMO OBO EDIT NEW FEATURES

  • Auto-complete
  • advanced searching for power users, Boolean querying
  • advanced sub query feature
  • docking panels to personalise interface.
  • Graph based editing updated automatically
  • Wrench icon for every panel to set up personal preferences, filtering, view options etc.
  • Create new terms and relationships in graph editor by drawing
  • Graph overview preview
  • Graph DAG Viewer
  • Spell checking
  • external contributions from..... CJM

File:OBOEditWorkingGroup GOC PU 2007.ppt

File:Term Requests GOC PU 2007.ppt

David Botstein Discussion about availability of predictions

There is lots of masturbatory literature, with researchers stimulating themselves with the same information. GO is open to abuse. There should be a more rapid iterative process to make predicted annotations available, especially to curators, who would be prompted to look for the possibility that a biological process could occur.

When people have looked for statistical links between genes to look for possible associations, most predictions turned out to be good. They found evidence for these that wasn’t currently included in GO (i.e. unannotated, but the information is present in the literature).

Judy: this is a priority issue

??? Need more curators

David did this exercise with Fritz Roth's dataset. Predictions fell into 3 categories:

  1. correct; annotation should clearly be made
  2. enough circumstantial stuff to make this annotation, but not tested
  3. from outer space

Sue Rhee: Users group to focus on predictions?

Mike: Build into future.

David Botstein: Some are one-offs, others are systems which should be semi-automated. Does anything arise from the algorithm which isn’t obvious from reading the paper? Use the best of the methods routinely.

Rex: Grant using expert GO annotation to validate predictions

??? Reports from people who have done these types of collaborations

Suzi: something we build in to the long term. If GO becomes responsible for running SW/ limiting.

DAVID suggestion, run on reference genome gene of the month

Need to leverage the groups who are doing these things.

Jim: Suggested making a repository for predictions (POSSIBLE ACTION ITEM ???). Set up a thing where people can dump their results, and we will look at them.


TOUR OF LEWIS SIGLER INSTITUTE

Group Photo

LUNCH

Annotation Evidence Codes - Mike Cherry

EVIDENCE CODES

Discussion: Pascal: Evidence codes documentation is too long and complicated

Decision tree pdf (add something about this, where is it?)

Revisiting the question “What is the purpose of evidence codes?” How are evidence codes used by curators, biological users, informaticians?

Users get an idea where the GO annotation came from

Val: Curators can use them to evaluate conflicting evidence from other species to make the best ISS inferences based on the available data.

Sue Rhee: for functional inference, manual curation is used as a gold standard for benchmarking. For instance, if they are making inferences based on expression, they should remove IEPs. When inferring from homology, they need to exclude annotations made from homology.

???: More confidence with varied evidences rather than one type of assay.

Users are using evidence codes in an increasing way, but not as much as they should.

DB: Future depends more and more on the evidence codes.

Eurie: old documentation said to “Evaluate the reliability of an annotation”

Judy: Experimental codes have been working well. Debate is mainly outside of the experimental evidence codes.

Michelle: Many organisms don’t have the literature to draw on; all meaningful annotation is by sequence-based methods. For many, 99% of genes have no literature.

Rex: RG goal is to use data based on experiments. Importance of evidence codes is paramount. Philosophical reason: provide as broad a base as we can with the groups that have experimental data, for the groups that don’t.

Michelle: Orthology based methods ISS, SnoRNA predictors, signalP, TMHMM, tRNA scan. Purely sequence analysis should be ISS.

Judy: Declaration of orthology, the mammalian groups' approach. Definition of ortholog provides the basis. This is ALL they have done. Can be extended to HMMs with clear methodology; was OK with that. Not OK with general extension to all other methodologies.

Rama: RCA was meant for annotations made from a combination of methods (2-hybrid, mass spec, Bayesian network analysis) together with a probability or statistical value.

Ben: Question comes down to which term, don’t need strict orthology to infer protein kinase.

Michael: If ortholog tables could be trusted, ortholog evidence code can be computed

Could allow users to see only the ortholog subset from a table.

Rex: Orthologs are a more complex characterization of a sequence alignment. Should be able to put a sequence in the with column. Sometimes with ISS it is unclear which is the ortholog. If you can put something in the with column, use ISS; otherwise RCA. If the method is computational: requires building of a model, a whole bunch of approaches, computational analysis.

Jim: agreed with Rex. ISS is based on orthology, overall partial/paralogs/families. TMHMM is fundamentally not an evolutionary argument.


Sue: moving ahead (seconded by Sue and Chris). There are going to be new evidence codes in the future. Rather than adding one at a time, we should start thinking about this in a more serious or robust way. EXP as a higher node: people can do this without changing the way they annotate. Would allow people to download data with the relevant evidence codes.

Settled on the following proposal

New proposed hierarchy

ISS

      ISA requires sequence ID in with field
      ISO requires sequence ID in with field
      ISM

EXP (new grouping term for experimental evidence codes)

      IMP
      IGI
      IPI
      IDA
      IPI

RCA a more complicated method
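To illustrate how grouping terms would let users pull annotations by the relevant evidence codes without curators changing how they annotate, here is a small sketch. The mapping follows the proposed hierarchy as recorded in these minutes; the names `EVIDENCE_GROUPS`, `expand` and `select`, and the sample annotations, are invented for the example.

```python
# Grouping codes and their more specific children, as listed in the proposal.
EVIDENCE_GROUPS = {
    "ISS": ["ISA", "ISO", "ISM"],
    "EXP": ["IMP", "IGI", "IPI", "IDA"],
}


def expand(code):
    """Return the code itself plus any more specific codes grouped under it."""
    return {code, *EVIDENCE_GROUPS.get(code, [])}


def select(annotations, code):
    """Keep (gene, evidence_code) pairs whose code falls under the requested code."""
    wanted = expand(code)
    return [a for a in annotations if a[1] in wanted]


# Made-up annotations, for illustration only.
annotations = [("geneA", "IDA"), ("geneB", "ISO"), ("geneC", "RCA")]
print(select(annotations, "EXP"))
```

Asking for `EXP` returns every annotation carrying one of the experimental codes, while a code with no children, such as `RCA`, simply matches itself.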

Proposal: ECWG to make new evidence code hierarchy. Implement richer number of evidence codes. Query communities about evidence codes. What would benefit them?

MA bequeathed the evidence code ontology to Sue Rhee

ACTION ITEM Sue, Michelle, Rama put evidence code proposal in the context of what we discussed today

ACTION ITEM Evidence code committee. ‘Separate’ documentation for users and curators.

ACTION ITEM Evidence code: Revise evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)

ACTION ITEM (Curators) Check whether you have used IGI in this way and update annotations

ACTION ITEM with column optional for NAS

ACTION ITEM only ND allowed for root nodes clarify documentation. Represents a status item

Summary of Action points from Day 2

  1. (David Midori Seth) Deploy the part that creates SF items based on a friendly webform; would like to see an OBO format in the SF item.
  2. Seth, ORB: Make link to how to make a perfect GO term from the term request tool NO LONGER NEEDED?
  3. Amelia link GOOSE from front page
  4. DH: Cross products: need to have a Webex meeting so that everyone understands what to do.
  5. OBO file renaming. JB: add a link to Wiki: http://gocwiki.geneontology.org/index.php/Versionning_Proposal On the best practises page: http://gocwiki.geneontology.org/index.php/Best_Practises
  6. Midori etc. to work on the specification needed for new AmiGO features.
  7. Gene Association files: to work on a more advanced interface to download custom files (Chris)
  8. Gene Association files: to filter files as they come in. (Chris)
  9. Judy: Predictive Activities. Collaborations with external groups. Reports into next GOC meeting as to these kinds of activities.
  10. Jim: Suggested Making a repository for predictions POSSIBLE ACTION ITEM?
  11. Finalizing proposed evidence code documentation – abbreviated version on web pages and more detailed on GOC Wiki (Rama)
  12. Eurie: querying communities on awareness of evidence codes – do you know what it is, what do you use it for? Also proposal of expanding, then get a feel for what would benefit them? So that we have a large audience.
  13. Sue, Michelle, Rama put evidence code proposal in the context of what we discussed today
  14. Evidence code committee. Documentation for users and curators.
  15. Evidence code Revise evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)
  16. (Curators) Check whether you have used IGI in this way and update annotations
  17. (Curators) 'with' column optional for NAS - document
  18. Update evidence code decision tree in response to today's discussion on evidence code usage (Jen and EV Code WG)
  19. (Curators) only ND allowed to root nodes - clarify this in the documentation (Rama)