GO 18th Consortium Meeting Minutes Day 1

Sunday morning, September 23, 2007

Introductions chronologically:

2007 Dimitri, Seth Carbon B-BOP

2006 Jim Hu ecoli, Susan Tweedie FlyBase, Trudi PAMGO, Donghue TAIR

2005 Ben Hitz SGD

2004 Doug Howe ZFIN, Ruth Lovering UCL, Victoria Petri RGD

2003 Jen Clark GO, Emily Dimmer GOA, Alex Deihl MDI, Mary Dolan MGI, Karen Eilbeck, Petra Fey, Ranjana Caltech, Pascale dDB, Kimberley Caltech

2002 John Day-Richter B-BOP, Eurie Hong SGD, Tanya, Amelia Ireland GO

2001 Rama SGD, Michelle TIGR, Harold Drabkin MGI

2000 Rolf Apweiler EBI, Rex Chisholm dDB, Chris Mungall B-BOP

1999 Midori Harris GO, Cara Dolinski PU, David Hill MGI

1998 Suzi Lewis B-BOP, Michael Ashburner, Mike Cherry SGD, Judith Blake MGI


Progress Reports

2007 Progress Report for NHGRI due Jan. 1, 2008



       These reports will review accomplishments to date.
       We are using the itemized list of sub-aims from the grant to organize these reports.
Aim1: We will maintain comprehensive, logically rigorous and biologically accurate ontologies.


2.1.2.1 Ontology development (Midori, David)

Midori:

All content changes and ??? are documented on the Ontology Development wiki: http://gocwiki.geneontology.org/index.php/Ontology_Development

is_a complete was almost finished last meeting, but is now done and a system is in place to make sure it remains so.
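A check of the kind that keeps the ontology is_a complete could look like the following minimal sketch. It is not the system the consortium put in place; it only assumes the OBO flat-file conventions (`[Term]` stanzas with `id:`, `is_a:` and `is_obsolete:` tags), and the `EX:` identifiers and the function name are hypothetical.

```python
import re


def terms_missing_is_a(obo_text, roots):
    """Return ids of non-obsolete, non-root [Term] stanzas with no is_a parent."""
    missing = []
    # Stanzas in an OBO file are separated by blank lines.
    for stanza in re.split(r"\n\s*\n", obo_text.strip()):
        lines = [ln.strip() for ln in stanza.splitlines() if ln.strip()]
        if not lines or lines[0] != "[Term]":
            continue
        term_id, has_is_a, obsolete = None, False, False
        for ln in lines[1:]:
            tag, _, value = ln.partition(":")
            value = value.strip()
            if tag == "id":
                term_id = value
            elif tag == "is_a":
                has_is_a = True
            elif tag == "is_obsolete" and value == "true":
                obsolete = True
        if term_id and not has_is_a and not obsolete and term_id not in roots:
            missing.append(term_id)
    return missing


# Hypothetical four-term ontology: one root, one child, one orphan, one obsolete.
SAMPLE_OBO = """\
[Term]
id: EX:0000001
name: root process

[Term]
id: EX:0000002
name: child process
is_a: EX:0000001 ! root process

[Term]
id: EX:0000003
name: orphan process
relationship: part_of EX:0000002 ! child process

[Term]
id: EX:0000004
name: retired process
is_obsolete: true
"""

missing = terms_missing_is_a(SAMPLE_OBO, roots={"EX:0000001"})
print(missing)
```

Only the term that has a part_of parent but no is_a parent is reported; root and obsolete terms are exempt by design.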

Three high level terms need to be disjoint – cellular process, multicellular organism process and ??? – <<add wiki link>>.

Topics also overviewed

High priority revisions <<wiki>>

Content meetings and their documented changes <<wiki>>

Question: IMG/FIG mapping. How much is left hanging now that Jane has gone on maternity leave? Not fixed!

JB/SL: the wiki can get muddled sometimes – managers should keep track.

Alex – if you add a large new section you should send out a general email.

ACTION ITEM – tutorial on wiki discipline (assigned to Who?).

Rex – wiki help contact group.

David:

i) Taxon and sensu: “Sensu” confused users, and curators and editors became lazy in its implementation and use. Sensu terms have been renamed, merged or obsoleted (how many?) in collaboration with domain experts. 60 are left to differentiate.


ii) Regulation. New relationship – ‘regulates’ <<add wiki link>>. Regulation terms will no longer be part_of the process terms they regulate, but will be linked to them by a new relationship type, ‘regulates’.

“Regulation of” terms were evaluated and fell into 3 categories:

i) regulation of a molecular function (activity)

ii) regulation of a biological process

iii) regulation of a quality (membrane potential/blood pressure) (some will be is_a, some part_of; didn’t catch this)


Three types of violations were identified:

i) regulation of a process which did not exist.

ii) regulation terms that are more specific than existing parents (possibly missing processes, but it may not be necessary to add them), e.g. Transcription involved in forebrain patterning?

Regulation of transcription involved in forebrain patterning

Part of forebrain patterning (check)

Part of regulation of transcription (check)

Transcription of forebrain patterning is not necessary


Chris went through the ontology and identified 'regulation of' terms that did not make sense. All analyses are in CVS in the /go/scratch directory (regulation-of-non-process.txt), and on the wiki they have been split into the three types of violations:

http://gocwiki.geneontology.org/index.php/Regulation_Worksheet
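The first type of violation (regulation of a process which does not exist) lends itself to a mechanical check on term names. The sketch below is an illustration only, not Chris's actual analysis; it assumes that regulation terms follow the "[positive|negative] regulation of X" naming pattern, and the function names are hypothetical.

```python
# More specific prefixes come first so they are stripped in preference
# to the plain "regulation of " prefix.
PREFIXES = (
    "positive regulation of ",
    "negative regulation of ",
    "regulation of ",
)


def regulated_name(term_name):
    """Return the name of the regulated entity, or None for non-regulation terms."""
    for prefix in PREFIXES:
        if term_name.startswith(prefix):
            return term_name[len(prefix):]
    return None


def regulation_without_target(term_names):
    """List regulation terms whose regulated process is not itself a term."""
    names = set(term_names)
    flagged = []
    for name in sorted(names):
        target = regulated_name(name)
        if target is not None and target not in names:
            flagged.append(name)
    return flagged


# Term names taken from the example discussed in the minutes.
names = [
    "transcription",
    "regulation of transcription",
    "forebrain patterning",
    "regulation of transcription involved in forebrain patterning",
]

flagged = regulation_without_target(names)
print(flagged)
```

With these names the forebrain example is flagged, because "transcription involved in forebrain patterning" is not itself a term. Whether such a term needs to be added is the curatorial question raised above.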


Part 2:

How are we going to relate these terms to the rest of the ontology?

Action Item: need to thrash through regulation.

Part 3 covers problem cases, including typos! Search on this page for ‘?’ for proposals by David where he needs resolution. Going through this list created links between function and process ☺

Negative and positive regulation question – these were subtypes. For blood pressure this made no sense; it was thought better for them to be part_of relationships. Keep them as subtypes for BP.

Collaboration with MIT/Harvard group.

Terms in the wrong place? These were shown up by their Information Content Analysis tool. Can we place these terms in a better place? Most of the time the answer was yes. For example, pilus retraction was moved further down.

Possible to put this analysis into GOC tools? CM – analysis already in database – check with Chris what he means.

Ontology Structure – CJM

Wiki should be merged with Ontology Development

http://gocwiki.geneontology.org/index.php/Ontology_Structure

Mining Reactome links to link process to function – more after lunch.

Internal cross products can start to be created and maintained in the ontology. OBO-Edit 2.0 will make it easier to maintain these cross products. It features an ontology repair tool for links that don’t exist or are broken. Need consistent rules for regulation terms where the regulated processes stand in part_of relations.

New cross product guide on wiki. Links to ongoing work on BP – CP cross products:

http://gocwiki.geneontology.org/index.php/Cell_cross-products


Karen Eilbeck SO progress

Development: March – August, joined J. Thornton group (Gabi Reeves) for the BioSapiens project. 96 new terms added to SO.

Mark Hathon (with Barry)– ongoing work.

Content meeting in June, HLA immunology community – looking for terms to describe variants. New terms, rearranging of SO – very productive.

Collaboration with phyGo.

Working on synonyms with Colin Batchelor.

Release SO every 2 months.

Karen dropping down to 60%.


COFFEE BREAK


Aim2: We will comprehensively annotate reference genomes in as complete detail as possible.

Reference Genome Project

Rex/Pascale

Provide comprehensive annotation for 12 genomes. MOD, genome DB, curators. Complete means breadth and depth. Breadth – every gene. Depth – to the highest possible level of knowledge. If there is a small number of papers, read them all. If the literature is extensive, summarize from reviews. Metrics. GET REX’s PRESENTATION.

250 genes identified for curation. Each gene, when mutated, should contribute to a disease (OMIM).

Per MOD – curators responsible for identifying orthologs using the commonly available tools.

S/W – Google spreadsheets – erratic. Not robust. Anxious to work with the SW group to develop a database – requirements have been written up. Merchant (left in July) wrote prototype. A new member of staff is starting at the end of September to continue development.

Annotation Progress – see slide.

Display approaches – comparing annotations to generic GOSlim branches.

Number of sourceforge requests from reference genome group in the hundreds over 16 months. Average of 10-12 requests per month. GO editorial group doing a good job at keeping up with these. Existing requests are problematic. 411 terms.


Aim3: We will support annotation across all organisms.

Annotation Outreach – Jen Deegan

Keeping track of new groups annotating and writing documentation.

ASK JEN FOR HER SLIDES.

People going to meetings – willing people should report gossip back to Jen.

The SOPs have been tricky but are now on the public GOC website:

http://www.geneontology.org/GO.annotation.SOP.shtml

Michelle created nice ISS guideline SOP.

Action Jen: A reference to these pages should go in next newsletter.

There has been much progress on grants.

Attending many regular conferences.

Less cold calling; it wasn’t very successful. More luck tracking down the right person at conferences. Responding to invitations.


Eurie - User Advocacy

Focusing on lines of communication, newsletter and mailing list. Rota of mailing list monitors.

Newsletters archived. Future news items page on wiki. Michael A wants ISSN for the newsletter. Action Michael! Michael sent URL to Eurie – Action Eurie!

Users meetings, we have a page of potential meetings on wiki.

Tools standards.

Production Systems – Ben Hitz

Deployed 4 new Linux machines: 1 for loading, 2 for AmiGO production, 1 for AmiGO development.

ASK BEN FOR SLIDES

Production AmiGO is now more fault resistant.

GO database loading has been sped up and is now in testing.

GO database sequences – using gp2protein files. If possible, include all sequences in your DB, not just annotated ones.

Assocdb fasta file – Header line massive – can be slimmed down?

Association file cleaning – All IEAs must have a with field.
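The rule that every IEA must have a with field can be checked line by line. The following is a minimal sketch, not the production cleaning script; it assumes the tab-delimited gene association file layout in which column 7 is the evidence code and column 8 is the with field, and all database names and identifiers in the sample rows are placeholders.

```python
def ieas_missing_with(gaf_lines):
    """Return (line number, object id, GO id) for IEA rows with an empty with field."""
    problems = []
    for line_no, line in enumerate(gaf_lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed row; a real cleaner would report it separately
        evidence, with_field = cols[6], cols[7].strip()
        if evidence == "IEA" and not with_field:
            problems.append((line_no, cols[1], cols[4]))
    return problems


def make_row(object_id, go_id, evidence, with_field):
    """Build a 15-column row; every value here is a placeholder."""
    return "\t".join([
        "EXDB", object_id, object_id.lower(), "", go_id, "EXDB_REF:0000001",
        evidence, with_field, "P", "", "", "gene", "taxon:0", "20070923", "EXDB",
    ])


rows = [
    "! hypothetical example file",
    make_row("EX0001", "GO:0000001", "IEA", "EXSRC:000001"),
    make_row("EX0002", "GO:0000002", "IEA", ""),
    make_row("EX0003", "GO:0000003", "IDA", ""),
]

problems = ieas_missing_with(rows)
print(problems)
```

Only the IEA row with an empty with field is reported; the IDA row is left alone because the rule applies to IEAs only.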

AmiGO – Amelia. ASK AMELIA FOR SLIDES

Term enrichment.

GO slimmer.


LUNCH BREAK

Action Items Review

Items in progress or marked as done are not re-listed here.


10. NOT DONE. Action Item for John: Add a term creation date to the .obo file. It didn't get added to the feature request tracker so it was overlooked. Now added:

-> JDR added to tracker on plane on the way here. NOW DONE


13. NOT DONE: Action item for AmiGO Working Group (from 4): The AmiGO Working Group will implement a strategy to incorporate and display the contents of the GO references. Request is on the AmiGO tracker (thanks Jane) http://sourceforge.net/tracker/index.php?func=detail&aid=1667315&group_id=36855&atid=494390

-> Done in next release. Amelia. On the tracker still.

15. Action Item for GOA (from 1): Talk to Reactome about getting non-TAS evidence. TAS is no longer considered a useful evidence code and will not be used in any consistency measures of reference genome annotation. Since part of the idea of the reference genomes is to provide a source of IEA annotations for other groups, we strongly encourage reference genome annotators to not use TAS, and instead use experimental evidence codes whenever possible. The GO documentation should also state this in a clear fashion. [annotation camp participants]

-> IN PROGRESS. NEXT ON AGENDA

16. Action Item for Judy, Harold, Amelia, Eurie, John Day-Richter (from 3,4,5): Resolve communication issues around obsoletes.

-> Midori to look into but probably done – tags applied to obsolete terms in OBO edit file.

17. NOT DONE (more discussion required) Action Item for Chris and Jane (Revisit 7): Remove the word 'activity' from the molecular function terms, and consider renaming the molecular function ontology. [note that later on in the meeting after action item 73 notes state that more discussion on this proposal is needed]

-> Strike activity from the function terms? Some people very opposed. DH.

18. [In Progress] Action Item for Jen & Chris (from 8): Assign priorities for contents changes needed to implement sensu plan. Discuss the different aspects of changes to our use of sensu, write documentation on this, and implement the new strategy. This change will then be announced to the community. Comment: There is more discussion to be had on this and a slot has been assigned for this in the meeting.

-> Jen to update on sensu later in meeting. See action items under Taxon/Sensu.


21. Action Item for Eurie (Revisit 12): No conclusion about how to distinguish large- vs small-scale experiments was reached. People are encouraged to keep thinking about this issue, which clearly needs more discussion.

-> To be discussed at this meeting.

28. Action item (Suzi, Tanya): Continue developing [protein family-based annotation] tool.

-> Put aside for the moment. Need to revisit. EVOLVING

29. Action item (Midori and Rex): Do it [i.e., new column in annotation files], add the column and document the formalism. It's not at all clear why this has my name (and I don't remember); it seems more for the software group and annotators. [mah]

-> TO BE DISCUSSED. Column 16. Structured Notes Field AKA SLOTS ☺

31. Action Item (Jane): Keep researching CARO integration.

-> Jane.

32. Action Item (AWG): Explore whether this [Mary's graphs] would be a useful addition to AmiGO 2.0.

-> STILL EXPLORING. Use logs? More discussion.

33. Action Item (AWG): There should be more tracking of what users are doing in the next AmiGO; explore the options.

-> Adding dates to file done but nothing further happened.


35. Action Item (John & Jane): Come up with a proposal for a history tracking tool, including timeframe and priority, and send out email.

-> ???

36. Action item (David, Jane): Organize first training meeting.(postponed due to the amount happening at this meeting)


-> ???

37. Action Item (Nicole, Jane, John, Mark): Create proposal to group for term submission software, including the “minimum standards for each term”.

-> JOHN not done – proposal considered.

38. NOT DONE! Action Item (AmiGO Working Group): Change AmiGO to hide (by default) structured comments of certain types. Obsolete comments will not be shown. Structured comments that don’t belong will not be shown. Have option to hide comments. [different types of comment not specified so no action]

-> Need more specification/requirements. Amelia.

39. Action Item (Karen E.): Send the information [microarray tool evaluation] to the group

-> Probably obsolete. Therefore Done?

Items 50-52 were all about the WITH field for the IPI code.

51. Action Item (Chris, Ben): Work through any database implications.

-> ???

52. Action Item (User Advocacy): Announce in newsletter, “what's new” section of GO site.

-> ???


60. Action Item (Evidence Code Working Group): Develop a decision tree for choosing an evidence code. Present results at next consortium meeting. (large scale/small scale, reviewed/un-reviewed. Similarity/not-similarity etc).

-> TO BE DISCUSSED AT THIS MEETING

61. Action Item (Rex): Assign programmer to check integrity of WITH field for annotations.

-> NEED NEW PROGRAMMER

63. Action Item (Evidence Code Working Group): Send RCA code back to committee.

-> TO BE DISCUSSED AT THIS MEETING



78. Action Item (User Advocacy): Draft an official GO brochure.

-> TODO



Afternoon, Sunday, Sept 23, 2007

Discuss protein complexes and the intersection with Reactome annotations

Peter D'Eustachio (Reactome) will be able to join us only for the afternoon.

GET SLIDES FROM PETER

These are some other topics that Ewan Birney brought up at the Interactome meeting:

   * Reactome has a lot of function annotations with the connections to the complex that has the function. These are for very small complexes like homodimers. He asks whether we can accept these. He also says that he understands that this is a bit granular for us and that we have a lot of similar information for larger complexes. He suggests that there might be a better way to store all the data for the functions of all sizes of complexes if we pooled our data and thought of a good system to make it available. 
   * Ewan Birney and the people at the Interactome meeting felt strongly that the more complicated annotations such as those with a NOT or colocalizes_with qualifier, and those with annotations to the root should be in a separate file from the straightforward annotations. The users felt that many people do not know about these more complicated annotations and it would be better to make them an added extra that people must specifically go to download. They were particularly concerned that users may not know that it is important to parse the qualifier column and so may miss vital information. 

   * Reactome would like to have a new evidence code to show where an IEA has been transferred from a known orthologue. Such transfers tend to produce more granular annotations. (This has been discussed before but Ewan asked me to bring it up while Peter is there to back the proposal.)
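The suggestion to keep the more complicated annotations in a separate file can be sketched as a simple split on the qualifier column and the GO ID. This is an illustration under stated assumptions, not an agreed procedure: it assumes the tab-delimited association file layout (column 4 qualifier, pipe-separated when there are several; column 5 GO ID), treats the three ontology root terms as "annotations to the root", and uses placeholder identifiers in the sample rows.

```python
# Root terms of the three ontologies: biological_process,
# molecular_function and cellular_component.
ROOT_TERMS = {"GO:0008150", "GO:0003674", "GO:0005575"}
# Qualifiers singled out in the discussion above.
FLAGGED_QUALIFIERS = {"NOT", "colocalizes_with"}


def split_annotations(gaf_lines):
    """Split rows into (straightforward, complicated) lists."""
    simple, complicated = [], []
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue  # malformed row
        qualifiers = {q for q in cols[3].split("|") if q}
        if qualifiers & FLAGGED_QUALIFIERS or cols[4] in ROOT_TERMS:
            complicated.append(line)
        else:
            simple.append(line)
    return simple, complicated


def make_row(object_id, qualifier, go_id):
    """Build a 15-column row; every value except root GO ids is a placeholder."""
    return "\t".join([
        "EXDB", object_id, object_id.lower(), qualifier, go_id,
        "EXDB_REF:0000001", "IDA", "", "P", "", "", "gene",
        "taxon:0", "20070923", "EXDB",
    ])


rows = [
    make_row("EX0001", "", "GO:0000001"),
    make_row("EX0002", "NOT", "GO:0000002"),
    make_row("EX0003", "colocalizes_with", "GO:0000003"),
    make_row("EX0004", "", "GO:0008150"),
    make_row("EX0005", "contributes_to", "GO:0000004"),
]

simple, complicated = split_annotations(rows)
print(len(simple), len(complicated))
```

A user who reads only the straightforward list never meets a NOT row, which is the failure the Interactome attendees were worried about. The contributes_to row stays in the straightforward list here only because that qualifier was not named in the discussion.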


Action: Get Reactome data into GOC before next meeting.


Jen Deegan – ‘Taxon and GO’ (presentation, using paper from Waclaw Kusnierczyk)

Originally Chris and Jen worked to lose the sensu tags, redefine definitions and add taxon links. However, removal of taxon has been a problem. There are now 23,802 terms.

- Searching for terms is a time sink for users.

- GO help has often received queries from users asking if there is a taxon-specific GO slim/subset of terms (e.g. plant-specific GO).

- In addition, Jen as outreach officer has found new MOD groups are unwilling to annotate to GO unless there is a slim available for them.

- GO language can be subtle. GO term names can be complex now that the sensu information has been removed. Taxon information would make GO terms easier to find and decipher.

- In addition, having taxon information in the GO helps error checking.

- There are 3 types of relationships that could be applied to relate taxon to GO terms: 1. is_relevant_to 2. is_only_in 3. applies_to_all

This taxon-specific information would be added into a separate file.

Discussion:

Judy: Against including taxon information within the GO as we do not know all properties of a taxon. Taxonomic information is also in flux; we do not want a dependence on taxonomy in GO. We would be restricting ourselves if we did not make all terms available to all users. Could users not instead look at the terms that were used by a reference genome group to see what terms are appropriate for a particular taxon?

- general disagreement from curators with this possibility.

Agreement that there are incorrect annotations which relate to taxon-specific properties:

Harold: in the Phantom load – needed to remove incorrect mouse annotations

Val, Harold: InterPro2GO throws up problems for mouse

Pascale: would help sanity check annotation data

Val: this species information doesn’t need to be comprehensive to be useful for annotation checks

Eurie: if this would help annotators, this information could be built into an annotation tool?

Ruth: there are interesting concepts here, but does it need to be so complicated? Would all taxa need to be included? Could we not instead use just 10 high-level taxon identifiers?

Judy: Instead, could rule-based triggers not be used? Efforts should be on annotation of literature rather than wasting a considerable amount of time incorporating taxon information. We do not want to commit such a level of resources to such a project, especially as budgets are stretched at present. Again, concern about fluidity of taxon-specific information.

Sue Rhee: we should explore usage of GO slims.

Suzi Lewis: there are risks in this kind of project, and concerned that this project would entail quite a bit of work and could also be misunderstood by users. Can we have a low-key evaluation?

JDR: a large-scale activity of this kind is a bad idea. You would propagate garbage by accepting all annotations. Could use it as just a framework by only using the 10 top taxon IDs – this would already help find problems. JC – agreed.

Alex D: Isn’t this just a user education problem? Users need to take the time to understand the GO hierarchy, that you can search synonyms, definitions etc. Feel that user queries are symptoms of users not trying hard enough to work with GO.

Mike Cherry: could not afford to make this a big project, there are other developments in GO which need to be addressed

Rex: Had concern about making taxon-specific assertions that are flawed. If these types of sanity checks or limits were applied automatically, we would lose the ability to investigate the cases they exclude, yet this data would probably tell us something fundamental about biology.

Judy: classifications of taxa are based on phenotypes and not molecular data; many things are being found and taxa are being redefined. Prefers ‘is_relevant_to’. Likes the idea of flags/triggers to facilitate work, but wouldn’t automatically exclude, as this data is important.

Michael A: while some taxonomy is changing, e.g. in protista, it is unlikely that Viridiplantae or Mammalia will move around so much.

Ben Hitz: what fraction of problems would be solved if cross-products to taxonomy were included?

Jen Deegan: it would solve some, it would help with the development terms.

Ben Hitz: what would the time line be for taxon cross-products?

Chris Mungall: this is much further down the line.

Judy: Our main issue here is how to facilitate annotations in our groups. However, we are hung up on a suggestion from outside the group.

Chris Mungall: slims are much harder to maintain than these relationships would be.

Michelle M: When the prokaryotic subset was created, she was very much against it. Instead of users looking at 20,000 terms, they are now looking at 9,000 – there is not that much benefit. Doesn’t think new users need this; we need to facilitate better ways of finding terms within the tool. For curators it might be useful for error checking, but not for new users.

JDR: although there is a big concern that you’d lose annotations because of these relationships, this would not be the case, as the annotations would instead be highlighted and visible, to better investigate or improve GO.

Ruth: how would this data be viewed? In addition, if a user does not understand a term then it really is a problem with the term’s definition – the definition needs to be improved instead; this would be far more valuable than adding an additional cross-link.

Jen: is willing to carry out this task in her own time. Would use annotation data from the association files, take the associated taxon IDs and condense them to 10 top, high-level taxon identifiers. Confident that this would greatly help her ontology development effort, especially for development terms.
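The condensing step Jen describes could be sketched as a walk up the taxonomy from each annotated taxon to the nearest of the chosen high-level taxa. This is only a sketch under assumptions: the parent map, the two high-level groups (standing in for the proposed 10) and the annotation pairs are toy data, and the function names are hypothetical.

```python
def condense_to_high_level(taxon, parent_of, high_level):
    """Walk up the taxonomy until a chosen high-level taxon is reached."""
    seen = set()
    current = taxon
    while current is not None and current not in seen:
        if current in high_level:
            return current
        seen.add(current)  # guards against cycles in a malformed parent map
        current = parent_of.get(current)
    return None  # no chosen high-level ancestor


def high_level_taxa_per_term(annotations, parent_of, high_level):
    """Map each GO term to the set of high-level taxa it has been annotated in."""
    result = {}
    for term, taxon in annotations:
        top = condense_to_high_level(taxon, parent_of, high_level)
        if top is not None:
            result.setdefault(term, set()).add(top)
    return result


# Toy taxonomy (child -> parent) with intermediate ranks omitted.
parent_of = {
    "mouse": "Mammalia",
    "rat": "Mammalia",
    "thale cress": "Viridiplantae",
    "Mammalia": "Metazoa",
    "Metazoa": "Eukaryota",
    "Viridiplantae": "Eukaryota",
}
high_level = {"Mammalia", "Viridiplantae"}

# Toy (term, taxon) pairs as they might be pulled from association files.
annotations = [
    ("lactation", "mouse"),
    ("lactation", "rat"),
    ("lactation", "thale cress"),
    ("leaf development", "thale cress"),
]

summary = high_level_taxa_per_term(annotations, parent_of, high_level)
print(summary)
```

In this toy data "lactation" appears under both Mammalia and Viridiplantae. That is the kind of entry that would be highlighted for a curator to investigate rather than excluded automatically, in line with the concerns raised in the discussion.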

ACTION: Jen to do this as an experiment and bring back results for next GO meeting

ACTION: (David Hill) Put organism references back into GO term names; however, GO definitions will still be designed to be full and accurate, not depending on species to define the term.