GO 18th Consortium Meeting Minutes Day 1

Sunday morning, September 23, 2007 (Day 2 Minutes)

Introductions chronologically:

2007 Dimitri, Seth Carbon B-BOP

2006 Jim Hu ecoli, Susan Tweedie FlyBase, Trudi PAMGO, Donghue TAIR

2005 Ben Hitz SGD

2004 Doug Howe ZFIN, Ruth Lovering UCL, Victoria Petri RGD

2003 Jen Clark GO, Emily Dimmer GOA, Alex Deihl MDI, Mary Dolan MGI, Karen Eilbeck, Petra Fey, Ranjana Caltech, Pascale dDB, Kimberley Caltech

2002 John Day-Richter B-BOP, Eurie Hong SGD, Tanya, Amelia Ireland GO

2001 Rama SGD, Michelle TIGR, Harold Drabkin MGI

2000 Rolf Apweiler EBI, Val Wood Sanger, Rex Chisholm dDB, Chris Mungall B-BOP

1999 Midori Harris GO, Cara Dolinski PU, David Hill MGI

1998 Suzi Lewis B-BOP, Michael Ashburner, Mike Cherry SGD, Judith Blake MGI

Progress Reports

Next year the GO Consortium effort will celebrate its 10th birthday.

2007 Progress Report for NHGRI due Jan. 1, 2008

       These reports will review accomplishments to date. 
       We are using the itemized list of sub-aims from the grant to organize these reports.
Aim 1: We will maintain comprehensive, logically rigorous and biologically accurate ontologies.


Ontology development

Ontology Development 1 - Midori Harris

All content meeting related changes documented on Ontology Development Wiki http://gocwiki.geneontology.org/index.php/Ontology_Development

The is_a complete work was almost finished at the last meeting; it is now done, and a system is in place to make sure it remains so.

Three high level terms need to be disjoint – cellular process, multicellular organism process and ??? – <<add wiki link>>.

Topics also overviewed

High priority revisions <<wiki>>

Content meetings have been held for:

  • Cardiovascular Physiology <<wiki>>
  • Muscle Development <<wiki>>
  • Transporter activities

Medium-scale content changes:

  • synaptic plasticity
  • RNA processing

Michael Ashburner: question about the IMG/FIG to GO mapping.

Jen, Midori: the IMG to GO mapping is mostly finished. These items are waiting for Jane to return.

Chris M: mappings between the BP and MF terms still need to be done.

JB/SL: the wiki is a valuable resource; however, it can get muddled sometimes – managers should keep track.

Alex: if you add a large new section you should send out a general email.

ACTION ITEM – tutorial on wiki discipline (assigned to Jim Hu ?).

Rex – in addition, a group of wiki experts could be formed, whom people could contact for advice.

Ontology Development 2 - David Hill

1) Taxon and sensu.

“Sensu” confused users; curators and editors became lazy in its implementation, and accurate definitions were not created. Sensu terms have been renamed, merged or obsoleted (how many?) in collaboration with domain experts.

Definitions now need to state how a process occurs differently in the different organisms. If it is impossible to state this, then child terms will not be created. 60 remain to be differentiated; experts have been e-mailed. In future, term requests need to include the reasons why a process occurs differently in different organisms. Synonyms containing the sensu information are kept for these terms.

Function-Process Links

Chris M: these mappings are complex. Waiting for OBO-Edit 2.0 for help on cross-products.


2) Regulation.

New relationship – ‘regulates’ <<add wiki link>>. ‘Regulation of’ terms will not be part_of the process terms; instead there will be a new relationship type, ‘regulates’.

“Regulation of” terms were evaluated and fell into 3 categories:

i) a process which regulates a molecular function (activity)

ii) regulation of a biological process

iii) regulation of a biological quality (membrane potential/ blood pressure), where blood pressure itself is not a process in GO.

Therefore some top regulation terms have been made.

Chris went through the ontology and identified 'regulation of' terms that did not make sense. All analyses are in CVS in the /go/scratch directory (regulation-of-non-process.txt) and on the wiki, where they have been split into parts:

http://gocwiki.geneontology.org/index.php/Regulation_Worksheet

Three types of violations were identified:

1. the biological process does not exist, because the term describes the regulation of a molecular function
2. the regulation is more granular than the existing biological process
3. legitimate subtypes of GO terms, where the regulation is different

Midori H: will also send information on this activity in progress reports. This task has not been as bad as feared. There have been three categories of problems.

Chris Mungall: there are problems with cross-products, and it would be easier if the parent terms did exist.

David H: this will be resolved once the parent terms do exist.

David H: concern about consistency in regulates relationships.

CM: will look at relationships between cell types and GO terms: use as a guide to populate GO with missing terms.


??? Also something about negative and positive regulation being a special case, not sure where to put this. When biological quality is a special case, use homeostasis? When process (what resolution)?


Q. VW: How are existing annotations affected by the relationship changes? E.g. transcription initiation: curators may have annotated more granularly to regulation of transcription initiation when there is direct involvement. Topic for annotation discussion at some point?


Part 1 Lists all the terms that were legitimate subtypes of 'regulation of molecular function' or 'regulation of biological quality' (regulation of a process which did not exist).

Part 2 Lists all the terms that were legitimate subtypes of other GO terms, but where I thought we didn't need the processes themselves.

e.g.

  • Transcription involved in forebrain patterning?
  • Regulation of transcription involved in forebrain patterning
  • Part of forebrain patterning (check)
  • Part of regulation of transcription (check)
  • Transcription of forebrain patterning is not necessary

Part 3 Lists the terms where I have suggestions about how to handle them. Please check them. I have put question marks wherever I am not reasonably sure that my decision is correct. (includes typos, and things that were non univocal with their parent)

ACTION ITEM (ALL) Look at and comment on outstanding items (search on ?)

ACTION ITEM Check whether there should be a relationship between pigment metabolic process and pigmentation


3) Information content analysis. Collaboration with MIT/Harvard group.

This group was interested in measuring the information content of a GO term. They looked at the number of annotations to a term in relation to its position in the ontology. They developed a statistical algorithm to determine information content, based on the assumption that a term with few gene products annotated to it has a high information content and a term with many gene products annotated has a low information content.

Looked for outliers with respect to information content (either too specific, or at a higher or lower level than they should be).

Took higher-level terms which had too few annotations compared to other terms the same distance from the root, and looked at whether they could be relocated, e.g. pilus regulation was a direct child of 'cell physiological process' and was relocated to have more specific parents.
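The idea above can be sketched in a few lines. This is only an illustration, not the MIT/Harvard group's algorithm, which the minutes do not record. It assumes information content is the negative log of the fraction of gene products annotated to a term, and that an "outlier" is a term whose value is far from the mean for terms at the same distance from the root. The function names, the threshold and the input shapes are all hypothetical.

```python
import math
from statistics import mean, pstdev


def information_content(annotation_counts, total_gene_products):
    """Assumed measure: IC = -log2(fraction of gene products annotated).

    Few annotations give a high value, many annotations a low value.
    Terms with zero annotations are skipped (their IC is undefined here).
    """
    return {
        term: -math.log2(count / total_gene_products)
        for term, count in annotation_counts.items()
        if count > 0
    }


def depth_outliers(ic, depth, threshold=1.5):
    """Flag terms whose IC is more than `threshold` standard deviations
    away from the mean IC of the terms at the same depth (hypothetical rule)."""
    terms_by_depth = {}
    for term, d in depth.items():
        if term in ic:
            terms_by_depth.setdefault(d, []).append(term)

    flagged = []
    for terms in terms_by_depth.values():
        values = [ic[t] for t in terms]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # all terms at this depth look the same
        flagged.extend(t for t in terms if abs(ic[t] - mu) / sigma > threshold)
    return sorted(flagged)
```

Flagged terms would still need a curator to decide whether they can be relocated or whether the difference reflects real biology.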


Some terms were OK: there was nowhere to move them to and they had correct parentage.

Lots of specific terms had a larger than expected number of annotations, e.g. olfactory receptor.

Some reflect biological differences e.g. cation and anion transport


Q: JDR Is it possible to put this analysis into GOC tools?

A: CM – analysis already in database, can be used

Ontology Development 3 - Chris Mungall

Wiki for ontology structure (should be merged with Ontology Development)

http://gocwiki.geneontology.org/index.php/Ontology_Structure


1. Mining Reactome links to link process to function – more after lunch.


2. Internal cross products can start to be created and maintained in the ontology. OBO-Edit 2.0 will make it easier to maintain these cross products.

New cross product guide on wiki. Links to ongoing work on BP – CP cross products;

http://gocwiki.geneontology.org/index.php/Cell_cross-products

Includes:

Internal links (existing)

External links (function to process links)

External links (x products)


3. contributes_to discussion postponed


4. OBO reasoner: ontology repair tool for links that don’t exist or are broken (i.e. missing is_a links). Need consistent rules for regulation terms with part_of and is_a relationships.

Karen Eilbeck SO progress

Development : March–>August joined J Thornton group - Gabi Reeves for BioSapiens project. 96 new terms added to SO.

Mark Hathon (with Barry Smith) – ongoing work on regulatory regions.

Content meeting in June, HLA immunology community – looking for terms to describe variants. New terms, rearranging of SO – very productive.

Collaboration with phyGo. Mobile genetic elements?

Working on synonyms with Colin Batchelor.

Release SO every 2 months.

Karen dropping down to 60%.


COFFEE BREAK


Aim2: We will comprehensively annotate reference genomes in as complete detail as possible.


Reference Genome Annotation Project - Rex Chisholm

Aim3: We will support annotation across all organisms.


File:ReferenceGenomes GOC PU 2007final.ppt

Provide comprehensive, robust collection of annotations for 12 genomes.

Complete means breadth and depth.

Breadth – every gene.

Depth – to the highest possible knowledge. If there are only a few papers, read them all. If the literature is extensive, summarise from reviews. (Completion is best assessed by a curator.)

Target Gene Identification (Priority genes)

250 genes identified for curation. A gene, when mutated, should contribute to a disease (OMIM).

Ortholog Identification

Per MOD – curators responsible for identifying orthologs using the commonly available tools.

Software

Currently use Google spreadsheets <<add URL>> – Not robust, time consuming. Anxious to work with the SW group to develop a database – requirements have been written up. Merchant (left in July) wrote prototype << ADD URL>>. A new member of staff is starting at the end of September to continue development.


Metrics

Annotation Progress – see slide. Number of papers per gene etc.

Number of sourceforge requests from reference genome group in the hundreds over 16 months. Average of 10-12 requests per month. GO editorial group doing a good job at keeping up with these. Existing requests are problematic. 411 terms.

Ruth Lovering's Metrics Document v3: File:HowToCaptureMetrics3.doc


Tools and Analysis, Display approaches

Comparing annotations to generic GOSlim branches.

Annotation completion

Mike slide

Measuring info content (Chris)

Annotation consistency (Mary's graphs, add link??)

Table View (slim showing each term annotated for a gene) includes every term; useful for curation and annotation consistency (add link???).

Annotation Outreach – Jen Deegan

Keeping track of new groups annotating and writing documentation.

ASK JEN FOR HER SLIDES.

Jen described the scope and techniques of the outreach effort. Showed an 'ontology' of the outreach effort. There has been much progress on grants.

Attending many regular conferences.

Less cold calling; it wasn’t very successful. More luck tracking down the right person at conferences. Responding to invitations.


People going to meetings: report back gossip from willing people to Jen.

The SOPs have been tricky but are now on the public GOC website:

http://www.geneontology.org/GO.annotation.SOP.shtml

Michelle created a nice ISS guideline SOP.

ACTION ITEM Jen: A reference to these pages should go in next newsletter.

ACTION ITEM Jen Add a link from outreach to something (SOP?)

ACTION ITEM investigate why term requests aren’t coming in: are there things we need to do to make it easier? SF tracker list/annotation list: who is on these lists, and do other people need to be on them?

User Advocacy - Eurie

Focusing on lines of communication, web presence, newsletter and mailing list.

Different users: new users, current users, power users.

Rota of mailing list monitors.

Newsletters archived. Future news items page on wiki. Wiki or Newsletter ideas <<add link>>

Michael A wants ISSN for the newsletter.

ACTION ITEM NML Michael sent URL to Eurie – Action Eurie!

Somebody mentioned RSS feed, is this a potential action?

Users meetings, we have a page of potential meetings on wiki.

Tools standards. (Needs to be cleaned up and categorised)

Production Systems - Ben Hitz

File:ProductionReport GOC PU 2007.ppt

Deployed 4 new Linux machines: 1 for loading, 2 for Amigo production, 1 for Amigo development.

Production Amigo is now more fault resistant.

GO database loading has been sped up and is now in testing.

Godb sequences – using gp2protein files. If possible, include all sequences in your DB, not just annotated ones.

Assocdb fasta file – header line is massive – can it be slimmed down?

Association file cleaning – all IEAs must have a with field.
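A check for the IEA rule could look like the sketch below. It assumes the tab-delimited GO association (gene association) file layout, where column 7 holds the evidence code, column 8 the with/from field, and comment lines start with "!". The function name and the shape of the returned report are hypothetical, not part of any GOC script.

```python
def ieas_missing_with(lines):
    """Return (line number, object ID, GO ID) for each IEA row with an empty with field.

    Assumes the association file columns: 2 = DB object ID, 5 = GO ID,
    7 = evidence code, 8 = with/from. Rows too short to contain a with
    column are skipped rather than reported.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # blank line or comment/header line
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # malformed row; out of scope for this check
        evidence_code, with_field = columns[6], columns[7]
        if evidence_code == "IEA" and not with_field.strip():
            problems.append((line_number, columns[1], columns[4]))
    return problems
```

Passing an open file handle works as well as a list of lines, since the function only iterates over its argument.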

Q Who does gp2protein go to?

Amigo – Amelia

Amigo enhancements/ new search features demo

Term enrichment.

Go slimmer.

<<add demo URLS???>>

LUNCH BREAK

Action Items Review

This large section moved to its own page:

Outstanding Action Items from 17th GOC Meeting, Cambridge UK

Afternoon, Sunday, Sept 23, 2007

Reactome - Peter D’Eustachio

File:Reactome to GO GOC PU 2007.ppt

Reactome can provide data for proteins for which UniProt does not yet have manual annotations. Most of this Reactome data is derived from experimental evidence identified from papers; however, unlike the GO annotation method, the types of experiments have not been recorded.

Emily: GOA would love this data, but unless there is a new parent ‘Experimental’ code, the best that exists is ‘TAS’.

Suzi Lewis: there is a use for a hierarchy of evidence codes, with an ‘E’ Experimental code as a parent of the granular IMP, IGI, IDA, IPI and IEP codes.

Peter: Homolog sets used to transfer data between species are determined by individual experts, and data are transferred between orthologs AND homologs (where functionally similar).

Judy and Suzi: Reactome data is valuable. It is unacceptable to not be including it in GO and it is unacceptable that this data should have anything less than an experimental evidence code. TAS or NAS evidenced data are unacceptable also.

Peter: the current Reactome curation method is to avoid unpublished data. Reactome curators also want to be opinionated about the published data, so that Reactome reflects current expert opinion and avoids hypothetical theories. Only confirmed, accepted knowledge is included. There are 10 curators, only 2 of whom have previous experience in GO annotation; there is no budget to do GO annotation and no desire to teach curators about GO evidence codes. Curators don’t always know which piece of literature applies to which piece of information. 2000 genes annotated. 4000 pieces of literature. It is not clear how many GO annotations this would convert to.

Suzi Lewis: This brings up the question of what is the purpose of evidence codes? Why do we have the ones we have? Do users use them? (something to discuss tomorrow).

Pascale: have evidence from users that they do care whether IDA or IMP codes are used.

Peter: There is not always a 1 GO term to 1 publication relationship. Sometimes a GO term may have originated from the combined curation of many papers.

Eurie and John Day-Richter: TAS annotations are valuable, and may be good to get the data in.

Suzi, Judy: this data is too good for TAS.

Emily D: Why not use a mix of codes depending on the GO term to publication ratio? For those instances where there is a 1:1 relationship of GO term to publication: use ‘E’, for 1 GO term to many publications: use ‘TAS’ and cite the Reactome reaction web page as the source – this then acts as the reviewed document.

David Hill: concerned about the proposal for a new ‘Experimental’ evidence code: might lose analytic power.

Judy B: could Reactome curators go back and re-annotate those 4,000 papers and convert the codes to one of the GO experimental codes? This would only take 2 weeks to do.

Peter: Not possible – Reactome have defined goals, we cannot afford to reannotate for GO. 75 genes/month is the absolute minimum annotations. We have our own grant objectives we must fulfill.

David Hill: GO curators could prioritize the reannotation of genes for which there is not much annotation available.

Rex: could the reference genome groups each take on a subset of annotations and re-annotate?

Emily: then the annotation would belong to the group that reannotated. We would be using Reactome data as a source, but the final annotations would be attributed to the group that provided the final annotations. Might not be the best use of resources.

Suzi : Would accept ‘EXP’ for the 1:1 mapping of GO term to publication.


Q Val: Any idea how many aren’t covered by GO annotation already? A. No…


Judy, Sue R, Emily D, Tanya B: the ‘EXP’ code would make life easier for users, for other integrations as well

ACTION: Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible. Add new evidence code EXP for 1:1 Reactome to literature, add all other Reactome with TAS to Reactome source.
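The rule in this action item, together with the proposed parent code, can be sketched as follows. This is an illustration of the discussion only, not an agreed implementation: the function names, the tuple it returns and the placeholder identifiers in the usage note are hypothetical, and treating an entry with no cited publication as TAS is an assumption the minutes do not address.

```python
# Granular experimental codes that would sit under the proposed parent code.
EXPERIMENTAL_CHILDREN = {"IMP", "IGI", "IDA", "IPI", "IEP"}


def is_experimental(code):
    """True for the proposed parent code EXP or any of its granular children."""
    return code == "EXP" or code in EXPERIMENTAL_CHILDREN


def assign_evidence(publications, reactome_reference):
    """Pick an evidence code and reference for one Reactome-derived GO annotation.

    One supporting publication (1:1) -> EXP, citing that publication.
    Anything else -> TAS, citing the Reactome page as the reviewed source.
    """
    if len(publications) == 1:
        return ("EXP", publications[0])
    return ("TAS", reactome_reference)
```

For example, `assign_evidence(["PMID:placeholder"], "REACTOME:placeholder")` yields an EXP annotation citing the paper, while a list of several papers falls back to TAS with the Reactome reference.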

Arguments for structuring evidence codes: i) make things simpler, ii) allow incorporation of other data, iii) needn't change our current usage, iv) do the TAS for the things that don’t fall under EE that can’t be assigned to a single paper.

(continue tomorrow the discussions of the point of evidence codes and the possibility of new parent ‘EXP’ code)


Protein Complexes: GO vs/ Reactome

Reactome complexes are seen as an entity (i.e. a collection of proteins), whereas GO treats complexes as a subcellular location. However, there is also a blurring between the two for Reactome, especially when looking at large complexes. Peter: In our annotations, a cross-reference slot allows us to cite a GO identifier for the location (usually to the parent term of the complex). Reactome curators add the most granular cc term, and are willing to generate SourceForge requests for those that are missing.

Judy B: talked to Lisa in Bar Harbor on complexes for Reactome. Concern about the active function tag to the active polypeptide.

Peter: for a catalysis – any physical entity in a complex is given a GO term describing the activity, however the active unit, which mediates the reaction is labeled by Reactome. Can parse out which of the polypeptides had the catalysis functions and which are just associated – in most cases this is identified by experimental data. Although Reactome does not always search for the most granular Biological Process GO term, these haven’t been applied consistently.

David Hill: there should be no problem mapping this data from Reactome, while the concepts in GO and Reactome are not equivalents this is not a problem as GO would annotate the same gene products as Reactome would.

Peter: Ewan did have a concern about the ‘contributes_to’ qualifier – concerned that a significant number of end users would not always be aware of the use of contributes_to. But really this is the users' problem, and they can strip these out if necessary.

Jennifer: users have suggested that GO could strip out annotations which use the contributes_to column (especially the NOT annotations) and provide these as a separate file, as these can be dangerous to ignore.


ACTION ITEM convert Reactome complex terms to GO terms

‘Taxon and GO’ - Jen Deegan

File:Taxon and GO GOC PU 2007.ppt (using paper from Waclaw Kusnierczyk)

Originally Chris and Jen worked to lose sensu tags, redefine definitions and add taxon links. However, removal of taxon has been a problem. There are now 23,802 terms.

- Searching for terms is a time sink for users.

- GO help has often received queries from users asking if there is a taxon-specific GO slim/subset of terms (e.g. plant-specific GO).

- In addition, Jen as outreach officer has found new MOD groups are unwilling to annotate to GO unless there is a slim available for them.

- GO language can be subtle. GO term names can be complex now that the sensu information has been removed. Taxon information would make GO terms easier to find and decipher.

- In addition, having taxon information in the GO helps error checking.

- There are 3 types of relationships that could be applied to relate taxon to GO terms: 1. is_relevant_to 2. is_only_in 3. applies_to_all

This taxon-specific information would be added into a separate file.

Discussion:

Judy: Against including taxon information within the GO, as we do not know all properties of a taxon. Taxonomic information is also in flux; we do not want a dependence on taxonomy in GO. We would be restricting ourselves if we did not make all terms available to all users. Could users not instead look at the terms that were used by a reference genome group to see what terms are appropriate for a particular taxon?

- general disagreement from curators with this possibility.

Agreement that there are incorrect annotations which relate to taxon-specific properties: Harold: in the Phantom load – needed to remove incorrect mouse annotations

Val, Harold: InterPro2GO throw out problems. These could be identified by this method.

Val: I perform monthly checks to ensure no inappropriate terms have come in at high level. This is time consuming, and this would help.

Pascale: would help sanity check annotation data

Val: this species information doesn’t need to be comprehensive to be useful for annotation checks

Eurie: if this would help annotators, this information could be built into an annotation tool?

Ruth: there are interesting concepts here, but does it need to be so complicated? Would all taxa need to be included? Could we not instead use just 10 high-level taxon identifiers?

Judy: Instead, could rule-based triggers not be used? Efforts should be on annotation of literature rather than wasting a considerable amount of time incorporating taxon information. We do not want to commit such a level of resources to such a project, especially as budgets are stretched at present. Again, concern about fluidity of taxon-specific information.

Sue Rhee: we should explore usage of GO slims.

Suzi Lewis: there are risks in this kind of project; concerned that it would entail quite a bit of work and could also be misunderstood by users. Can we have a low-key evaluation?

JDR: a large-scale activity of this kind is a bad idea. You would propagate garbage by accepting all annotations. Could use it as just a framework by only using 10 top taxon ids – this would already help find problems. JC – agreed.

Alex D: Isn’t this just a user education problem? Users need to take the time to understand the GO hierarchy, that you can search synonyms, definitions etc. Feel that user queries are symptoms of users not trying hard enough to work with GO.

Mike Cherry: could not afford to make this a big project; there are other developments in GO which need to be addressed.

Rex: Had concern about making taxon-specific assertions that are flawed. If these types of sanity checks or limits were applied automatically, we would lose the potential value of looking into these cases; this data would probably tell us something fundamental about biology, and we would lose the ability to investigate it.

Judy: classifications of taxa are based on phenotypes and not molecular data; many things are being found and taxa are being redefined. Prefers ‘is_relevant_to’. Likes the idea of flags/triggers to facilitate work, but wouldn’t automatically exclude, as this data is important.

Michael A: while some taxonomy is changing, e.g. in Protista, it is unlikely that Viridiplantae or Mammalia will move around so much.

Ben Hitz: what fraction of problems would be solved if cross-products to taxonomy were included?

Jen Deegan: it would solve some, it would help with the development terms.

Ben Hitz: what would the time line be for taxon cross-products?

Chris Mungall: this is much further down the line.

Judy: Our main issue here is how to facilitate annotations in our groups. However, we are hung up on a suggestion from outside the group.

Chris Mungall: slims are much harder to maintain than these relationships would be.

Michelle M: When the prokaryotic subset was created, she was very much against it. Instead of users looking at 20,000 terms, they are now looking at 9,000 – there is not that much benefit. Doesn’t think new users need this; need to facilitate better ways of finding terms within the tool. For curators it might be useful for error checking, but not for new users.

JDR: although there is a big concern that you’d lose annotations because of these relationships, this would not be the case, as the incorrect annotations would instead be brought to your attention and be visible, to better investigate or improve GO. The rules could be fixed.


Ruth: how would this data be viewed? In addition, if a user does not understand a term then it really is a problem with the term's definition – the definition needs to be improved instead; this would be far more valuable than adding an additional cross-link.

Jen: will be willing to carry out this task in her own time. Would use annotation data from the association files, take the associated taxon ids and condense them to 10 top, high-level taxon identifiers. Confident that this would greatly help her ontology development effort, especially for development terms.
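The condensing step Jen describes might be sketched as below. It is a toy illustration under stated assumptions: annotations are already reduced to (GO ID, taxon) pairs, the taxonomy is available as a child-to-parent map, and the set of high-level taxa is chosen by hand. The function names and the labels in the usage note are hypothetical, not real taxon identifiers.

```python
def to_high_level(taxon, parent_of, high_level_taxa):
    """Walk up the taxonomy until one of the chosen high-level taxa is reached.

    Returns None if the taxon has no ancestor in the chosen set.
    """
    visited = set()
    while taxon is not None and taxon not in visited:
        if taxon in high_level_taxa:
            return taxon
        visited.add(taxon)  # guard against cycles in a hand-made map
        taxon = parent_of.get(taxon)
    return None


def high_level_taxa_per_term(annotation_pairs, parent_of, high_level_taxa):
    """Map each GO ID to the set of high-level taxa it has been annotated in."""
    usage = {}
    for go_id, taxon in annotation_pairs:
        top = to_high_level(taxon, parent_of, high_level_taxa)
        if top is not None:
            usage.setdefault(go_id, set()).add(top)
    return usage
```

With a toy map such as `{"mouse": "mammals", "human": "mammals", "thale cress": "plants"}`, a term annotated only in mouse and human condenses to `{"mammals"}`; a term that unexpectedly also shows up under "plants" would be brought to a curator's attention rather than excluded automatically.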


ACTION: Jen to do a pilot project with a minimal set of terms, as an experiment, and bring back results for the next GO meeting.

ACTION: (David Hill) Make difficult sensu terms organism specific (biologist intuitive) (e.g. plant vacuole, fungal vacuole). However, GO definitions will still be designed to be formal, not depending on species to define the term.

Summary of action Items from Day 1

  1. Tutorial on wiki discipline (assigned to Jim Hu ?).
  2. (ALL) Look at and comment on outstanding items Outstanding Action Items from 17th GOC Meeting, Cambridge UK
  3. Check whether there should be a relationship between pigment metabolic process and pigmentation
  4. Jen: A reference to these pages should go in next newsletter.
  5. Jen Add a link from outreach to something (SOP?)
  6. Investigate why term requests aren’t coming in: are there things we need to do to make it easier? SF tracker list/annotation list: who is on these lists, and do other people need to be on them?
  7. NML Michael sent ISSN URL to Eurie – Action Eurie!
  8. Somebody mentioned RSS feed, is this a potential action?
  9. Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible. Add new evidence code EXP for 1:1 Reactome to literature, add all other Reactome with TAS to Reactome source.
  10. Convert Reactome complex terms to GO terms
  11. Jen to do a pilot project with a minimal set of terms, as an experiment, and bring back results for the next GO meeting.
  12. (David Hill) Make difficult sensu terms organism specific (biologist intuitive) (e.g. plant vacuole, fungal vacuole). However, GO definitions will still be designed to be formal, not depending on species to define the term.