GO 18th Consortium Meeting Minutes Day 1

'''Please note that these meeting minutes are now being edited for production of a final version (02-11-2007).
Therefore please do not add any more information here but contact Val or Emily.'''
----
Sunday morning, September 23, 2007
([[GO 18th Consortium Meeting Minutes Day 2|Day 2 Minutes]])
== Introductions chronologically: ==


'''2007''' Dmitry Sitnikov MGI, Seth Carbon BBOP


'''2006''' Jim Hu E. coli, Susan Tweedie FlyBase, Trudi Torto-Alalibo PAMGO, Donghui Li TAIR


'''2005''' Ben Hitz SGD
'''2004''' Doug Howe ZFIN, Ruth Lovering UCL


'''2003''' Jen Deegan GO, Emily Dimmer GOA, Alexander Diehl MGI, Mary Dolan MGI, Karen Eilbeck SO, Petra Fey dictyBase, Ranjana Kishore Caltech, Pascale Gaudet dictyBase, Victoria Petri RGD, Kimberley Van Auken Caltech
'''2002''' John Day-Richter BBOP, Eurie Hong SGD, Tanya Berardini TAIR, Amelia Ireland GO
'''2001''' Rama Balakrishnan SGD, Michelle Gwinn Giglio TIGR, Harold Drabkin MGI
 
'''2000''' Rolf Apweiler EBI, Val Wood Sanger, Rex Chisholm dDB, Chris Mungall BBOP
 
'''1999''' Midori Harris GO, Kara Dolinski PU, David Hill MGI
'''1998''' Suzi Lewis BBOP, Michael Ashburner, Mike Cherry SGD, Judith Blake MGI
 
== Progress Reports ==
 
Next year the GO Consortium effort will celebrate its 10th birthday.
 
2007 Progress Report for NHGRI due Jan. 1, 2008


  1. Introductions, especially new people
        These reports will review accomplishments to date.
        We are using the itemized list of sub-aims from the grant to organize these reports.


Aim 1: We will maintain comprehensive, logically rigorous and biologically accurate ontologies.




=== Ontology development ===


== Ontology Development 1 - Midori Harris ==


  2. Quick review of any outstanding Action Items from the Jan 2007 GOC meeting held in Cambridge, UK http://www.geneontology.org/GO.meetings.shtml?all#consort
All content meeting related changes documented on [[Ontology_Development|Ontology Development Wiki]]


is_a complete was almost finished last meeting, but is now done and a system is in place to make sure it remains so.


delayed until end of morning before lunch
:Three high level terms need to be disjoint – cellular process, multicellular organism process and multi-organism process. General notes on is_a complete: [[Isa-complete_BP]]


Topics also overviewed; priority list on main Ontology Development wiki page.


Content meetings have been held for:
* [[Cardiovascular physiology/development|Cardiovascular Physiology]]
* [[Muscle Biology]]


Other topics:
* [[Transporters|Transporter activities]] extensive work via web conferences
* Medium-scale content changes:
** synaptic plasticity
** [[RNA processing]]


Michael Ashburner: Question IMG-to-GO and FIGS-to-GO mappings.


:Jen, Midori: the IMG to GO mapping is mostly finished. These items are waiting for Jane to return.


Chris M: mappings between the BP and MF terms still need to be done.


suzi intro of reports. Isa complete done-ish still some work.
JB/SL: the wiki is a valuable resource; however, it can get muddled sometimes, so managers should keep track.


Alex: if you add a large new section, you should send out a general email.


'''ACTION ITEM – tutorial on wiki discipline (assigned to Jim Hu).'''


Rex – in addition, there could be a group of wiki experts formed, who people could contact for advice.


Midori, David. Using wiki, everything they do on phone etc minuted on wiki (partially an experiment): 
== Ontology Development 2 - David Hill ==


http://gocwiki.geneontology.org/index.php/Ontology_Development
=== 1) Taxon and sensu. ===


Barely finished isa complete at last meeting, but is now done and a system is in place to make sure it remains so.
“Sensu” confused users and curators; editors became lazy in implementing it, and accurate definitions were not created. Sensu terms have been renamed, merged or obsoleted (how many?) in collaboration with domain experts.


David will speak on his collab with Harvard and MIT: Visualization theory, working on structure improvements and specificity of parentage.
:Note added after meeting: We would have to run obodiff to get counts for renamed vs. merged vs. obsolete, but we started in April with 229 'sensu' terms, and there are now 80 remaining in the live file. Of these, several are sorted out in the 'fruiting_body.obo' file in go/scratch/, and the remainder (about 60) are listed on the last [[Meeting_Notes_3|sensu meeting notes page]].


Want 3 high level terms disjoint – cellular process, ??? process and ??? – Details on wiki.
Definitions now need to state how a process occurs differently in the different organisms. If it is impossible to state this, then child terms will not be created. In future, term requests need to explain how a process occurs differently in different organisms.


Terms applicable to only one taxon… we are moving away from the sensu system.
Synonyms containing the sensu information are kept for these terms.


Midori generally discussing the Ontology Development wiki pages linked above.
The general consensus at the meeting seemed to be that rather than create long convoluted term names, we would still be allowed to create a term such as plant-type vasculature as long as the definition clearly differentiated the terms.


IMG mapping. How much hanging now Jane gone on maternity? Not fixed!
==== Function-Process Links ====


JB/SL wiki can get muddled sometimes – managers should keep on top of this.  Alex – if you add new large section you should send out a general email.
Chris M: these mappings are complex


SL Action – tutorial on wiki. Rex – wiki help contact group.
Waiting for OBO-Edit 2.0 for help on cross-products.


David:
=== 2) Regulation. ===


Taxon and sensu: Sensu confused users. Asked experts in different areas to review their sensu terms. 60 left to differentiate.
[[Regulation Main Page]]


Function - Process links. Not much done, waiting for new OBO (because waiting for cross products) and mappings not as straightforward as we first thought.
GO will soon add a new relationship, 'regulates'. Regulation-of-process terms will then be changed from part_of the process to regulates (for example, 'regulation of metabolism part_of metabolism' will become 'regulation of metabolism regulates metabolism').
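The planned change amounts to relabelling one kind of edge. The snippet below is a minimal illustration in Python, not the Consortium's actual procedure: the triple representation, the function name `relabel_regulation_edges`, and the name-matching rule are all assumptions made for this example.

```python
# Hypothetical sketch: relabel 'regulation of X part_of X' edges as 'regulates'.
# The (child, relation, parent) triples and the name-based matching rule are
# assumptions for illustration; they are not taken from the GO files.

PREFIX = "regulation of "


def relabel_regulation_edges(edges):
    """Return a new edge list where 'regulation of X part_of X' becomes 'regulates'."""
    result = []
    for child, relation, parent in edges:
        if relation == "part_of" and child == PREFIX + parent:
            relation = "regulates"
        result.append((child, relation, parent))
    return result


edges = [
    ("regulation of metabolism", "part_of", "metabolism"),
    ("mitochondrion", "part_of", "cell"),
]
relabelled = relabel_regulation_edges(edges)
```

Only the edge from a regulation term to the process it names is touched; every other part_of edge is left as it was.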


Regulation. Add new relationship – ‘regulates’. 
During the is_a-complete work, three top-level regulation terms were added to represent three categories of biological regulation: regulation of molecular function, regulation of biological process, regulation of biological quality.


David Hill speaks too fast but Emily doing great job on this one.
Chris has generated a report (go/scratch/regulation-of-non-process.txt) of all descendants of 'regulation of biological process' where there is no term for the process being regulated. David Hill is going through the report (not as bad a task as he'd feared), and has found that the violations fall into three categories, corresponding to the three parts of the [[Regulation Worksheet]]:
Chris went through the ontology and told them where things did not make sense. All analyses are in CVS in the /go/scratch directory (regulation-of-non-process.txt), and on the wiki they have been split into 3 distinct categories detailed here:


http://gocwiki.geneontology.org/index.php/Regulation_Worksheet
:[[Part 1]]: The regulation term describes regulation of a molecular function or a biological quality, so the term is o.k.


Part 2:
:[[Part 2]]: The regulation term is a legitimate subtype of its parent, but a more specific process term isn't required. Example:


How are we going to relate these terms to the rest of the ontology?
::GO has 'regulation of transcription involved in forebrain patterning' and 'regulation of transcription', but not 'transcription involved in forebrain patterning'. 'Regulation of transcription involved in forebrain patterning' is


Action Item: need to thrash through regulation.
::* Part_of forebrain patterning (check)
::* Is_a regulation of transcription (check)
::* 'Transcription involved in forebrain patterning' is not necessary -- it is essentially the same process as transcription
This term will inherit the regulates relationship through its is_a parent and will regulate 'transcription'. It will remain a part_of forebrain patterning since every instance of this process is a part_of an instance of forebrain patterning.
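The inheritance described here can be illustrated with a small sketch. The toy dictionaries `IS_A` and `REGULATES` and the function `inferred_regulates` are hypothetical; the only rule encoded is the one stated in the minutes, that a term inherits a regulates relationship from its is_a parent.

```python
# Hypothetical toy graph; term names follow the example in the minutes.
IS_A = {
    "regulation of transcription involved in forebrain patterning":
        ["regulation of transcription"],
}
REGULATES = {
    "regulation of transcription": ["transcription"],
}


def inferred_regulates(term):
    """Collect processes regulated by `term` directly or via its is_a ancestors."""
    targets = set(REGULATES.get(term, []))
    for parent in IS_A.get(term, []):
        targets |= inferred_regulates(parent)
    return targets


targets = inferred_regulates(
    "regulation of transcription involved in forebrain patterning"
)
```

Under these assumptions the specific term ends up regulating 'transcription' without a separate 'transcription involved in forebrain patterning' term being needed.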


Part 3 are problem cases. Including typos! Search on this page for ‘?’ for proposals by David where he needs resolution. Going through this list created links between function and process ☺
:[[Part 3]]: Actual problems of various kinds; David has made suggestions about how to handle them, which everyone should check -- especially the ones with question marks.


Negative and positive regulation question – were subtypes. Blood pressure it made no sense. Thought better to be part_of relationships. Keep it as subtypes for BP.


Collaboration with MIT/Harvard group.
Chris Mungall: there are problems with cross-products, and would be easier if the parent terms did exist.


Terms in wrong place? Shown through with their content analysis tool.  Information Content Analysis. Can we place these terms in a better place – most of the time the answer was, yes. For example, pilus retraction moved further down.
David H: this will be resolved once the parent terms do exist.


Possible to put this analysis into GOC tools? CM – analysis already in database – check with Chris what he means.
David H: concern about consistency in regulates relationships. In some cases, negative and positive regulation of a process are part_of the parent and in some cases they are is_a of the parent. We need to be consistent about this.
For now, negative and positive regulation of a biological quality are a special case. When you are regulating a biological quality, the regulation is a balance of the positive and negative processes. Therefore, the positive and negative children are part_of the regulation of the quality.
Suggestion to use homeostasis terms for overall regulation of biological quality [midori]


Ontology Structure – CJM
CM: will look at relationships between cell types and GO terms: use as a guide to populate GO with missing terms.


Wiki should be merged with Ontology Development


Q. VW: How are existing annotations affected by the relationship change?
E.g. transcription initiation: curators may have annotated more granularly to regulation of transcription initiation when there is direct involvement. Topic for annotation discussion at some point?


Mining Reactome links to link process to function – more after lunch.
A (David, Midori): The 'regulates' relationship shouldn't affect annotations. Basically the part_of relationships already exist and we will simply replace them with regulates. We are already annotating to regulates terms and that shouldn't change. What will be different is how we process annotations. We will be able to decide whether or not we should include regulates children.


'''ACTION ITEM (ALL) Look at and comment on outstanding items (search on ?)'''
Features ontology repair tool for links that don’t exist or are broken. Need consistent rules for regulation terms where the regulated stand in part_of relations.


'''ACTION ITEM Check whether there should be a relationship between pigment metabolic process and pigmentation'''


=== 3) Information content analysis. Collaboration with MIT/Harvard group. ===


MIT and Harvard got in contact with GO, were interested in measuring information content of a GO term.
They looked at the number of annotations to a term related to its position in the ontology.


They developed a statistical algorithm to determine information content, based on the assumption that a term with few genes annotated to it has a high information content and a term with many gene products annotated to it has a low information content.
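To make the idea concrete, here is a minimal sketch of one common way to score information content, IC = -log2(p), where p is the fraction of annotated gene products assigned to the term. The minutes do not describe the MIT/Harvard group's actual algorithm, so the formula, the function name `information_content`, and the counts below are illustrative assumptions only.

```python
import math


def information_content(term_count, total_count):
    """IC = -log2(term_count / total_count); fewer annotations give higher IC."""
    if term_count <= 0 or total_count <= 0 or term_count > total_count:
        raise ValueError("need 0 < term_count <= total_count")
    return -math.log2(term_count / total_count)


# Made-up counts, for illustration only.
TOTAL = 1024
rare_ic = information_content(4, TOTAL)      # sparsely annotated term
common_ic = information_content(512, TOTAL)  # heavily annotated term
```

With a score like this, an outlier is a term whose value is unusually high or low compared with other terms at the same distance from the root, which is the kind of case a curator would then inspect by hand.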


Karen Eilbeck SO progress
David, Midori and Jane then looked for outliers with respect to information content (finding terms that were either too specific, or at a higher or lower level than they should be).


Development : March – august joined J Thornton group - Gabi reeves for Biosapiens project. 96 new terms to SO.  
They took higher-level terms which had too few annotations compared to other terms the same distance from the root, and checked whether they could be relocated. E.g. pilus retraction was a direct child of 'cell physiological process' and was relocated to pilus organisation and biogenesis, so that it is at an appropriate level in the GO hierarchy.


Mark Hathon (with Barry)– ongoing work.
Similarly, lots of specific terms had a larger than expected number of annotations, e.g. olfactory receptor activity.


Content meeting in June, HLA immunology community – looking for terms to describe variants. New terms, rearranging of SO – very productive.  
Some of the annotation distributions between terms also just reflected biological differences e.g. cation and anion transport terms: there are more cation transporter genes than anion transporters, the two terms are at the same level in the ontology - as they should be.
Therefore this analysis can only draw attention to particular parts of the ontology which a curator then can examine.


Collaboration with phyGo.
Q:  JDR Is it possible to put this analysis into GOC tools?


Working on synonyms with Colin Batchelor.  
A:  CM – the analysis is already in database, can be used.


Release SO every 2 months.  
AD: this is something which can be repeated semi-regularly, but not to dwell on too much.


Karen dropping down to 60%.
DH: this has been a very good collaboration experience, and has produced good contacts to continue relationships with.
SR: We know of other groups that could also get in touch which are interested in this area as well - will get in touch with David Hill.
JB: annotations give power to these kinds of approaches. And until we have an annotation core we are restricting this kind of potential activity.


== Ontology Development 3 - Chris Mungall ==


COFFEE BREAK
Wiki for ontology structure (should be merged with Ontology Development)


http://gocwiki.geneontology.org/index.php/Ontology_Structure


Aim2: We will comprehensively annotate reference genomes in as complete detail as possible.


Reference Genome Project
1. Mining Reactome links to link process to function – more after lunch.


Rex/Pascale


Provide comprehensive annotation for 12 genomes. MOD, genome DB, curators. Complete means breadth and depth. Breadth – every gene. Depth – to the highest possible knowledge. If small amount of papers then read all. If extensive then summary of reviews.
2. Internal cross products can start to be created and maintained in the ontology. OBO-Edit 2.0 will make it easier to maintain these cross products.  
Metrics. GET REX’s PRESENTATION.


250 genes identified for curation. Gene when mutated should contribute to a disease (OMIM).
New cross product guide on wiki. Links to ongoing work on BP – CP cross products;


Per MOD – curators responsible for identifying orthologs using the commonly available tools.
e.g. could link histone deacetylase complex to histone deacetylase activity (this type of linking is easier than creating BP to MF links)


S/W – Google spreadsheets – erratic. Not robust. Anxious to work with the SW group to develop a database – requirements have been written up. Merchant (left in July) wrote prototype.  A new member of staff is starting at the end of September to continue development.


http://gocwiki.geneontology.org/index.php/Cell_cross-products


Display approaches – comparing annotations to generic GOSlim branches.
Includes:


Number of sourceforge requests from reference genome group in the hundreds over 16 months. Average of 10-12 requests per month. GO editorial group doing a good job at keeping up with these. Existing requests are problematic. 411 terms.
Internal links (existing)


External links (function to process links)


Aim3: We will support annotation across all organisms.
External links (x products)


Annotation Outreach – Jen Deegan
Links need to be treated with caution. Links are kept in a file separate from GO at the moment, as people could make erroneous propagation of annotations between the Gene Ontologies (i.e. just because someone annotates to a certain process, it does not mean they should necessarily annotate to the linked function).


Keeping track of new groups annotating and writing documentation.
3. contributes_to


ASK JEN FOR HER SLIDES.
- people are using this qualifier incorrectly in annotations.
VW: take Histone Deacetylase complex as an example, this is a very large complex with many molecular functions. Therefore one complex can be linked to many different functions. We should use contributes_to *only* in those instances where the annotator does not know which subunit provides a function.
JB: no, contributes_to can be used also when you *do* know the individual contributions of subunits.
MD: often subunits which do not have a specific activity themselves are involved in enabling another subunit providing the activity.
VW: but this does not hold for all complexes; we are using this qualifier in too many different ways.
DH: often, if a subunit is knocked-out, the observer cannot tell if the subunit has a direct or indirect influence on the resulting phenotype. Therefore in addition often the 'contributes_to' qualifier is missing.
: discussion postponed


People going to meeting – report back gossip from willing people to Jen.


Internal cross-products
If cross-products were maintained in the GO directly, it would make life easier. Cross-products will be more manageable in OBO-Edit 2.0 where there are many features - can use a 'Cross-Product Matrix Editor' - can see the possible cross-product/GO combinations, parents and children of a term.
- this helps identify missing links in the DAG.
- in addition, there will be an ontology repair option, which can introduce these links, e.g. missing is_a links.
DH: we want to use this to go through the logic of the regulates relationship - as concern about ensuring consistency.
CM: will also look at relationships between cell types and GO terms, and can use these as a guide to populate GO with such missing terms.
... more on cross-product logistics later.


=== Karen Eilbeck SO Progress ===


Michelle created nice ISS guideline SOP.
Development: March to August, joined J Thornton's group (Gabi Reeves) for the BioSapiens project on protein terms; 96 new terms for polypeptides have now been added to SO.


Action Jen: A reference to these pages should go in next newsletter.
Mark Hathon (with Barry Smith)BioSmith, Buffalo – ongoing work on regulatory regions.


There has been much progress on grants.  
Content meeting in June, HLA immunology community – looking for terms to describe variants. Added new terms, rearranging of SO – very productive.
Assigned work to Alex, nothing to report.  


Collaboration with Arian at phyGo. Mobile genetic elements for viruses. This is in parallel with work happening in GO.


Working on synonyms with Colin Batchelor, and over 400 new synonyms have been added to SO.


Release SO now every 2 months. Therefore there is a stable and leading-edge version for those interested.


Eurie - User Advocacy
Changing requirements for GFF3 - this is not done yet.


Focusing on lines of communication, newsletter and mailing list. Rota of mailing list monitors.
Karen dropping down to 60% on this project.


Newsletters archived. Future news items page on wiki. Michael A wants ISSN for the newsletter. Action Michael! Michael sent URL to Eurie – Action Eurie!


Users meetings, we have a page of potential meetings on wiki.
'''COFFEE BREAK'''


Tools standards.


Production Systems – Ben Hitz
Aim2: We will comprehensively annotate reference genomes in as complete detail as possible.


Deployed 4 new Linux machines: 1 for loading, 2 for AmiGO production, 1 for AmiGO development.


ASK BEN FOR SLIDES
=== [[Reference Genome Annotation Project]] - Rex Chisholm ===


Production Amigo now more fault resistant.
Aim3: We will support annotation across all organisms.


Go Database loading speeded up and now in testing.


Godb sequences – using gp2protein files.  If possible do all sequences in your DB, not just annotated.
[[Image:ReferenceGenomes GOC PU 2007final.ppt]]


Assocdb fasta file – Header line massive – can be slimmed down?
Purpose: to provide comprehensive,  robust collection of annotations for 12 genomes.
These genomes have the most published data, have a genome database and experienced GO annotators. These high-quality annotations will be a resource for other groups to transfer to genes in their species.


Association file cleaning – All IEAs must have a with field.
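A check like this is straightforward to script. The sketch below assumes the tab-delimited gene association file layout in which column 7 holds the evidence code and column 8 the with/from field; the function name `iea_rows_missing_with` and the sample rows are hypothetical and are not part of the production loading scripts.

```python
def iea_rows_missing_with(lines):
    """Return (line_number, line) pairs for IEA annotations with an empty with field."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("!"):  # skip blanks and comments
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # too short to judge; a fuller validator would report this
        evidence, with_field = columns[6], columns[7]
        if evidence == "IEA" and not with_field.strip():
            problems.append((number, line))
    return problems


# Invented sample rows (identifiers are placeholders, not real annotations).
sample = [
    "! comment line",
    "DB\tID1\tsym1\t\tGO:0000001\tREF:1\tIEA\tInterPro:IPR000001\tP",
    "DB\tID2\tsym2\t\tGO:0000002\tREF:2\tIEA\t\tP",
    "DB\tID3\tsym3\t\tGO:0000003\tREF:3\tIDA\t\tP",
]
missing = iea_rows_missing_with(sample)
```

Only the second annotation row is flagged: it is IEA with an empty with field, while the IDA row is ignored because the rule applies to IEA annotations only.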
Complete/comprehensive annotation includes measures of breadth and depth.  


Amigo Amelia
Breadth – every gene annotated.
ASK AMELIA FOR SLIDES


Term enrichment.
Depth – gene annotated to the highest possible knowledge.  
If there are only a small number of papers (5-10), the curator should read them all.
If the literature is extensive, the curator should be selective (completion is best assessed by a curator).


Go slimmer.
== Target Gene Identification (Priority genes) ==


250 genes have now been targeted for curation.
The target method has now been changed: targets are now (as of last month) selected based on disease type. A gene, when mutated, should contribute centrally to a disease phenotype (OMIM).
This method has been generally successful, however there is now a challenge for mammalian groups with the increased literature load. Also a challenge for non-mammals - orthologs may not always be available (e.g. neurological genes in yeast). These challenges need to be balanced.
== Ortholog Identification ==


Need to have a good set of orthologs.


LUNCH BREAK
Need to find ways of facilitating this work through tools, no obvious choice as yet. e.g. InParanoid have problems in keeping pace and providing up-to-date sets.
It would be good to have an ortholog set automatically provided, which curators could then validate.


Software


Action Items Review
Currently use Google spreadsheets for target lists and information on curation progress.
However, this is not robust enough and is time consuming.
A database will be developed to handle this data and requirements have been written up. This will mean that the Ref Genome data is more structured. The database will provide a consistent use of identifiers, MOD association file loading, tracking when no ortholog found, and an automated response if a paper appears after a 'comprehensive date'.
Sohel Merchant (left in July) wrote prototype
<< ADD URL>>.
A new member of staff will start at the end of September to continue development.


4. IN PROGRESS Action item for Michelle – get this info [i.e., on TIGR course attendance and whether any participants use GO afterwards -mah] We just got our Annotation Engine grant funded and as part of that we will be surveying people who attended the classes.


Check on attendees if they use go afterwards.
== Metrics ==


Annotation Progress – see slide.


8. Action Item: GO-top: Talk with each manager and bring summary reports to the next meeting.
Annotation Consistency.


Managers phone call


10. NOT DONE. Action Item for John: Add a term creation date to the .obo file. It didn't get added to the feature request tracker so it was overlooked. Now added:
Mary Dolan's tool for comparing annotations by looking at generic GOSlim branches is useful, as different organisms are used in different experimental approaches and different levels of data are available in different organisms.
Eurie: if there is an outlier in annotation consistency checks this might also indicate organism-specific data (e.g. chromatin silencing not appropriate term for yeast).


added to tracker on plane. DONE
Table View  (slim showing each terms annotated for a gene) includes every term useful for curation and annotation consistency (add link???).


12.In progress. Action item for Midori and David (from 2): Find a way to communicate responsibilities and availability viz ontology working groups. Devise a systematic way to bring closure to outstanding term request items. We've made some plans, but not yet acted upon.
Ontology Development
We have begun implementing our plans (and refining them, as needed); a respectable number of SF items have been assigned, though for others we still need to find expertise. I think of this as an ongoing ontology development effort rather than a discrete one-off action item. [mah]
Aim to have robust discussions on annotation and ontology development issues. Number of SourceForge requests from reference genome group in the hundreds over 16 months. There is an average of 10-12 SourceForge requests per month. GO editorial group doing a good job at keeping up with these. Existing requests are problematic. 411 terms.
- It is important that curators label their SourceForge request as relating to a Reference Genome group.


Done for purpose of meeting. Ongoing inherent part of ontology development,
MH: Can retrieve number of GO terms that have resulted from these requests by looking at the cross-references file: 411 terms from Reference Genome-marked requests.


13. NOT DONE: Action item for AmiGO Working Group (from 4): The AmiGO Working Group will implement a strategy to incorporate and display the contents of the GO references. Request is on the AmiGO tracker (thanks Jane) http://sourceforge.net/tracker/index.php?func=detail&aid=1667315&group_id=36855&atid=494390
Ruth Lovering's Metrics Document v3: [[Image:HowToCaptureMetrics3.doc]]


Done in next release. Amelia. On the tracker still. IN PROGRESS


15. Action Item for GOA (from 1): Talk to Reactome about getting non-TAS evidence. TAS is no longer considered a useful evidence code and will not be used in any consistency measures of reference genome annotation. Since part of the idea of the reference genomes is to provide a source of IEA annotations for other groups, we strongly encourage reference genome annotators to not use TAS, and instead use experimental evidence codes whenever possible. The GO documentation should also state this in a clear fashion. [annotation camp participants]
Publicising
- need to start publicizing Reference Genome work.


IN PROGRESS NEXT ON AGENDA


16. Action Item for Judy, Harold, Amelia, Eurie, John Day-Richter (from 3,4,5): Resolve communication issues around obsoletes.
== Annotation Outreach – Jen Deegan ==


Midori to look into but probably done – tags applied to obsolete terms in OBO edit file.
Aim: to find new groups to join the GOC annotation effort, to keep track of new groups annotating, and to write documentation to help get groups started.


17. NOT DONE (more discussion required) Action Item for Chris and Jane (Revisit 7): Remove the word 'activity' from the molecular function terms, and consider renaming the molecular function ontology. [note that later on in the meeting after action item 73 notes state that more discussion on this proposal is needed]
see wiki:
http://gocwiki.geneontology.org/index.php/Annotation_Outreach_group_reports


Strike activity from the function terms but some people very opposed.
[[Media:outreach_princeton.ppt]]


18. [In Progress] Action Item for Jen & Chris (from 8): Assign priorities for contents changes needed to implement sensu plan. Discuss the different aspects of changes to our use of sensu, write documentation on this, and implement the new strategy. This change will then be announced to the community.
Jen described the scope and techniques of outreach effort. Showed an 'ontology ' of outreach effort. There has been much progress on grants.  
Comment: There is more discussion to be had on this and a slot has been assigned for this in the meeting.


Jen to update on sensu later in meeting. IN PROGRESS
Attending many regular conferences.


Less cold calling, it wasn’t very successful. More luck tracking down the right person at conferences. Responding to invitations.


20. Action Item for Karen C. (from 11): Add to the documentation that it is okay to use experimental evidence codes for perfectly identical gene products from different strains of the same species.


IN PROGRESS.  
People going to meeting – report back information from willing people to Jen.


21. Action Item for Eurie (Revisit 12): No conclusion about how to distinguish large- vs small-scale experiments was reached. People are encouraged to keep thinking about this issue, which clearly needs more discussion.
The SOPs have been tricky but are now on the public GOC website:


To be discussed at this meeting.
http://www.geneontology.org/GO.annotation.SOP.shtml
MA: this page is difficult to find.
Action: this page needs to be reviewed and included in the next newsletter


26. Action Item for Karen E.(?) & Rex: –Work out kinks in getting this data [i.e., GFF3 files] and make it readily available from the GO site.
'''ACTION ITEM Jen: A reference to these pages should go in next newsletter.'''


IN PROGRESS.
'''ACTION ITEM Jen: Add a link from outreach to the SOPs.'''


28. Action item (Suzi, Tanya): Continue developing [protein family-based annotation] tool.
There has been funding success - for the British Heart Foundation and AgBase grants.  


Put aside for the moment. Need to revisit. EVOLVING
MA: for new groups annotating, how many SourceForge requests are we getting? e.g. Aspergillus group should have requested new terms?<br>
SL: agree. As soon as an annotation effort really has started, the group often needs a number of new terms.<br>


29. Action item (Midori and Rex): Do it [i.e., new column in annotation files], add the column and document the formalism.
Jen: for emerging genomes the main problem is finding funding to support an annotation effort.<br>
It's not at all clear why this has my name (and I don't remember); it seems more for the software group and annotators. [mah]
MC: need to determine if they are only doing IEA annotation, or whether they have the time to carry out manual curation.<br>
JH: the process of making new term requests is not obvious<br>
JW: the SourceForge term tracker only goes to the GO list, so other groups not aware<br>
MH: it is possible to add more e-mail addresses to this list.<br>
MA: not our job to source funding for new groups, it is the job of the individual groups. <br>
JB: supporting new groups is important, need to mentor groups and support them submitting new terms.


TO BE DISCUSSED. Column 16. Structured Notes Field AKA SLOTS ☺
'''ACTION ITEM investigate why terms requests  aren’t coming in, do we need to do things to make it easier, SF tracker list/annotation list/ who are on these lists/ do other people need to be on those lists?'''


31. Action Item (Jane): Keep researching CARO integration.
IN PROGRESS


32. Action Item (AWG): Explore whether this [Mary's graphs] would be a useful addition to AmiGO 2.0.


STILL EXPLORING. Use logs? More discussion.
== User Advocacy - Eurie ==


33. Action Item (AWG): There should be more tracking of what users are doing in the next AmiGO; explore the options.
Focusing on lines of communication, web presence, newsletter and mailing list.  


Adding dates to file done but nothing further happened.
Different users, new users, current users, power users


Most of the past year has focused on the lines of communication.


35. Action Item (John & Jane): Come up with a proposal for a history tracking tool, including timeframe and priority, and send out email.
wiki User Advocacy main page:
http://gocwiki.geneontology.org/index.php/User_Advocacy


36. Action item (David, Jane): Organize first training meeting.(postponed due to the amount happening at this meeting)
Rota of mailing list monitors.


37. Action Item (Nicole, Jane, John, Mark): Create proposal to group for term submission software, including the “minimum standards for each term”.
Newsletters archived. Future news items page on wiki.  
Wiki or Newsletter ideas <<add  link>>


JOHN not done – proposal considered.
Michael A wants ISSN for the newsletter.  


38. NOT DONE! Action Item (AmiGO Working Group): Change AmiGO to hide (by default) structured comments of certain types. Obsolete comments will not be shown. Structured comments that don’t belong will not be shown. Have option to hide comments. [different types of comment not specified so no action]
'''ACTION ITEM NML  Michael sent URL to Eurie – Action Eurie!'''


Need more specification. Amelia.
'''Somebody mentioned RSS feed, is this a potential  action?'''


39. Action Item (Karen E.): Send the information [microarray tool evaluation] to the group
Users meetings, we have a page of potential meetings on wiki.
- used to target groups new to GO and help education.
- have a workshop specific for microarray users (rather than an add-on to MGED)


Send results but hasn’t. Probably obsolete.
Tools standards. (Needs to be cleaned up and categorised)
- ideas for minimum standards for GO tools
- send out list a month ago:
http://gocwiki.geneontology.org/index.php/Tools_standards


40. Action Item (User Advocacy): Create a Wiki page for each tool on GO tools. GO's official tool recommendations (if we have them) can be posted on the Wiki. Link to a Google user group for tool discussions
== Production Systems - Ben Hitz ==


41. Action Item (User Advocacy): Publish minimum standards for tools on GO tools. This can just extend the existing questionnaire.
[[Image:ProductionReport_GOC_PU_2007.ppt]]


IP
Deployed 4 new linux machines 1 for loading, 2 for AmiGO production, 1 AmiGO development.


42. Action Item (Everyone): Email suggestions for meetings at which we'll do outreach to Eurie & Jane & Jen.
Production AmiGO now more fault resistant.
[Ongoing - working well]
ACTION: e-mail Ben if you are not getting a gp2protein check for your database.


IP
Go Database loading speeded up and now in testing.


44. Action Item (Eurie & Jane): Create Wiki of meetings gathered from emails.
Godb sequences – using gp2protein files. If possible do all sequences in your DB, not just annotated.


IP
Assocdb fasta file – Header line massive – can be slimmed down?


45. Action Items (Managers): Revisit metrics & priorities.
Association file cleaning – All IEAs must have a with field.
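
The rule that every IEA must have a with field can be checked mechanically. The following is a minimal sketch, assuming the tab-delimited gene association format with the evidence code in column 7 and the with/from value in column 8; the function name and the sample rows are invented for illustration and are not part of the production checks described here.

```python
def find_iea_missing_with(lines):
    """Return (line_number, line) pairs for IEA rows whose with field is empty."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Skip blank lines and '!' comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # too short to hold evidence and with columns
        evidence, with_field = cols[6], cols[7]
        if evidence == "IEA" and not with_field.strip():
            problems.append((number, line.rstrip("\n")))
    return problems


# Invented sample rows (truncated after the aspect column for brevity).
sample = [
    "!illustrative comment line",
    "DB\tID1\tsym1\t\tGO:0003677\tGO_REF:0000002\tIEA\tInterPro:IPR000001\tF",
    "DB\tID2\tsym2\t\tGO:0005634\tGO_REF:0000002\tIEA\t\tC",
    "DB\tID3\tsym3\t\tGO:0006355\tPMID:0000001\tIDA\t\tP",
]

for number, row in find_iea_missing_with(sample):
    print(f"line {number}: IEA without a with field: {row}")
```

Rows are reported rather than dropped, so the submitting group can fix them at source.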


ONGOING
== AmiGO – Amelia ==


46. Action Item (Managers): Create structure within working group Wikis to allow people to announce what they are working on.
AmiGO enhancements and new search features demo
Ontology development started, and ongoing; as "done" as it can be. [mah]


ONGOING
*Search result relevance implemented - most 'relevant' results are shown first
*Term and gene product search is now "intelligent" and AmiGO will automatically search all fields if it doesn't find a match.
*Term enrichment (also known as "GO Term Finder") and GO Slimmer (map2slim) functionality have been added to AmiGO. Both can use uploaded user files or data from the GO database.
*Downloads in OBO, RDF-XML and gene association format now possible


'''LUNCH BREAK'''


47. Action Item (Everyone): Create list of projects they are working on, post on Wiki.
== Action Items Review ==
Ontology development done. [mah]


This large section moved to its own page:


ONGOING
[[Outstanding Action Items from 17th GOC Meeting, Cambridge UK]]


49. Action Item (Chris, Karen C.): Publish a succinct paragraph explaining why unknowns were removed. Karen C. will write documentation targeting biologists; Chris writes an explanation targeting software engineers. Publish on website, possibly in newsletter, wherever necessary.
Afternoon, Sunday, Sept 23, 2007


== Reactome - Peter D’Eustachio ==


done
[[Image:Reactome_to_GO_GOC_PU_2007.ppt]]
Items 50-52 are about the WITH field for the IPI code.
50. Action Item (MGI, SGD): Change all pipes to commas.


DONE – DAN AND DAVID TO WRITE ON WIKI ACTION ME
Reactome can provide data for proteins for which UniProt does not yet have manual annotations. Most of this Reactome data is derived from experimental evidence identified from papers; however, unlike the GO annotation method, the types of experiments have not been recorded.


51. Action Item (Chris, Ben): Work through any database implications.
Emily: GOA would love this data, but unless we have a new parent ‘Experimental’ code, the best that exists is ‘TAS’.


???
Suzi Lewis: there is a use for a hierarchy of evidence codes. With an  ‘E’ Experimental code as a parent of the IMP, IGI, IDA, IPI, IEP granular codes.


Peter: The homolog sets used to transfer data between species are determined by individual experts, and data are transferred between orthologs AND homologs (where functionally similar).
Judy and Suzi: Reactome data is valuable. It is unacceptable to not be including it in GO and it is unacceptable that this data should have anything less than an experimental evidence code. TAS or NAS evidenced data are unacceptable also.


52. Action Item (User Advocacy): Announce in newsletter, “what's new” section of GO site.
Peter: the current Reactome curation method is to avoid unpublished data. Reactome curators also want to be opinionated about the published data, so that Reactome reflects current expert opinion and avoids hypothetical theories. Only confirmed, accepted knowledge is included. There are 10 curators, only 2 of whom have previous experience in GO annotation; there is no budget to do GO annotation and no desire to teach curators about GO evidence codes. Curators don’t always know which piece of literature applies to which piece of information.
2000 genes annotated. 4000 pieces of literature. It is not clear how many GO annotations this would convert to.


????
Suzi Lewis:  This brings up the question of what is the purpose of evidence codes? Why do we have the ones we have? Do users use them? (something to discuss tomorrow).


53. Action Item (Everyone) – Tell code extension advocates to piss off. Attendees voiced strong opinions against extending evidence codes.
Pascale: have evidence from users that they do care whether IDA or IMP codes are used.


ONGOING
Peter: There is not always a 1 GO term to 1 publication relationship. Sometimes a GO term may have originated from the combined curation of many papers.


55. Action item (Karen C): makes these statements very explicit in the documentation.
Eurie and John Day-Richter: TAS annotations are valuable, and may be good to get the data in.


DONE
Suzi, Judy: this data is too good for TAS.


56. Action Item (Rex, Eurie): Create a gene page for each reference gene on Wiki. Use Alex's pages as a template. Link Wiki pages to existing annotations.
Emily D: Why not use a mix of codes depending on the GO term to publication ratio? For those instances where there is a 1:1 relationship of GO term to publication: use ‘E’, for 1 GO term to many publications: use ‘TAS’ and cite the Reactome reaction web page as the source – this then acts as the reviewed document.


DONE
David Hill: concerned about the proposition of a new ‘Experimental’ evidence code: might lose analytic power.


60. Action Item (Evidence Code Working Group): Develop a decision tree for choosing an evidence code. Present results at next consortium meeting. (large scale/small scale, reviewed/un-reviewed. Similarity/not-similarity etc).
Judy B: could Reactome curators go back and re-annotate those 4,000 papers and convert the codes to one of the GO experimental codes? This would only take 2 weeks to do.


TO BE DISCUSSED AT THIS MEETING
Peter: Not possible – Reactome have defined goals, we cannot afford to reannotate for GO. 75 genes/month is the absolute minimum annotations. We have our own grant objectives we must fulfill.


61. Action Item (Rex): Assign programmer to check integrity of WITH field for annotations.
David Hill: GO curators could prioritize the reannotation of genes for which there is not much annotation available.


NEED NEW PROGRAMMER
Rex: could the reference genome groups each take on a subset of annotations and re-annotate?


62. Action Item (Everyone): Add SwissProt keyword ids to WITH column when appropriate.(MGI-DONE, GOA-DONE, ZFIN-DONE)
Emily: then the annotation would belong to the group that reannotated. We would be using Reactome data as a source, but the final annotations would be attributed to the group that provided the final annotations. Might not be the best use of resources.


DONE
Suzi : Would accept ‘EXP’ for the 1:1 mapping of GO term to publication.


63. Action Item (Evidence Code Working Group): Send RCA code back to committee.


TO BE DISCUSSED AT THIS MEETING
Q Val: Any idea how many aren’t covered by GO annotation already?
A. No…




74. Action Item (Chris, Jane, Harold): Create concrete proposal for linking between all GO ontologies, get relationship ontology up to speed, do pilot project in metabolism.
Judy, Sue R, Emily D, Tanya B: the ‘EXP’ code would make life easier for users, for other integrations as well


ONGOING
'''ACTION: Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible.  Add new evidence code EXP for 1:1 Reactome  to literature, add all other Reactome with TAS to Reactome source.'''
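
The agreed rule for Reactome-derived annotations (EXP where a GO term maps to a single publication, TAS citing the Reactome page otherwise) can be sketched as follows. This is only an illustration of the rule as minuted; the function name and the identifiers are hypothetical placeholders, and how Reactome records its literature links is an assumption.

```python
def choose_evidence(publications, reactome_page):
    """Pick an evidence code and reference for one Reactome-derived annotation.

    publications: list of literature references supporting the GO term.
    reactome_page: identifier of the Reactome reaction page (placeholder).
    """
    if len(publications) == 1:
        # 1:1 mapping of GO term to publication: experimental code, cite the paper.
        return ("EXP", publications[0])
    # Several (or no individually attributable) papers: TAS, cite the Reactome page.
    return ("TAS", reactome_page)


print(choose_evidence(["PMID:0000001"], "Reactome:EXAMPLE"))
print(choose_evidence(["PMID:0000001", "PMID:0000002"], "Reactome:EXAMPLE"))
```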


Arguments for structuring evidence codes
i) make things simpler
ii) allow incorporation of other data
iii) needn't change our current usage
iv) do the TAS for the things that don’t fall under EE that can’t be assigned to a single paper.


76. Action Item (Suzi): Come up with a structure for reporting our time commitments.
(continue tomorrow the discussions of the point of evidence codes and the possibility of new parent ‘EXP’ code)


ONGOING


78. Action Item (User Advocacy): Draft an official GO brochure.
=== Protein Complexes: GO vs/ Reactome ===


TO BE DONE
Reactome complexes are seen as an entity (i.e. a collection of proteins), whereas GO treats complexes as a subcellular location.
However there is also a blurring between the two for Reactome, especially when looking at large complexes.
Peter: In our annotations, a cross-reference slot allows us to cite a GO identifier for the location (usually to the parent term of the complex). Reactome curators add the most granular cc term, and are willing to generate SourceForge requests for those that are missing.


79. NOT DONE, can do this when needed. Action Item (Mike): Create GO banners.
Judy B: talked to Lisa in Bar Harbor on complexes for Reactome. Concern about the active function tag to the active polypeptide.  


ONGOING
Peter: for a catalysis – any physical entity in a complex is given a GO term describing the activity, however the active unit, which mediates the reaction is labeled by Reactome. Can parse out which of the polypeptides had the catalysis functions and which are just associated – in most cases this is identified by experimental data.
Although Reactome does not always search for the most granular Biological Process GO term, these haven’t been applied consistently.


David Hill: there should be no problem mapping this data from Reactome, while the concepts in GO and Reactome are not equivalents this is not a problem as GO would annotate the same gene products as Reactome would.


Peter: Ewan did have a concern about the ‘contributes_to’ qualifier: that a significant number of end users would not always be aware of the use of contributes_to. But really this is the users’ problem, and they can strip these annotations out if necessary.


81. Action Item (Everyone): Use the word cellular to refer to things related to a cell but aren't necessarily cells or things contained within a cell. Use the word cell to refer to cells themselves or things contained within a cell. (Change cell signaling to cellular signaling, for example).
Jennifer: users have suggested that GO could strip out annotations which use the qualifier column (e.g. contributes_to, and especially the NOT annotations), and these could then be provided as a separate file, as these can be dangerous to ignore.
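
Separating qualified annotations into their own file is straightforward to prototype. The sketch below assumes the tab-delimited gene association format with the qualifier in column 4; the function name and sample rows are invented, and no decision to split the files was recorded here.

```python
def split_by_qualifier(lines):
    """Split annotation rows into (plain, qualified) lists by the qualifier column."""
    plain, qualified = [], []
    for line in lines:
        # Skip blank lines and '!' comment/header lines.
        if not line.strip() or line.startswith("!"):
            continue
        row = line.rstrip("\n")
        cols = row.split("\t")
        qualifier = cols[3].strip() if len(cols) > 3 else ""
        (qualified if qualifier else plain).append(row)
    return plain, qualified


# Invented sample rows (truncated after the aspect column for brevity).
rows = [
    "!illustrative comment line",
    "DB\tID1\tsym1\t\tGO:0003677\tREF:1\tIDA\t\tF",
    "DB\tID2\tsym2\tNOT\tGO:0005634\tREF:2\tIDA\t\tC",
    "DB\tID3\tsym3\tcontributes_to\tGO:0003700\tREF:3\tIPI\t\tF",
]

plain, qualified = split_by_qualifier(rows)
print(len(plain), "plain;", len(qualified), "qualified")
```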


ONGOING


'''ACTION ITEM convert Reactome complex terms to GO terms'''


== ‘Taxon and GO’ - Jen Deegan ==


[[Image:Taxon_and_GO_GOC_PU_2007.ppt]] (using paper from Waclaw Kusnierczyk)


82. Action Item (Jane): Tweak cellular component definition to clearly distinguish cellular component from cell part.
Originally Chris and Jen worked on losing sensu tags, redefining definitions and adding taxon links.
- However removal of taxon has been a problem. There are now 23,802 terms. Searching for terms is a  time sink for users,
- GO help has often received  queries from users asking if there is a taxon-specific GO slim/subset of  terms (e.g. plant-specific GO)


ONGOING
- In addition, Jen as outreach officer has found new MOD groups are unwilling to annotate to GO unless there is a slim available for them.


- GO language can be subtle. GO term names can be complex now that the sensu information has been removed. Taxon information would make GO terms easier to find and decipher.


- In addition, having taxon information in the GO helps error checking


Afternoon, Sunday, Sept 23, 2007
- There are 3 types of relationships that could be applied to relate taxon to GO terms:
Discuss protein complexes and the intersection with Reactome annotations
1. Is_relevant_to
2. is_only_in
3. applies_to_all


Peter D'Eustachio (Reactome) will be able to join us only for the afternoon.
This taxon-specific information would be added into a separate file.


GET SLIDES FROM PETER
Discussion:


These are some other topics that Ewan Birney brought up at the Interactome meeting:
Judy: Against including taxon information within the GO as we do not know all properties of a  taxon.  Taxonomic information is in flux also, we do not want a dependence on taxonomy in GO. We would be restricting ourselves if we did not make all terms available to all users.
Could users not instead look at the terms that were used by a reference genome group to see what terms are appropriate for a particular taxon?


    * Reactome has a lot of function annotations with the connections to the complex that has the function. These are for very small complexes like homodimers. He says can we accept these? Also he says that he understands that this is a bit granular for us and that we have a lot of similar information for larger complexes. He suggests that there might be better way to store all the data for the functions of all sizes of complexes if we pooled out data and thought of a good system to make it available.  
- general disagreement from curators of this possibility.


    * Ewan Birney and the people at the Interactome meeting felt strongly that the more complicated annotations such as those with a NOT or colocalizes_with qualifier, and those with annotations to the root should be in a separate file from the straightforward annotations. The users felt that many people do not know about these more complicated annotations and it would be better to make them an added extra that people must specifically go to download. They were particularly concerned that users may not know that it is important to parse the qualifier column and so may miss vital information.
Agreement that there are incorrect annotations which relate to taxon-specific properties:
Harold: in the Fantom load – needed to remove incorrect mouse annotations


• Reactome would like to have a new evidence code to show where an IEA has been transferred from a known orthologue. Such transfers tend to produce more granular annotations. (This has been discussed before but Ewan asked me to bring it up while Peter is there to back the proposal.)
Val, Harold: InterPro2GO throw out problems. These could be identified by this method.


Val: I perform monthly checks to ensure no inappropriate terms have come in at high level.  This is time consuming, and this would help.


Action: Get Reactome data into GOC before next meeting.
Pascale: would help sanity check annotation data


Val: this species information doesn’t need to be comprehensive to be useful for annotation checks


Jen – taxon and GO
Eurie: if this would help annotators, this information could be built into an annotation tool?


JEN IS DOING THE MINUTES FOR THIS ONE
Ruth: there are interesting concepts here, but does it need to be so complicated? Would all taxa need to be included? Could we not instead use just 10 high-level taxon identifiers?


Comments:
Judy: Instead, could rulebase triggers not be used? Efforts should be on annotation of literature rather than wasting a considerable amount of time incorporating taxon information. We do not want to commit such a level of resources to such a project, especially as budgets are stretched at present. Again, concern about the fluidity of taxon-specific information.


DH: useful for ipro2go annotations
Sue Rhee: we should explore usage of GO slims.
Suzi Lewis: there are risks in this kind of project, and she is concerned that it would entail quite a bit of work and could also be misunderstood by users. Can we have a low-key evaluation?


VW: ditto
JDR: a large-scale activity of this kind is a bad idea. You would propagate garbage by accepting all annotations. Could use it as just a framework by only using 10 top taxon ids – this would already help find problems.
JC – agreed.


EH: helps annotators, not necessarily good for users, but helpful
Alex D: Isn’t this just a user education problem? Users need to take the time to understand the GO hierarchy, that you can search synonyms, definitions etc. Feel that user queries are symptoms of users not trying hard enough to work with GO.
for new groups?


RL: does it really need to be that complicated?
Mike Cherry: could not afford to make this a big project, there are other developments in GO which need to be addressed


JB: could be accomplished by triggers, a rulebase. Therefore by definition this would only be seen in a taxa. Efforts should be on annotation of literature. One off triggers.
Rex: Had concern about making taxon-specific assertions that are flawed.
If these types of sanity checks or limits were automatically applied, we would lose the ability to investigate the excluded annotations, even though this data would probably tell us something fundamental about biology.


Sue Rhee: Should explore using goslims? Possibly create a file of cross products to taxa on terms, therefore multiple taxa per terms.
Judy: classifications of taxon are based on phenotypes and not molecular data, and many things are being found and taxons are being redefined. Prefers ‘is_relevant_to’.
Likes the idea of flags/triggers to facilitate work, but wouldn’t automatically exclude, as this data is important.


SL: Nice to have an evaluation. Take existing annotations and say that this term has been used in this species.
Michael A: while some taxonomy is changing, e.g. in Protista, it is unlikely that Viridiplantae or Mammalia will move around so much.


JDR: large scale derivation of annotation data BAD idea – propagating the garbage. Do something on a higher level – say 10 and see what filth it uncovers.
Ben Hitz: what fraction of problems would be solved if cross-products to taxonomy were included?


MC: Too much work. Another piece of SW to be built.
Jen Deegan: it would solve some, and it would help with the development of terms.


VW: relevant_to and not_relevant_to would be a minimum
Ben Hitz: what would the time line be for taxon cross-products?


RC: no objection to relevant_to but opposed to universals – i.e. is_only_relevant_to.  
Chris Mungall: this is much  further down the line.  


Missed judy and Michael here….
Judy: Our main issue here is how to facilitate annotations in our groups. However, we are hung up on a suggestion from outside the group.
BH: what fraction of problems would be solved if there were CPs to anatomy ontology missing?


Consensus: using this for error checking would be very useful
Chris Mungall: slims are much harder to maintain than these relationships would be.


RL: taxa aren’t the be-all and end-all for help in term definition
Michelle M: When the prokaryotic subset was created, she was very much against it. Instead of users looking at 20,000 terms, they are now looking at 9,000 – there is not that much benefit. Doesn’t think new users need this; we need to facilitate better ways of finding terms within the tool. For curators it might be useful for error checking, but not for new users.


Consensus: Some of us have doubts and it will cost so some sleep time needed.
JDR: although there is a big concern that you’d lose annotations because of these relationships, this would not be the case: the incorrect annotations would instead be brought to your attention and made visible, so that you can investigate them or improve GO. The rules could be fixed.


Action: as creating new terms – see if this is the way to go?


PG: Would not use restricted GO set to curate – what would you be missing?
Ruth: how would this data be viewed? In addition, if a user does not understand a term then it really is a problem with the term’s definition. The definition needs to be improved instead; this would be far more valuable than adding in an additional cross-link.
Jen: will be willing to carry out a small pilot version of this task in her own time. Would add 10 is_only_in relationships and use these and the annotations to check for errors in the annotations and the ontology structure.
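
The kind of check the pilot would run can be sketched as below. This is a minimal illustration, not the pilot itself: the lineage table, the is_only_in assertions and the annotation tuples are all made-up examples, and a real check would also apply each restriction to the descendants of the restricted term. In line with the discussion, suspect annotations are flagged for review rather than removed.

```python
# Hypothetical lineage table: taxon -> parent taxon (None at the root).
PARENT = {
    "Arabidopsis thaliana": "Viridiplantae",
    "Viridiplantae": "Eukaryota",
    "Mus musculus": "Mammalia",
    "Mammalia": "Eukaryota",
    "Eukaryota": None,
}

# Hypothetical is_only_in assertions: GO term name -> taxon it is restricted to.
ONLY_IN = {"lactation": "Mammalia", "flower development": "Viridiplantae"}


def lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    out = []
    while taxon is not None:
        out.append(taxon)
        taxon = PARENT.get(taxon)
    return out


def flag_suspect(annotations):
    """Return annotations whose species lies outside the term's is_only_in taxon."""
    flagged = []
    for gene, term, taxon in annotations:
        limit = ONLY_IN.get(term)
        if limit is not None and limit not in lineage(taxon):
            flagged.append((gene, term, taxon))
    return flagged


annotations = [
    ("geneA", "lactation", "Mus musculus"),
    ("geneB", "lactation", "Arabidopsis thaliana"),
    ("geneC", "nucleus", "Arabidopsis thaliana"),
]
print(flag_suspect(annotations))
```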


JDR Proposal,  
'''ACTION: Jen to do a pilot project with a minimal set of terms, as an experiment and bring back results for next GO meeting'''


Objections – unmaintainable useless and a waste of time. jen to do pilot pass to see what happens, then she presents it at next meeting
'''ACTION: (David Hill) Make difficult sensu terms organism specific (biologist intuitive)  (i.e plant vacuole, fungal vacuole). However GO definitions will still be designed to be formal, not depending on species to define the term.
'''


JB proposal – get another group to do it.
== Summary of action Items from Day 1 ==


JH: issues with true path – when going up levels, start to see untruths
# Tutorial on wiki discipline (assigned to Jim Hu ?).
# (ALL) Look at and comment on outstanding items [[Outstanding Action Items from 17th GOC Meeting, Cambridge UK]]
# Check whether there should be a relationship between pigment metabolic process and pigmentation
# Jen: A reference to these pages should go in next newsletter.
# Jen Add a link from outreach to something (SOP?)
# Investigate why term requests aren’t coming in: do we need to do things to make it easier? SF tracker list/annotation list: who is on these lists, and do other people need to be on them?
# NML Michael sent ISSN URL to Eurie – Action Eurie!
# e-mail Ben if you are not getting a gp2protein check for your database.
# Somebody mentioned RSS feed, is this a potential action?
# Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible. Add new evidence code EXP for 1:1 Reactome to literature, add all other Reactome with TAS to Reactome source.
# Convert Reactome complex terms to GO terms
# Jen to do a pilot project with a minimal set of terms, as an experiment and bring back results for next GO meeting
# (David Hill) Make difficult sensu terms organism specific (biologist intuitive) (i.e plant vacuole, fungal vacuole). However GO definitions will still be designed to be formal, not depending on species to define the term.


ACTION Jen do experiment
[[Category:GO Consortium Meetings]]

Latest revision as of 10:12, 15 April 2019

Please note that these meeting minutes are now being edited for production of a final version (02-11-2007).

Therefore please do not add any more information here but contact Val or Emily.




Sunday morning, September 23, 2007 (Day 2 Minutes)

Introductions chronologically:

2007 Dmitry Sitnikov MGI, Seth Carbon BBOP

2006 Jim Hu E. coli, Susan Tweedie FlyBase, Trudi Torto-Alalibo PAMGO, Donghui Li TAIR

2005 Ben Hitz SGD

2004 Doug Howe ZFIN, Ruth Lovering UCL

2003 Jen Deegan GO, Emily Dimmer GOA, Alexander Diehl MGI, Mary Dolan MGI, Karen Eilbeck SO, Petra Fey dictyBase, Ranjana Kishore Caltech, Pascale Gaudet dictyBase, Victoria Petri RGD, Kimberley Van Auken Caltech

2002 John Day-Richter BBOP, Eurie Hong SGD, Tanya Berardini TAIR, Amelia Ireland GO

2001 Rama Balakrishnan SGD, Michelle Gwinn Giglio TIGR, Harold Drabkin MGI

2000 Rolf Apweiler EBI, Val Wood Sanger, Rex Chisholm dDB, Chris Mungall BBOP

1999 Midori Harris GO, Kara Dolinski PU, David Hill MGI

1998 Suzi Lewis BBOP, Michael Ashburner, Mike Cherry SGD, Judith Blake MGI

Progress Reports

Next year the GO Consortium effort will celebrate its 10th birthday.

2007 Progress Report for NHGRI due Jan. 1, 2008

       These reports will review accomplishments to date. 
       We are using the itemized list of sub-aims from the grant to organize these 
Aim 1: We will maintain comprehensive, logically rigorous and biologically accurate ontologies.


Ontology development

Ontology Development 1 - Midori Harris

All content meeting related changes documented on Ontology Development Wiki

is_a complete was almost finished last meeting, but is now done and a system is in place to make sure it remains so.

Three high level terms need to be disjoint – cellular process, multicellular organism process and multi-organism process. General notes on is_a complete: Isa-complete_BP

Topics also overviewed; priority list on main Ontology Development wiki page.

Content meetings have been held for:

Other topics:

Michael Ashburner: Question IMG-to-GO and FIGS-to-GO mappings.

Jen, Midori: the IMG to GO mapping is mostly finished. These items are waiting for Jane to return.

Chris M: mappings between the BP and MF terms still need to be done.

JB/SL: wiki is a valuable resource, however it can get muddled sometimes – managers should keep track.

Alex: if you add new large section you should send out a general email.

ACTION ITEM – tutorial on wiki discipline (assigned to Jim Hu).

Rex – in addition, there could be a group of wiki experts formed, who people could contact for advice.

Ontology Development 2 - David Hill

1) Taxon and sensu.

“Sensu” confused users and curators, and editors became lazy in its implementation and accurate definitions were not created. Sensu terms have been renamed, merged or obsoleted (how many?) in collaboration with domain experts.

Note added after meeting: We would have to run obodiff to get counts for renamed vs. merged vs. obsolete, but we started in April with 229 'sensu' terms, and there are now 80 remaining in the live file. Of these, several are sorted out in the 'fruiting_body.obo' file in go/scratch/, and the remainder (about 60) are listed on the last sensu meeting notes page.

Definitions now need to state how a process occurs differently in the different organisms. If it is impossible to state this, then child terms will not be created. In future, term requests need to include reasons how a process occurs differently in different organisms.

Synonyms containing the sensu information are kept for these terms.

The general consensus at the meeting seemed to be that rather than create long convoluted term names, we would still be allowed to create a term such as plant-type vasculature as long as the definition clearly differentiated the terms.

Function-Process Links

Chris M: these mappings are complex

Waiting for OBO-Edit 2.0 for help on cross-products.

2) Regulation.

Regulation Main Page

GO will soon add a new relationship, 'regulates'. Regulation-of-process terms will then be changed from part_of the process to regulates (for example, 'regulation of metabolism part_of metabolism' will become 'regulation of metabolism regulates metabolism').
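
The planned change can be pictured as a rewrite over relationship triples. The sketch below is illustrative only: the identifiers are placeholders, the triple representation is an assumption, and matching on the "regulation of" name prefix is a simplification rather than the procedure the ontology editors will use.

```python
def convert_regulation_links(links, names):
    """Turn 'regulation of X part_of X' links into 'regulation of X regulates X'.

    links: list of (child_id, relation, parent_id) triples.
    names: mapping from term id to term name.
    """
    converted = []
    for child, relation, parent in links:
        is_regulation_of_parent = names[child] == "regulation of " + names[parent]
        if relation == "part_of" and is_regulation_of_parent:
            relation = "regulates"
        converted.append((child, relation, parent))
    return converted


# Placeholder identifiers, not real GO ids.
names = {
    "X:1": "metabolism",
    "X:2": "regulation of metabolism",
    "X:3": "lipid metabolism",
}
links = [("X:2", "part_of", "X:1"), ("X:3", "is_a", "X:1")]
print(convert_regulation_links(links, names))
```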

During the is_a-complete work, three top-level regulation terms were added to represent three categories of biological regulation: regulation of molecular function, regulation of biological process, regulation of biological quality.

Chris has generated a report (go/scratch/regulation-of-non-process.txt) of all descendants of 'regulation of biological process' where there is no term for the process being regulated. David Hill is going through the report (not as bad a task as he'd feared), and has found that the violations fall into three categories, corresponding to the three parts of the Regulation Worksheet:

Part 1: The regulation term describes regulation of a molecular function or a biological quality, so the term is o.k.
Part 2: The regulation term is a legitimate subtype of its parent, but a more specific process term isn't required. Example:
GO has 'regulation of transcription involved in forebrain patterning' and 'regulation of transcription', but not 'transcription involved in forebrain patterning'. 'Regulation of transcription involved in forebrain patterning' is
  • Part_of forebrain patterning (check)
  • Is_a regulation of transcription (check)
  • 'Transcription involved in forebrain patterning' is not necessary -- it is essentially the same process as transcription

This term will inherit the regulates relationship through its is_a parent and will regulate 'transcription'. It will remain a part_of forebrain patterning since every instance of this process is a part_of an instance of forebrain patterning.

Part 3: Actual problems of various kinds; David has made suggestions about how to handle them, which everyone should check -- especially the ones with question marks.
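
A report of this kind could be approximated from term names alone. The minutes do not say how Chris generated the file, so the following is only a sketch under that caveat: it treats any term named "regulation of X" (or the positive/negative forms) as a violation when no term named X exists.

```python
PREFIXES = ("positive regulation of ", "negative regulation of ", "regulation of ")


def regulation_without_target(term_names):
    """Return (regulation term, missing target) pairs, sorted by term name."""
    names = set(term_names)
    missing = []
    for name in sorted(names):
        for prefix in PREFIXES:
            if name.startswith(prefix):
                target = name[len(prefix):]
                if target not in names:
                    missing.append((name, target))
                break
    return missing


term_names = [
    "transcription",
    "regulation of transcription",
    "forebrain patterning",
    "regulation of transcription involved in forebrain patterning",
]
for term, target in regulation_without_target(term_names):
    print(f"{term!r} has no target term {target!r}")
```

On this toy input only the forebrain patterning term is reported, which is the case the worksheet classifies as acceptable; the output is a list for a curator to triage, not a list of errors.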


Chris Mungall: there are problems with cross-products, and would be easier if the parent terms did exist.

David H: this will be resolved once the parent terms do exist.

David H: concern about consistency in regulates relationships. In some cases, negative and positive regulation of a process are part_of the parent and in some cases they are is_a of the parent. We need to be consistent about this. For now, negative and positive regulation of a biological quality are a special case. When you are regulating a biological quality, the regulation is a balance of the positive and negative processes. Therefore, the positive and negative children are part_of the regulation of the quality. Suggestion to use homeostasis terms for overall regulation of biological quality [midori]

CM: will look at relationships between cell types and GO terms: use as a guide to populate GO with missing terms.


Q. VW: How are existing annotations affected by the relationship change? E.g. transcription initiation: curators may have annotated more granularly to regulation of transcription initiation when there is direct involvement. Topic for annotation discussion at some point?

A (David, Midori): The 'regulates' relationship shouldn't affect annotations. Basically the part_of relationships already exist and we will simply replace them with regulates. We are already annotating to regulates terms and that shouldn't change. What will be different is how we process annotations. We will be able to decide whether or not we should include regulates children.

ACTION ITEM (ALL) Look at and comment on outstanding items (search on ?)

ACTION ITEM Check whether there should be a relationship between pigment metabolic process and pigmentation

3) Information content analysis. Collaboration with MIT/Harvard group.

MIT and Harvard got in contact with GO, were interested in measuring information content of a GO term. They looked at the number of annotations to a term related to its position in the ontology.

They developed a statistical algorithm to determine information content, based on the assumption that a term with few gene products annotated to it has a high information content and a term with many gene products annotated has a low information content.
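
The MIT/Harvard algorithm itself is not described in these minutes. As a rough illustration of the underlying assumption only, a common way to express "fewer annotations means more information" is the negative log of the fraction of gene products annotated to a term. The function name, term counts and total below are made up for the example.

```python
import math


def information_content(annotation_counts, total_gene_products):
    """Map each term to -log2(fraction of gene products annotated to it).

    Terms with no annotations are skipped, since the measure is undefined
    for a zero count.
    """
    return {
        term: -math.log2(count / total_gene_products)
        for term, count in annotation_counts.items()
        if count > 0
    }


# Hypothetical counts, not taken from any real annotation set.
counts = {"rarely used term": 1, "heavily used term": 8}
ic = information_content(counts, total_gene_products=16)
```

With these invented numbers the rarely used term scores higher than the heavily used one, which matches the assumption stated above.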

David, Midori and Jane then looked for outliers with respect to information content (finding terms that were either too specific, at a higher or lower level than they should be)

Took higher-level terms which had too few annotations compared to other terms the same distance from the root and checked whether they could be relocated, e.g. pilus retraction was a direct child of 'cell physiological process' and was relocated to pilus organisation and biogenesis, so that it was at an appropriate level in the GO hierarchy.

Similarly, lots of specific terms had a larger than expected number of annotations, e.g. olfactory receptor activity.

Some of the annotation distributions between terms also just reflected biological differences e.g. cation and anion transport terms: there are more cation transporter genes than anion transporters, the two terms are at the same level in the ontology - as they should be. Therefore this analysis can only draw attention to particular parts of the ontology which a curator then can examine.

Q: JDR Is it possible to put this analysis into GOC tools?

A: CM – the analysis is already in database, can be used.

AD: this is something which can be repeated semi-regularly, but not to dwell on too much.

DH: this has been a very good collaboration experience, and has produced good contacts to continue relationships with. SR: We know of other groups interested in this area that could also get in touch - will get in touch with David Hill. JB: annotations give power to these kinds of approaches. And until we have an annotation core we are restricting this kind of potential activity.

Ontology Development 3 - Chris Mungall

Wiki for ontology structure (should be merged with Ontology Development)

http://gocwiki.geneontology.org/index.php/Ontology_Structure


1. Mining Reactome links to link process to function – more after lunch.


2. Internal cross products can start to be created and maintained in the ontology. OBO-Edit 2.0 will make it easier to maintain these cross products.

New cross product guide on wiki. Links to ongoing work on BP – CP cross products;

e.g. could link histone deacetylase complex to histone deacetylase activity (this type of linking is easier than creating BP to MF links)


http://gocwiki.geneontology.org/index.php/Cell_cross-products

Includes:

Internal links (existing)

External links (function to process links)

External links (x products)

Links need to be treated with caution. Links are kept in a file separate from GO at the moment, as people could erroneously propagate annotations between the Gene Ontologies (i.e. just because someone annotates to a certain process, it does not mean they should necessarily annotate to the linked function).

3. contributes_to

- people are using this qualifier incorrectly in annotations. VW: take the histone deacetylase complex as an example; this is a very large complex with many molecular functions. Therefore one complex can be linked to many different functions. We should use contributes_to *only* in those instances where the annotator does not know which subunit provides a function. JB: no, contributes_to can also be used when you *do* know the individual contributions of subunits. MD: often subunits which do not have a specific activity themselves are involved in enabling another subunit to provide the activity. VW: but this does not hold for all complexes; we are using this qualifier in too many different ways. DH: often, if a subunit is knocked out, the observer cannot tell if the subunit has a direct or indirect influence on the resulting phenotype. In addition, the 'contributes_to' qualifier is often missing.

discussion postponed


Internal cross-products: If cross-products were maintained in the GO directly, it would make life easier. Cross-products will be more manageable in OBO-Edit 2.0, where there are many features - can use a 'Cross-Product Matrix Editor' - can see the possible cross-product/GO combinations, parents and children of a term. - this helps identify missing links in the DAG. - in addition, there will be an ontology repair option, which can introduce these links, e.g. missing is_a links. DH: we want to use this to go through the logic of the regulates relationship, as there is concern about ensuring consistency. CM: will also look at relationships between cell types and GO terms, and can use these as a guide to populate GO with such missing terms. ... more on cross-product logistics later.

Karen Eilbeck SO Progress

Development : March–>August joined J Thornton group - Gabi Reeves for BioSapiens project on protein terms, 96 new terms for polypeptides have now been added to SO.

Mark Hathon (with Barry Smith)BioSmith, Buffalo – ongoing work on regulatory regions.

Content meeting in June, HLA immunology community – looking for terms to describe variants. Added new terms, rearranging of SO – very productive. Assigned work to Alex, nothing to report.

Collaboration with Arian at phyGo. Mobile genetic elements for viruses. This is in parallel with work happening in GO.

Working on synonyms with Colin Batchelor, and over 400 new synonyms have been added to SO.

Release SO now every 2 months. Therefore there is a stable and leading-edge version for those interested.

Changing requirements for GFF3 - this not done yet.

Karen dropping down to 60% on this project.


COFFEE BREAK


Aim2: We will comprehensively annotate reference genomes in as complete detail as possible.


Reference Genome Annotation Project - Rex Chisholm

Aim3: We will support annotation across all organisms.


File:ReferenceGenomes GOC PU 2007final.ppt

Purpose: to provide comprehensive, robust collection of annotations for 12 genomes. These genomes have the most published data, have a genome database and experienced GO annotators. These high-quality annotations will be a resource for other groups to transfer to genes in their species.

Complete/comprehensive annotation includes measures of breadth and depth.

Breadth – every gene annotated.

Depth – gene annotated to the highest possible knowledge. If there are only a small number of papers (5-10) then the curator should read all. If the literature is extensive then the curator should be selective (completion is best assessed by a curator).

Target Gene Identification (Priority genes)

250 genes have now been targeted for curation. The target method has now been changed, targets are now (as of last month) selected based on disease type. Gene when mutated should contribute centrally to a disease phenotype(OMIM). This method has been generally successful, however there is now a challenge for mammalian groups with the increased literature load. Also a challenge for non-mammals - orthologs may not always be available (e.g. neurological genes in yeast). These challenges need to be balanced.

Ortholog Identification

Need to have a good set of orthologs.

Need to find ways of facilitating this work through tools; no obvious choice as yet, e.g. InParanoid has problems keeping pace and providing up-to-date sets. It would be good to have an ortholog set automatically provided which curators could then validate.

Software

Currently use Google spreadsheets for target lists and information on curation progress. However this is not robust enough and time consuming. A database will be developed to handle this data and requirements have been written up. This will mean that the Ref Genome data is more structured. The database will provide a consistent use of identifiers, MOD association file loading, tracking when no ortholog found, and an automated response if a paper appears after a 'comprehensive date'. Sohel Merchant (left in July) wrote prototype << ADD URL>>. A new member of staff will start at the end of September to continue development.


Metrics

Annotation Progress – see slide.

Annotation Consistency.


Mary Dolan's tool for comparing annotations by looking at generic GOSlim branches is useful, as different organisms are used in different experimental approaches and different levels of data are available in different organisms. Eurie: if there is an outlier in annotation consistency checks this might also indicate organism-specific data (e.g. chromatin silencing not appropriate term for yeast).

Table View (slim showing each term annotated for a gene) includes every term useful for curation and annotation consistency (add link???).

Ontology Development Aim to have robust discussions on annotation and ontology development issues. Number of SourceForge requests from reference genome group in the hundreds over 16 months. There is an average of 10-12 SourceForge requests per month. GO editorial group doing a good job at keeping up with these. Existing requests are problematic. 411 terms. - It is important that curators label their SourceForge request as relating to a Reference Genome group.

MH: Can retrieve number of GO terms that have resulted from these requests by looking at the cross-references file: 411 terms from Reference Genome-marked requests.

Ruth Lovering's Metrics Document v3: File:HowToCaptureMetrics3.doc


Publicising - need to start publicizing Reference Genome work.


Annotation Outreach – Jen Deegan

Aim: to find new groups to join the GOC annotation effort, and keeping track of new groups annotating and writing documentation to help get groups started.

see wiki: http://gocwiki.geneontology.org/index.php/Annotation_Outreach_group_reports

Media:outreach_princeton.ppt

Jen described the scope and techniques of outreach effort. Showed an 'ontology ' of outreach effort. There has been much progress on grants.

Attending many regular conferences.

Less cold calling, it wasn’t very successful. More luck tracking down the right person at conferences. Responding to invitations.


People going to meeting – report back information from willing people to Jen.

The SOPs have been tricky but are now on the public GOC website:

http://www.geneontology.org/GO.annotation.SOP.shtml MA: this page is difficult to find. Action: this page needs to be reviewed and included in the next newsletter

ACTION ITEM Jen: A reference to these pages should go in next newsletter.

ACTION ITEM Jen: Add a link from outreach to the SOPs

There has been funding success - for the British Heart Foundation and AgBase grants.

MA: for new groups annotating, how many SourceForge requests are we getting? e.g. Aspergillus group should have requested new terms?
SL: agree. As soon as an annotation effort really has started, the group often needs a number of new terms.

Jen: for emerging genomes the main problem is finding funding to support an annotation effort.
MC: need to determine if they are only doing IEA annotation, or whether they have the time to carry out manual curation.
JH: the process of making new term requests is not obvious
JW: the SourceForge term tracker only goes to the GO list, so other groups not aware
MH: it is possible to add more e-mail addresses to this list.
MA: not our job to source funding for new groups, it is the job of the individual groups.
JB: supporting new groups is important, need to mentor groups and support them submitting new terms.

ACTION ITEM investigate why terms requests aren’t coming in, do we need to do things to make it easier, SF tracker list/annotation list/ who are on these lists/ do other people need to be on those lists?


User Advocacy - Eurie

Focusing on lines of communication, web presence, newsletter and mailing list.

Different users, new users, current users, power users

Most of the past year has focused on the lines of communication.

wiki User Advocacy main page: http://gocwiki.geneontology.org/index.php/User_Advocacy

Rota of mailing list monitors.

Newsletters archived. Future news items page on wiki. Wiki or Newsletter ideas <<add link>>

Michael A wants ISSN for the newsletter.

ACTION ITEM NML Michael sent URL to Eurie – Action Eurie!

Somebody mentioned RSS feed, is this a potential action?

Users meetings, we have a page of potential meetings on wiki. - used to target groups new to GO and help education. - have a workshop specific for microarray users (rather than an add-on to MGED)

Tools standards. (Needs to be cleaned up and categorised) - ideas for minimum standards for GO tools - send out list a month ago: http://gocwiki.geneontology.org/index.php/Tools_standards

Production Systems - Ben Hitz

File:ProductionReport GOC PU 2007.ppt

Deployed 4 new linux machines 1 for loading, 2 for AmiGO production, 1 AmiGO development.

Production AmiGO now more fault resistant. ACTION: e-mail Ben if you are not getting a gp2protein check for your database.

Go Database loading speeded up and now in testing.

Godb sequences – using gp2protein files. If possible do all sequences in your DB, not just annotated.

Assocdb fasta file – Header line massive – can be slimmed down?

Association file cleaning – All IEAs must have a with field.
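
A check for this rule could look something like the sketch below. It assumes the tab-delimited gene association file layout, in which the evidence code is column 7 and the with/from field is column 8, and lines starting with "!" are comments. The function name and the sample rows (including all identifiers) are invented for illustration and are not part of the production cleaning scripts.

```python
def iea_missing_with(lines):
    """Yield (line_number, line) for IEA rows whose with/from column is empty."""
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # malformed row; a fuller checker would report it
        evidence, with_from = cols[6], cols[7].strip()
        if evidence == "IEA" and not with_from:
            yield number, line


# Invented sample rows: only the second IEA row lacks a with field.
sample = [
    "!example comment line",
    "\t".join(["DB", "ID1", "sym1", "", "GO:0000001", "REF:1", "IEA", "InterPro:IPR000001", "P"]),
    "\t".join(["DB", "ID2", "sym2", "", "GO:0000002", "REF:2", "IEA", "", "P"]),
    "\t".join(["DB", "ID3", "sym3", "", "GO:0000003", "REF:3", "IDA", "", "P"]),
]
bad_line_numbers = [number for number, _ in iea_missing_with(sample)]
```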

AmiGO – Amelia

AmiGO enhancements and new search features demo

  • Search result relevance implemented - most 'relevant' results are shown first
  • Term and gene product search is now "intelligent" and AmiGO will automatically search all fields if it doesn't find a match.
  • Term enrichment (also known as "GO Term Finder") and GO Slimmer (map2slim) functionality have been added to AmiGO. Both can use uploaded user files or data from the GO database.
  • Downloads in OBO, RDF-XML and gene association format now possible

LUNCH BREAK

Action Items Review

This large section moved to its own page:

Outstanding Action Items from 17th GOC Meeting, Cambridge UK

Afternoon, Sunday, Sept 23, 2007

Reactome - Peter D’Eustachio

File:Reactome to GO GOC PU 2007.ppt

Reactome can provide data for proteins that UniProt does not yet have manual annotations for. Most of this Reactome data is derived from experimental evidence identified from papers; however, unlike the GO annotation method, the types of experiments have not been recorded.

Emily: GOA would love this data, but unless we have a new parent ‘Experimental’ code, the best that exists is ‘TAS’.

Suzi Lewis: there is a use for a hierarchy of evidence codes. With an ‘E’ Experimental code as a parent of the IMP, IGI, IDA, IPI, IEP granular codes.
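
One way to picture such a hierarchy is a simple child-to-parent lookup, sketched below. The five granular codes are the ones Suzi lists; the parent is written here as "EXP", the label used later in these minutes, and the dictionary and function names are only illustrative.

```python
# Proposed grouping: each granular experimental code points at one parent.
EVIDENCE_PARENT = {
    "IMP": "EXP",
    "IGI": "EXP",
    "IDA": "EXP",
    "IPI": "EXP",
    "IEP": "EXP",
}


def is_experimental(code):
    """True for the parent code itself or any of its granular children."""
    return code == "EXP" or EVIDENCE_PARENT.get(code) == "EXP"


experimental_flags = {code: is_experimental(code) for code in ("IDA", "EXP", "TAS", "IEA")}
```

With a structure like this, existing annotations keep their granular codes, while a data source that cannot say which kind of experiment was done can still be grouped with them under the parent.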

Peter: Homolog sets used to transfer data between species are determined by individual experts, and transfer is between orthologs AND homologs (where functionally similar)

Judy and Suzi: Reactome data is valuable. It is unacceptable to not be including it in GO and it is unacceptable that this data should have anything less than an experimental evidence code. TAS or NAS evidenced data are unacceptable also.

Peter: current Reactome curation practice is to avoid unpublished data, and Reactome curators also want to be opinionated about the published data, so that Reactome will reflect current expert opinion and avoid hypothetical theories. Only confirmed, accepted knowledge is included. There are 10 curators, only 2 of whom have previous experience in GO annotation; there is no budget to do GO annotation and no desire to teach curators about GO evidence codes. Don’t always know which piece of literature applies to which info. 2000 genes annotated. 4000 pieces of literature. It is not clear how many GO annotations this would convert to.

Suzi Lewis: This brings up the question of what is the purpose of evidence codes? Why do we have the ones we have? Do users use them? (something to discuss tomorrow).

Pascale: have evidence from users that they do care whether IDA or IMP codes are used.

Peter: There is not always a 1 GO term to 1 publication relationship. Sometimes a GO term may have originated from the combined curation of many papers.

Eurie and John Day-Richter: TAS annotations are valuable, and may be good to get the data in.

Suzi, Judy: this data is too good for TAS.

Emily D: Why not use a mix of codes depending on the GO term to publication ratio? For those instances where there is a 1:1 relationship of GO term to publication: use ‘E’, for 1 GO term to many publications: use ‘TAS’ and cite the Reactome reaction web page as the source – this then acts as the reviewed document.
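
Emily's suggestion amounts to a small decision rule, which might be sketched as follows. The experimental parent code is written as "EXP" (the label adopted later in the discussion), and the function name and all identifiers in the example are placeholders, not real records.

```python
def choose_evidence(publications, reactome_page):
    """Return (evidence_code, reference) for one Reactome-derived GO annotation."""
    if not publications:
        raise ValueError("at least one supporting publication is expected")
    if len(publications) == 1:
        # 1:1 relationship of GO term to publication: experimental parent code
        return "EXP", publications[0]
    # One GO term backed by several papers: TAS, citing the Reactome page
    return "TAS", reactome_page


# Placeholder identifiers for illustration only.
single = choose_evidence(["PMID:0000001"], "REACTOME:example_reaction")
combined = choose_evidence(["PMID:0000001", "PMID:0000002"], "REACTOME:example_reaction")
```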

David Hill: concerned about the proposition of a new ‘Experimental’ evidence code: might lose analytic power.

Judy B: could Reactome curators go back and re-annotate those 4,000 papers and convert the codes to one of the GO experimental codes? This would only take 2 weeks to do.

Peter: Not possible – Reactome have defined goals, we cannot afford to reannotate for GO. 75 genes/month is the absolute minimum annotations. We have our own grant objectives we must fulfill.

David Hill: GO curators could prioritize the reannotation of genes for which there is not much annotation available.

Rex: could the reference genome groups each take on a subset of annotations and re-annotate?

Emily: then the annotation would belong to the group that reannotated. We would be using Reactome data as a source, but the final annotations would be attributed to the group that provided the final annotations. Might not be the best use of resources.

Suzi : Would accept ‘EXP’ for the 1:1 mapping of GO term to publication.


Q Val: Any idea how many aren’t covered by GO annotation already? A. No…


Judy, Sue R, Emily D, Tanya B: the ‘EXP’ code would make life easier for users, for other integrations as well

ACTION: Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible. Add new evidence code EXP for 1:1 Reactome to literature, add all other Reactome with TAS to Reactome source.

Arguments for structuring evidence codes i) make things simpler ii) allow incorporation of other data iii) needn't change our current usage iv) do the TAS for the things that don’t fall under EE that can’t be assigned to a single paper.

(continue tomorrow the discussions of the point of evidence codes and the possibility of new parent ‘EXP’ code)


Protein Complexes: GO vs/ Reactome

Reactome complexes are seen as an entity (i.e. a collection of proteins), whereas GO treats complexes as a subcellular location. However, there is also a blurring between the two for Reactome, especially when looking at large complexes. Peter: In our annotations, a cross-reference slot allows us to cite a GO identifier for the location (usually to the parent term of the complex). Reactome curators add the CC term that is most granular, and are willing to generate SourceForge requests for those missing.

Judy B: talked to Lisa in Bar Harbor on complexes for Reactome. Concern about the active function tag to the active polypeptide.

Peter: for a catalysis – any physical entity in a complex is given a GO term describing the activity, however the active unit, which mediates the reaction is labeled by Reactome. Can parse out which of the polypeptides had the catalysis functions and which are just associated – in most cases this is identified by experimental data. Although Reactome does not always search for the most granular Biological Process GO term, these haven’t been applied consistently.

David Hill: there should be no problem mapping this data from Reactome, while the concepts in GO and Reactome are not equivalents this is not a problem as GO would annotate the same gene products as Reactome would.

Peter: Ewan did have a concern about the ‘contributes_to’ qualifier – concerned that a significant number of end users would not always be aware of the use of contributes_to. But really this is the users' problem. And they can strip these out if necessary.

Jennifer: users have suggested that GO could strip out annotations which use the contributes_to column (especially the NOT annotations) and these then could be provided as a separate file. As these can be dangerous to ignore.
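
The separate-file idea could be prototyped along these lines. The sketch assumes the gene association file layout in which the qualifier is column 4 and may hold several pipe-separated values (e.g. NOT together with contributes_to); the function name and the sample rows are invented for illustration.

```python
SPLIT_QUALIFIERS = {"NOT", "contributes_to"}


def split_by_qualifier(lines):
    """Separate rows carrying NOT or contributes_to from plain annotation rows."""
    plain, qualified = [], []
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        qualifiers = set(cols[3].split("|")) if len(cols) > 3 else set()
        if qualifiers & SPLIT_QUALIFIERS:
            qualified.append(line)
        else:
            plain.append(line)
    return plain, qualified


# Invented sample rows; only the qualifier column matters here.
rows = [
    "\t".join(["DB", "ID1", "sym1", "", "GO:0000001", "REF:1", "IDA", "", "F"]),
    "\t".join(["DB", "ID2", "sym2", "contributes_to", "GO:0000002", "REF:2", "IDA", "", "F"]),
    "\t".join(["DB", "ID3", "sym3", "NOT", "GO:0000003", "REF:3", "IMP", "", "P"]),
]
plain_rows, qualified_rows = split_by_qualifier(rows)
```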


ACTION ITEM convert Reactome complex terms to GO terms

‘Taxon and GO’ - Jen Deegan

File:Taxon and GO GOC PU 2007.ppt (using paper from Waclaw Kusnierczyk)

Originally Chris and Jen worked to lose sensu tags, redefining definitions and adding taxon links. However, removal of taxon has been a problem. There are now 23,802 terms. Searching for terms is a time sink for users. GO help has often received queries from users asking if there is a taxon-specific GO slim/subset of terms (e.g. plant-specific GO)

- In addition, Jen as outreach officer has found new MOD groups are unwilling to annotate to GO unless there is a slim available for them.

- GO language can be subtle. GO term names can be complex now that the sensu information has been removed. Taxon information would make GO terms easier to find and decipher.

- In addition, having taxon information in the GO helps error checking

- There are 3 types of relationships that could be applied to relate taxon to GO terms: 1. is_relevant_to 2. is_only_in 3. applies_to_all

This taxon-specific information would be added into a separate file.
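
To show how an is_only_in link could support error checking, here is a minimal sketch. It assumes the links are held as a term-to-taxon mapping and that each annotated species comes with the set of taxa in its lineage; the function name, the data structures and the single example link are illustrative assumptions, not the proposed file format.

```python
def taxon_violations(annotations, only_in, lineage):
    """Return annotations whose species lies outside a term's is_only_in taxon.

    annotations: iterable of (gene, term, species)
    only_in:     term -> taxon the term is restricted to
    lineage:     species -> set of taxa that contain it
    """
    flagged = []
    for gene, term, species in annotations:
        required = only_in.get(term)
        if required is not None and required not in lineage.get(species, {species}):
            flagged.append((gene, term, species))
    return flagged


# Illustrative data only; gene names are placeholders.
only_in = {"lactation": "Mammalia"}
lineage = {
    "Mus musculus": {"Mus musculus", "Mammalia"},
    "Saccharomyces cerevisiae": {"Saccharomyces cerevisiae", "Fungi"},
}
annotations = [
    ("geneA", "lactation", "Mus musculus"),
    ("geneB", "lactation", "Saccharomyces cerevisiae"),
]
suspect = taxon_violations(annotations, only_in, lineage)
```

The function only reports suspect annotations and does not remove anything, so a curator can decide whether the annotation or the taxon link is the one that is wrong.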

Discussion:

Judy: Against including taxon information within the GO as we do not know all properties of a taxon. Taxonomic information is in flux also, we do not want a dependence on taxonomy in GO. We would be restricting ourselves if we did not make all terms available to all users. Could not instead users look at the terms that were used by a reference genome group to see what terms are appropriate for a particular taxon?

- general disagreement from curators of this possibility.

Agreement that there are incorrect annotations which relate to taxon-specific properties: Harold: in the Fantom load – needed to remove incorrect mouse annotations

Val, Harold: InterPro2GO throw out problems. These could be identified by this method.

Val: I perform monthly checks to ensure no inappropriate terms have come in at high level. This is time consuming, and this would help.

Pascale: would help sanity check annotation data

Val: this species information doesn’t need to be comprehensive to be useful for annotation checks

Eurie: if this would help annotators, this information could be built into an annotation tool?

Ruth: there are interesting concepts here, but does it need to be so complicated, would all taxons need to be included. Could we not instead just use just 10 high-level taxon identifiers.

Judy: Instead, could rule-based triggers not be used? Efforts should be on annotation of literature rather than wasting a considerable amount of time incorporating taxon information. We do not want to commit such a level of resources to such a project, especially as budgets are stretched presently. Again, concern about fluidity of taxon-specific information.

Sue Rhee: we should explore usage of GO slims.

Suzi Lewis: there are risks in this kind of project; concerned that it would entail quite a bit of work and could also be misunderstood by users. Can we have a low-key evaluation?

JDR: a large-scale activity of this – is a bad idea. You would propagate garbage by accepting all annotations. Could use as just a framework by only using 10 top taxon id. – this would already help find problems. JC – agreed.

Alex D: Isn’t this just a user education problem? Users need to take the time to understand the GO hierarchy, that you can search synonyms, definitions etc. Feel that user queries are symptoms of users not trying hard enough to work with GO.

Mike Cherry: could not afford to make this a big project, there are other developments in GO which need to be addressed

Rex: Had concern about making taxon-specific assertions that are flawed. If these types of sanity checks or limits were automatically applied, we would lose the potential value of looking into these cases; this data would probably tell us something fundamental about biology, and we would lose the ability to investigate it.

Judy: classifications of taxa are based on phenotypes and not molecular data, and many things are being found and taxa are being redefined. Prefers ‘is_relevant_to’. Likes the idea of flags/triggers to facilitate work, but wouldn’t automatically exclude, as this data is important.

Michael A: while some taxonomy is changing, e.g. in Protista, it is unlikely that Viridiplantae or Mammalia will move around so much.

Ben Hitz: what fraction of problems would be solved if cross-products to taxonomy were included?

Jen Deegan: it would solve some, it would help with the development terms.

Ben Hitz: what would the time line be for taxon cross-products?

Chris Mungall: this is much further down the line.

Judy: Our main issue here is how to facilitate annotations in our groups. However, we are hung up on a suggestion from outside the group.

Chris Mungall: slims are much harder to maintain than these relationships would be.

Michelle M: When the prokaryotic subset was created, she was v much against. Instead of users looking at 20,000 terms, they are now looking at 9,000 – there is not that much benefit. Don’t think new users need this, need to facilitate better ways of finding terms within the tool. For curators it might be useful for error checking, but not new users.

JDR: although there is a big concern that you’d lose annotations because of these relationships, this would not be the case as the incorrect annotations would instead be brought to your attention – and visible to better investigate/ or improve GO. the rules could be fixed.


Ruth: how would this data be viewed? In addition, if a user does not understand a term then it really is a problem with the term's definition – instead the definition needs to be improved; this would be far more valuable than adding in an additional cross-link.

Jen: will be willing to carry out a small pilot version of this task in her own time. Would add 10 is_only_in relationships and use these and the annotations to check for errors in the annotations and the ontology structure.

ACTION: Jen to do a pilot project with a minimal set of terms, as an experiment and bring back results for next GO meeting

ACTION: (David Hill) Make difficult sensu terms organism specific (biologist intuitive) (i.e plant vacuole, fungal vacuole). However GO definitions will still be designed to be formal, not depending on species to define the term.

Summary of action Items from Day 1

  1. Tutorial on wiki discipline (assigned to Jim Hu ?).
  2. (ALL) Look at and comment on outstanding items Outstanding Action Items from 17th GOC Meeting, Cambridge UK
  3. Check whether there should be a relationship between pigment metabolic process and pigmentation
  4. Jen: A reference to these pages should go in next newsletter.
  5. Jen Add a link from outreach to something (SOP?)
  6. investigate why terms requests aren’t coming in, do we need to do things to make it easier, SF tracker list/annotation list/ who are on these lists/ do other people need to be on those lists?
  7. NML Michael sent ISSN URL to Eurie – Action Eurie!
  8. e-mail Ben if you are not getting a gp2protein check for your database.
  9. Somebody mentioned RSS feed, is this a potential action?
  10. Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible. Add new evidence code EXP for 1:1 Reactome to literature, add all other Reactome with TAS to Reactome source.
  11. Convert Reactome complex terms to GO terms
  12. Jen to do a pilot project with a minimal set of terms, as an experiment and bring back results for next GO meeting
  13. (David Hill) Make difficult sensu terms organism specific (biologist intuitive) (i.e plant vacuole, fungal vacuole). However GO definitions will still be designed to be formal, not depending on species to define the term.