GO Consortium Meeting 2007

==Meeting Room==
[[Category:GO Consortium Meetings]]
== NO MORE EDITS PLEASE ==


The GO Advisors Meeting  will be held in Conference Room 316L of the South Campus Center of the University of Washington. There will be transportation from the Silver Cloud Inn to the Conference Center starting at 7 am.  The meeting will start at 8:00 am.
== Downloads ==
*[http://www.geneontology.org/meeting/minutes/20070108_Cambridge_Agenda.doc Agenda for the GOC meeting]
*[http://www.geneontology.org/meeting/minutes/20070108_Additional-Material/ Additional material for the meeting]
*[http://www.geneontology.org/meeting/minutes/20070108_Progress-Reports/ Progress reports from consortium members and working groups]


== Aim of the meeting  ==  
==Topics==
The main focus of this meeting is to obtain our advisors views and perspectives on the work we have planned (i.e. the new proposal) and the approach we are taking to accomplish our aims.
=== GO Team and other Status Reports ===


'''''Topics'''''
Monday, January 8th
# 8:30 Reference Genomes - Rex&Karen
#*Summaries of the genomes - the display metrics
# 9:00 Ontology Content - David&Midori
#* IS_A complete
#* regulates (note: Chris has agreed to a joint ont dev/software perspective on this - mah)
#* Cell Ontology links
#* Collaboration with Jonathan Liu and MIT
# 9:30 Ontology & Software - Chris&Ben/Mike
#* Includes OBO-Edit working group report
break
# 10:30 Annotation outreach - Jen&Michelle
# 11:00 User Advocacy - Eurie&Jane
#* Includes AmiGO working group report
#* Includes 'Hub' report?
# 11:30 Operations Summary - Suzi
# Publications/Presentations/Tutorials/Posters (handout)
=== Issues to be addressed, ordered with harder topics first ===
Monday, January 8th after lunch & Tuesday, January 9th morning<br>
==== Discussions ====
#GO policy on incorporating GOA annotations into MOD annotations (Evelyn and Mike/Judy?)
#*GO annotations have been stripped out of GOA-UniProt (all species file) and all other gene association files by using the taxid stated within the file. This is defined within the GOC documentation at: http://www.geneontology.org/GO.annotation.shtml#script The plan was for each member group to integrate the annotations that were being filtered out. To date this is only happening at MGI. The result is that annotations from GOA for experimental results are being lost. GOA receives a lot of user questions about how to get complete annotation datasets. The unstripped GOA-UniProt file is available on the EBI and GOC FTP sites (however, in the latter case this is not clearly stated in our documentation).
#*GOA now integrates all experimental data from each GOC member on a monthly basis.
#*We have a GO policy on this. Perhaps, if a GOC member cannot integrate the manual annotations from GOA and others, that taxid should not be filtered from the other gene association files?
#*In practice, the MOD groups identified need to be contacted to find out how they are doing in incorporating appropriate annotations in their files.
#*GOA PDB gene association file - should this file be stripped on GOC site by taxon id? File created by GOA and InterPro3D, special pipeline used as PDB entries do not map 1:1 to UniProt/other identifiers (Dan).
#Prioritize list for next ontology development meetings. Do we need to do these in sequence or in parallel? Many of the same ontology developers are always involved {David, Midori, Jen, Jane}. However, there are cases where others are involved. Some prioritization may come from the GO-engineering collaboration with MIT. At present, the sorted list is as follows.
##is_a complete (hopefully done by GOC meeting)
##some component of development and physiology of cardiovascular system (May)
##muscle development {suggested by Erika and colleagues}
##peripheral nervous system {continuation of early CNS work}
##DNA repair? (perhaps Eurie could organize this?)
##Transport (suggested by Val)
##How do we give credit to external contributors to GO (Midori)
#Piped data for IPI, need consistency in usage (Evelyn)
#*IGI data allows piped accessions in the 'with' column to capture the fact that two or more genes may be interacting simultaneously. IPI data also allows piped accessions in the 'with' column, but usage differs. Some GOC members use the pipe specifically to say that, in a given paper, proteins A, B and C precipitated together or form part of a complex. Others, I think, also use it where two separate experiments in the same paper showed that protein A interacted with protein B and with protein C. GOA prefers using it as for IGI, for one specific circumstance; otherwise information is lost? Others??
#*Related Issue: GOA has decided for the moment not to pipe several protein binding interactions simply because they come from the same paper. We unwrap piped data from MODs because of inconsistency in usage and because this data is not normalised (which causes problems for the database and web services).
#*Karen C adds: I think the same issues apply to IGI, so whatever we do should apply to the with column when used for either IPI or IGI, or perhaps for any use of the with column.
#Discussion of 'anatomical processes' such as 'heart pumping' in the process ontology. Should we add terms like this, how are we going to do it? If we are not, can we express these anatomical processes in another way?
## Add these terms and then make non-anatomical processes part_of them. This will create a lot of true path violations if different anatomical structures in different organisms carry out the same process. We would also have to make specific children.
## Create a method for 'annotating' anatomical structures from other ontologies with GO biological processes.
#Overlap/connections between GO and SO?
#:Emily Dimmer submitted a SF item asking if GO would want to have terms in the component ontology to represent situations such as the finding that human myosin 6 coimmunoprecipitates with RNA pol II at the promoters and/or intragenic regions of active genes. After an email discussion between Karen E and Karen C, the question boils down to whether/how to make such a connection between SO and GO.
#*On the one hand, it seems redundant to repeat the terms in both places. In general we are trying to avoid overlap between the ontologies.
#*On the other hand, it seems that SO is used for the annotation of the sequences with respect to what they are, while GO is used for the annotation of gene products with respect to where they are located for component terms. Thus, I don't want to start mixing my annotations of gene products with SO terms as well as GO terms. If we want to be able to annotate these types of sequence locations as places where gene products can be localized, I'd rather do it in a way where there is a term in GO that has some relationship to a term in SO.
#:The consensus is that we should discuss this issue at the GOC meeting. The SF item is here: https://sourceforge.net/tracker/index.php?func=detail&aid=1587313&group_id=36855&atid=440764
#:This may also help with a question from Michelle about the provirus and viral genome terms: https://sourceforge.net/tracker/index.php?func=detail&aid=1571666&group_id=36855&atid=440764
#Do we want all groups to be able to provide structured notes, or do we want to proliferate GO terms for things like cell types? See https://sourceforge.net/tracker/index.php?func=detail&aid=1598448&group_id=36855&atid=440764 and https://sourceforge.net/tracker/index.php?func=detail&aid=1587269&group_id=36855&atid=440764
#Change in interpretation of the database identifier in DB column of association files (Emily). Change suggested so that the combination of the DB (column 1) and DB_Object_ID (column 2) fields provide a globally unique and resolvable identifier, rather than naming database submitting file (as currently defined). The ASSIGNED_BY column will still state from where the annotation originated.
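
To make the file-level items above concrete (taxid filtering, piped 'with' entries, and the proposed DB plus DB_Object_ID identifier), here is a minimal Python sketch. It assumes the 15-column tab-delimited gene association layout; the sample row, the accessions in it and the function names are hypothetical and are not taken from any real file or GOC script.

```python
# 0-based positions in a tab-delimited gene association line
# (assumed 15-column layout: DB, DB_Object_ID, ..., With, ..., Taxon, ..., Assigned_By).
DB, DB_OBJECT_ID, WITH, TAXON, ASSIGNED_BY = 0, 1, 7, 12, 14

# Hypothetical row, for illustration only.
SAMPLE = "\t".join([
    "UniProtKB", "P00001", "GENE1", "", "GO:0005515", "PMID:0000000",
    "IPI", "UniProtKB:P00002|UniProtKB:P00003", "F", "", "",
    "protein", "taxon:10090", "20070101", "UniProtKB",
])


def global_id(cols):
    """Combine column 1 (DB) and column 2 (DB_Object_ID) into one identifier."""
    return f"{cols[DB]}:{cols[DB_OBJECT_ID]}"


def with_entries(cols):
    """Split the 'with' column on the pipe character, dropping empty entries."""
    return [entry for entry in cols[WITH].split("|") if entry]


def keep_for_taxon(cols, filtered_taxa):
    """Return False when the row's first taxon is one that is being filtered out."""
    first_taxon = cols[TAXON].split("|")[0]
    return first_taxon not in filtered_taxa


if __name__ == "__main__":
    cols = SAMPLE.split("\t")
    print(global_id(cols), with_entries(cols), cols[ASSIGNED_BY])
    print(keep_for_taxon(cols, {"taxon:10090"}))
```

Note that splitting on the pipe only recovers the individual accessions. It cannot tell whether they came from one experiment or from several, which is exactly the usage question raised above.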


Please add items below that you think need to be presented. When possible, please put your request in priority rank order:
==== Things that have been agreed, just need to do ====
# All MODs should provide a file with all protein sequences.  Also the known UniProt or NCBI accessions should be included in the gp2protein file.
#*Each MOD has the goal of annotating all the gene products within their genome of interest.  Thus each MOD has a dataset of proteins, even those that have not yet been annotated.  This dataset should be provided from the MOD site, and from the GOC site.  The dataset should include the UniProt or RefSeq accession if known.
#*The gp2protein file should include all the accession numbers even the accessions for proteins that have not yet been annotated.
#*The International sequence databases have an ownership system in place that limits who can make changes to the sequence or its annotations.  Sometimes the MOD has newer information that is available from GenBank/EMBL/DDBJ because the authors are slow depositing updates. (Mike)
# Make our choice for on-line meeting support software (John D-R)
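
The gp2protein requirement in item 1 above could be checked with something like the following Python sketch. It assumes a two-column, tab-separated gp2protein layout (gene product identifier, then protein accession) with '!' comment lines; the identifiers in the usage example and the function name are hypothetical.

```python
def missing_from_gp2protein(all_gene_products, gp2protein_lines):
    """List gene products that have no protein accession in the gp2protein data.

    all_gene_products: iterable of gene product identifiers from the MOD.
    gp2protein_lines: iterable of text lines, assumed 'gene_product<TAB>accession'.
    """
    mapped = set()
    for line in gp2protein_lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        gene_product, _, accessions = line.partition("\t")
        if accessions.strip():
            mapped.add(gene_product)
    return sorted(set(all_gene_products) - mapped)


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    products = ["MOD:0001", "MOD:0002", "MOD:0003"]
    lines = ["! comment", "MOD:0001\tUniProtKB:P00001", "MOD:0003\t"]
    print(missing_from_gp2protein(products, lines))
```

A row with an empty second column is treated as unmapped, so proteins that are listed but still lack a known accession show up in the report.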


'''''Utility for Biologists'''''
==== Put in Reports Session ====
#Do we want to have time to update other GOC members on GO-related grants that have been submitted or are to be submitted, or do we leave this info for project reports? (Evelyn)
#Perhaps as part of the Outreach Grp, I would like to discuss the experiences of GOC members with getting feedback from the community on annotations: what works best, wiki, face2face chats, e-mail, online forms etc. (Evelyn)
#I would like an update on complex GO annotations (nomenclature, when and when not to request a term), GO collaborations with IntAct and ChEBI etc. (Evelyn)
==== Need proposals for the new evidence code definitions ====
#Resolution of several Evidence Code issues from Annotation Camp (Karen & Evidence Code documentation committee)
#*What evidence code to use for profile HMM based annotations.(Michelle)
#:At the annotation camp a proposal was raised to use RCA for profile HMMs, while Michelle has argued that these should remain ISS. There is agreement that the models used for things like TMHMM and SignalP might better belong as RCA. However, there is disagreement about the HMMs in the TIGRFAM and Pfam sets. The proposal says RCA; others argue it should be ISS.
#*(Note added by Val.) The original proposal was that ISS should only be used when transferring annotations to orthologs. This isn't always practical (or possible), as for some domains (e.g. F-box) we know they all act as substrate-specific adaptors for ubiquitin ligases, but we cannot unambiguously assign them to a characterised ortholog. However, the protein is clearly a family member (judged by assessing the alignment -ISS), and has been named as an F-box by the laboratories studying these proteins (although this is currently unpublished). I could leave this as IEA, but I want to show that this has been manually assessed. This is the only way we can weed out false positives from the electronic mappings (I have reported ~260 so far; see https://sourceforge.net/tracker/?group_id=36855&atid=605890). Also, using our protocols, manual assignment overrides other, possibly less granular, redundant IEAs.
#:The same would apply to many proteins with the zf-fungal Zn(2)-Cys(6) binuclear cluster domain. All proteins with this domain are transcription factors, based on the fact that they are members of this family (based on the multiple alignment-ISS). Sometimes the orthologs cannot be unambiguously identified (because of multiple deletions and duplications); for others the S. cerevisiae orthologs are not studied or annotated. However, every single one characterised so far is a transcription factor. I don't see a problem with annotations ISS to the Pfam alignment for the functions which apply to ALL family members. In fact, with an ISS to a multiple alignment (as previously pointed out by Michelle) you can have greater confidence than with an ISS to only a pairwise alignment. I see far more problems with ISS annotations which are not supported by anything in the 'with' column (too many to even provide feedback on). Converting IEA to ISS involves many things (selecting the correct degree of granularity, checking the alignment, checking that all proteins with the domain studied so far have this function, community feedback). But essentially these are ISS, not RCA.
#*(Karen C adds) At the recent Annotation Camp, we also agreed to use RCA for things like tRNA scan and the snoRNAs, but the more I think about it, I really think this is purely sequence based and thus should be given ISS, not RCA. We would also need to resolve what, if anything, could appropriately be put in the with column.
#*Flip side of the issue: What should RCA cover?
#:At the Annotation Camp, we proposed to use RCA for a number of purely ISS-based methods where it was difficult/impossible to fill in the with column. Firstly, Michelle Gwinn has objected to disallowing use of ISS for purely sequence based methods. Secondly, RCA was initially proposed for computational methods that combined multiple data types and then performed some analysis that could be used to make predictions for GO terms. At the St. Croix GOC meeting, it was mentioned that the docs currently state that RCA should be for non-sequence based, but that it should probably be expanded to allow inclusion of sequence based data, provided that the computational method was not purely sequence based.
#*Boundary between ISS/RCA/IEA
#:Once the above issues on what ISS and RCA should cover are resolved, we may also want to make sure we are clear on what the policy is for promoting an IEA to the appropriate curator-reviewed code. The Annotation Camp minutes note that "There seems to be a lack of clarity on the proposed new boundaries between ISS, RCA, and IEA, particularly RCA and IEA. Even just the above two paragraphs leave me confused as to where one would use IEA versus RCA for an HMM-based method. The group as a whole may need to discuss this further." I'll also add that while the original boundary between IEA and ISS made a statement about curatorial review of that particular annotation, the guidelines for use of RCA stated only that the method have been reviewed and validated, not that each individual annotation be validated by a curator.
#*Clarification of TAS and NAS
##TAS - At the Annotation Camp, we agreed to limit use of TAS to situations where you can say "Paper A that I was annotating referred to paper B as the source of this statement". This would exclude the historical usage of TAS for common knowledge statements. Basically, this code would only be for cases where you can go to the paper cited for the annotation and trace the statement to a cited reference. To use TAS, there is no requirement to go to the cited paper and confirm that it contains experimental characterization of the species of interest, because that defeats the purpose of the TAS code. However, recognizing that authors are not always precise with respect to species when citing references, Reference Genomes have agreed to avoid use of this code whenever possible. We should probably add documentation about this issue, recommending that curators track down the cited reference and annotate from it when possible.
##NAS - At the Annotation Camp, we agreed that NAS should be used in all cases where the author makes a statement that a curator wants to capture but cannot be traced to a specific publication and this should apply to both peer reviewed papers and information from textbooks.
#:NAS and proposed use of with column - An example of when to use NAS and what to put in the WITH column was provided by David H at the 2006 annotation camp as follows: "If I draw the conclusion that a transcription factor is in the nucleus then it is IC; if the author draws that conclusion then it is NAS. The WITH field would contain the GOID for 'transcription factor activity' in each of these cases.  Note that this is an expansion of the use of the WITH field for the NAS evidence code."
#*IEP - may be some need to clarify usage of this code (note that this comes from Evidence Code Group discussion, not from Annotation Camp per se, will check with group and add to/remove this particular point as appropriate).
#*ND - (this wasn't part of the annotation camp discussions, just tagging it on the end!) Most of the annotations that were formerly to the 'unknown' terms but are now to the root nodes have the evidence code ND. The use of ND is useful for identifying these annotations, but it seems that there are some 'unknown' annotations that have other evidence codes (e.g. TAS and NAS where an author has stated in a paper that there is no data available). Should we standardize all of these to use ND? There are about 50-60 in total from all groups (Emily and Jane).
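
For the ND item above, the affected annotations could be listed with a short script along these lines. This is a sketch only: it assumes the tab-delimited gene association layout with the GO ID in column 5 and the evidence code in column 7, and the function name and sample rows are hypothetical. The three root term IDs are those of biological_process, molecular_function and cellular_component.

```python
# Root terms of the three ontologies.
ROOT_TERMS = {"GO:0008150", "GO:0003674", "GO:0005575"}

GO_ID, EVIDENCE = 4, 6  # 0-based positions of columns 5 and 7


def non_nd_root_annotations(lines):
    """Return (DB, DB_Object_ID, GO ID, evidence) for root annotations not coded ND."""
    hits = []
    for line in lines:
        if line.startswith("!"):
            continue  # header or comment line
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= EVIDENCE:
            continue  # too short to be an annotation row
        if cols[GO_ID] in ROOT_TERMS and cols[EVIDENCE] != "ND":
            hits.append((cols[0], cols[1], cols[GO_ID], cols[EVIDENCE]))
    return hits


if __name__ == "__main__":
    # Hypothetical rows, truncated after the evidence column for brevity.
    rows = [
        "!gaf header",
        "MOD\t0001\tg1\t\tGO:0008150\tREF:1\tND",
        "MOD\t0002\tg2\t\tGO:0003674\tREF:2\tNAS",
        "MOD\t0003\tg3\t\tGO:0005515\tREF:3\tTAS",
    ]
    print(non_nd_root_annotations(rows))
```

Running this over each group's file would give the list of annotations to review before deciding whether to standardize them to ND.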


The highest priority for the GOC identified by grant reviewers and others is to enhance the ability of non-informatics biologists to access the GO resources as part of their data analysis efforts.  Some of the participants in the GO Consortium provide access to GO data analysis tools as part of their MOD resource.  But the GO site itself does not support data analysis directly from the site. We do provide listing of tools, but have not seen it as part of our 'mission' to support a general tool [such as GO TermFinder, for example] as part of the GOC. Should we do so?  Wouldn't this be analogous to provide a BLAST server?  While the individual MODs have a role to play here, the GOC is seen by some to be behind in supporting the work of the bench biologist.
==== Need more detail on the proposal ====
#Response to drug
#:Erika Feltrin has a proposal to overhaul the area of the ontology under 'response to drug', and the plan will also affect the 'drug transport' and 'xenobiotic' terms. The ontology working group have held an online content discussion meeting and agreed that this material should be presented to the consortium meeting if time allows.
::For a summary, see http://gocwiki.geneontology.org/index.php/Response_to_drug


* I think this should be given a high priority for discussion. In the AmiGO WG we've discussed adding simple tools (e.g. a cgi of map2slim) to AmiGO, but for some of the more complex tools e.g. microarray tools it's probably not worth developing our own as so many good tools already exist. But should we 'endorse' some of these existing tools? And then how could we have say in the direction they're developed? We already work fairly closely with some of these groups (Onto-Express, GoMiner).
==== No Discussion Needed ====
* Related to the list of tools, it's great to have this long list but it makes it very hard for new users to know which to use and why. Some sort of critical review of features and suggested uses would be very handy. Selecting/endorsing a 'best' tool in a given category (if there were clear winners) would also be really helpful to users, but maybe more controversial if done by GOC vs some external author.
#Evaluation of project tracking methods
** In reviewing the tools one would identify lists of essential and desirable features that 'good' GO tools should ideally possess. If the GOC could approve a set of core features that all 'endorsed' tools should have, this might provide a way to influence future tool development by others. Tools that met or exceeded the feature set would get higher billing than ones which didn't, presumably driving non-compliant tools to meet these higher standards or go by the wayside (similar to the OBO Foundry driving the development of higher quality ontologies).
#*Not sure what this would be?  This needs more definition. (Mike)
*In general, how much do we know about how bench biologists use GO? What tools and other enhancements would make biologists more aware of what GO has to offer, and help them use our resources? What do they want to do with GO?
#Handling multiple identifiers for gene products and sequences
#*Chris and others are exploring how to make this happen. A new service at EBI may be very useful. http://www.pir.uniprot.org/search/idmapping.shtml (Mike)
# The issue of using the GO_REF vs extension of the evidence codes to amplify upon the method that is used.  
#*(Question from Val) Does this include the proposal for introduction of a code to distinguish HTP experiments discussed at the curation meeting? if not can it be included?
#*Need more specifics about this item. I do not believe the intent was to discuss HTP but this needs to be stated. (Mike)
#Hide comments in AmiGO. There is a conflict between the AmiGO browser as a tool for biologist users and the AmiGO browser as a tool for annotators. The 'comments' are often directed at annotators and can thus be considered either irrelevant or confusing to biologist users. In the case of obsoletes, one should just be directed to suggested terms. Annotators might better use OBO-Edit to see comments. So, should we suppress display of comments on AmiGO?
#*Suggest that this be a topic for the AmiGO working group. (Mike)
#GO Consortium Tools (Evelyn, Emily)
#*GOA feels that GOC should not list tools on the GO tool page unless they are maintained, or should at least highlight that fact. We also feel that we should perhaps consider a top 10 GOC-reviewed set of tools that we can recommend and liaise with on a regular basis. GOA can do that independently of GOC if GOC does not want to take such a position. Most users want advice on GO tools, and presenting them with over 100 is not overly helpful. We also need to consider how to modify the next GO users/tool meeting (already discussed on GO management I think?)
#*This is a resource issue. It would certainly be a good idea to have a small number of selected tools. However, who has the time or wants to take the time to handle this? (Mike)
#Since Alex is unable to attend the meeting, perhaps we can arrange a time to have a web conference with him. This will show the group how we have been working from distributed sites and we can get an update on the immunology stuff. I suggest we use whatever technology we have found to be the best by the time of the meeting. Then we can discuss whether it is good enough to buy. etc. [submitted by dph].


'''''Utility for GO-Tool Developers'''''
=== New proposals ===
Tuesday, January 9th after lunch
#Protein Family based annotation tool - Suzi
#Term history tracking capability - John/Chris/and OBO-Edit group
#Incorporation of all gene product sequences and IDs into GO database and fasta files. How are we to accomplish this?
#New set of high-level terms for cellular component: fixes the problems of terms not being 'cellular components', allows alignment with CARO - Jane (in collaboration with Melissa)
#GO development "training": At the [[Managers_11Oct06#a._Ontology_Development |October 11 managers' conference call]], David, Midori and Jen proposed an informal training session for ontology development, so that more GO annotators will be able to work directly on the ontologies. We would cover using OBO-Edit and CVS in the GO context. David plans to stay on an extra day to work with the GO editors, and other annotators who want to do ontology development would be welcome.
#[[Users_meetings | Future users meetings]] - Jane and Eurie.
Wednesday, January 10th morning
#Unfinished topics from previous afternoon
#Summary and wrap-up
#Next consortium meeting


This is a note from a developer...
"we are in the process of making improvements to our GO analysis tool, the
Ontologizer, and I have noticed that one of the most time consuming
aspects of the process seems to be finding good datasets with which to
test new algorithms. I would guess that we are not entirely alone in this.
I wonder if it would be a good idea for the GO community to make a
repository of datasets for testing new algorithms and also for new users
to learn the ropes. I would be thinking of lists of study sets (say
differentially expressed genes) with corresponding population sets (say
all genes on the chip) together with short biological and methods
descriptions. Does such a repository already exist or does anyone have any
tips?"
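
To illustrate how a study set and a population set of the kind described in this note are typically used, here is a minimal Python sketch of a one-sided hypergeometric over-representation test for a single GO term. This is a generic textbook calculation offered as an assumption about what such datasets would feed into; it is not the Ontologizer's algorithm, and the function names and gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(pop_size, pop_annotated, study_size, study_annotated):
    """P(at least study_annotated annotated genes in a random draw of study_size)."""
    total = comb(pop_size, study_size)
    upper = min(pop_annotated, study_size)
    tail = sum(
        comb(pop_annotated, k) * comb(pop_size - pop_annotated, study_size - k)
        for k in range(study_annotated, upper + 1)
    )
    return tail / total


def term_pvalue(study_set, population_set, annotated_genes):
    """Over-representation p-value for one term, given the genes annotated to it."""
    population = set(population_set)
    study = set(study_set) & population          # study genes must be in the population
    annotated = set(annotated_genes) & population
    return hypergeom_pvalue(
        len(population), len(annotated), len(study), len(study & annotated)
    )


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    population = [f"g{i}" for i in range(10)]
    study = ["g0", "g1", "g2", "g3", "g4"]
    annotated = ["g0", "g1", "g2", "g3", "g4"]
    print(term_pvalue(study, population, annotated))
```

A shared repository of the kind proposed would supply the `study_set` and `population_set` inputs, together with the biological and methods descriptions needed to interpret the result.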


'''''Should GO capture the context of activity'''''


Is there a facility to construct GO annotations which would capture the context in which a given gene product is associated with (e.g.) a given function? Thus, for example, instead of annotating protein X to oxygen transport [function] one might annotate:
== Proposed Discussion Topics ==


<protein X in cell component Y> to oxygen transport [function']
# 'response to drug' SF 1242405
# difference between function and process


For MGI, this information is captured for cell types, anatomy and target molecules, but it is not a GO-wide policy. MGI captures this information in structured notes. So, for example MGI curators can make an annotation for gene product X having some function in oxygen transport in a lung epithelial cell. The hope is that if X is also annotated to oxygen transporter activity in lung epithelial cell and plasma membrane of a lung epithelial cell, then we can draw a conclusion that X has the potential to execute a function of an oxygen transporter involved in oxygen transport in the plasma membrane of a lung epithelial cell. In the right spatio-temporal context, the function will be executed.
[[Category:GO Consortium Meetings]]
 
 
'''''Encouraging community annotation'''''
#How can we help community scientists to get credit for their GO work? Could we publish a short account of the work of individual content meetings with all experts having author credit? Could groups providing manual annotation of their own gene products of interest get a small publication giving an account of their contribution and of possible applications?
 
'''''Annotation Tool'''''
#Who is the target audience for the annotation tool?
#How does the idea of an annual functional annotation tool bake-off sound to you? How would you approach it? Can it be used to keep orphaned genomes up-to-date?
#Almost everyone agrees that the GO Consortium should develop an annotation tool, [but not all people since most groups do annotation within the context of their genome annotation stream] Once we begin asking questions about what the tool should do, it becomes clear that every group has very different ideas about what such a tool should do. Groups within the GO tend to advocate for the development of a tool that satisfies their own research efforts at the moment, because that is the immediate need as they understand it. Could the advisors suggest a way of determining which of these special-purpose applications are best for the GO user community ''as a whole'', so that we can decide how to focus our development efforts? What's the best way to encourage inter-group communication to foster collaboration on tools and minimize redundant development efforts? How can we decide whether specific, highly targeted annotation tools or general, lowest-common-denominator annotation tools would be most useful to our user community?
 
== Agenda ==
In order to meet our objectives, we need to present all of the material in the morning.
The presentations will focus on the goals and progress of the Consortium and will reflect the new organization and management of the GOC developed through the process of writing the GO Consortium competitive renewal. The times noted here are expected to be divided between a short presentation and then focused discussion on the topic.
 
 
** - 8:00AM: Introductions (picture on display and circle everyone)
** - 8:10 AM: Overview of Grant Aims (Michael-20 minutes)
** - 8:30 AM: Reference Genomes (Rex - 30 minutes)
** - 9:00 AM: Ontology/Database Developments - Biological (Midori and David-30 minutes)
** - 9:30 AM: Ontology Developments – Technical (Chris-30 minutes)
** -10:00 AM: Break (15 min)
** -10:15 AM: Emerging Genomes (Jen-20 minutes)
** -10:35AM:  User Advocacy including AmiGO (Eurie & Jane- 20 minutes)
** -10:55AM:  New management structure (Suzi- 20 minutes)
** -11:15AM:  Summary of issues we face: pink sheets and more (Judy- 20 minutes)
** -11:35AM: Open Discussion (all)
** -12:30PM – 1:30PM, working lunch for Advisors and GO-top
 
** - 1:30PM - 3:00PM, Advisors powwow
** - 3:00 PM - break
** - 3:30PM - 5:00PM, feedback from Advisors
** - 6:00PM dinner with everyone
 
== Participants ==
*GO representatives
#Michael Ashburner
#Judy Blake
#Eurie Hong
#Suzi Lewis
#Rex Chisholm
#Jen Clark
#Midori Harris
#David Hill
#Ben Hitz (representing production services)
#Jane Lomax
#Chris Mungall
#Simon Twigger
#Tanya Berardini
#Nicky Mulder
*GO Advisors
#<b>Larry Hunter</b>
#<b>Lynette Hirschman</b>
#<b>Barry Smith</b>
#<b>David States</b>
#<b>Mike Tyers</b>
#<b>Craig Nevill-Manning</b> (unable to attend)
#<b>Peter Tarczy-Hornoch</b>
#<b>Ian Dix</b> (Courtland Yockey attending)
*NIH Representative
#Peter Good
*Other possible attendees
#Monte Westerfield?
 
== Venue ==
Accommodation:  [http://www.scinns.com/universi.htm Silver Cloud Inns / Seattle-University Village]
 
  5036 25th Avenue NE
  Seattle, WA 98105
  Phone: 206-526-5200

Latest revision as of 10:11, 15 April 2019

NO MORE EDITS PLEASE

Downloads

Topics

GO Team and other Status Reports

Monday, December 8th

  1. 8:30 Reference Genomes - Rex&Karen
    • Summaries of the genomes - the display metrics
  2. 9:00 Ontology Content - David&Midori
    • IS_A complete
    • regulates (note: Chris has agreed to a joint ont dev/software perspective on this - mah)
    • Cell Ontology links
    • Collaboration with Jonathan Liu and MIT
  3. 9:30 Ontology & Software - Chris&Ben/Mike
    • Includes OBO-Edit working group report

break

  1. 10:30 Annotation outreach - Jen&Michelle
  2. 11:00 User Advocacy - Eurie&Jane
    • Includes AmiGO working group report
    • Includes 'Hub' report?
  3. 11:30 Operations Summary - Suzi
  4. Publications/Presentations/Tutorials/Posters (handout)

Issues to be addressed, ordered with harder topics first

Monday, January 8th after lunch & Tuesday, January 9th morning

Discussions

  1. GO policy on incorporating GOA annotations into MOD annotations (Evelyn and Mike/Judy?)
    • GO annotations have been stripped out of GOA-UniProt (all species file) and all other gene association files by using taxid stated within the file. This is defined within the GOC documentation at: http://www.geneontology.org/GO.annotation.shtml#script The plan was for each member group to integrate annotations that were being filtered out. To date this is only happening at MGI. The result is that annotations from GOA for experimentally results are being lost. GOA receivies a lot of user questions about how to get complete annotation datasets. The unstripped GOA-UniProt file is available on EBI and GOC FTP sites (however in the later is not clearly stated in our documentation).
    • GOA now integrates all experimental data from each GOC member on a monthly basis.
    • We have a GO policy on this. Perhaps, if a GOC member cannot integrate the manual annotation from GOA and others that taxid should not be filtered from the other gene association files?
    • In practice, the MOD groups identified need to be contacted to find out how they are doing in incorporating appropriate annotations in their files.
    • GOA PDB gene association file - should this file be stripped on GOC site by taxon id? File created by GOA and InterPro3D, special pipeline used as PDB entries do not map 1:1 to UniProt/other identifiers (Dan).
  2. Prioritize list for next ontology development meetings. Do we need to do these in sequence or parallel. Many of the same ontology developers are always involved {David, Midori, Jen, Jane}. However, there are cases where others are involved. Some prioritization may come for GO-engineering collarboration with MIT. At present, the sorted list is as follows.
    1. is_a complete (hopefully done by GOC meeting)
    2. some component of development and physiology of cardiovascular system (May)
    3. muscle development {suggested by Erika and colleagues}
    4. peripheral nervous system {continuation of early CNS work}
    5. DNA repair? (perhaps Eurie could organize this?)
    6. Transport (suggested by Val)
    7. How do we give credit to external contributors to GO (Midori)
  3. Piped data for IPI, need consistency in usage (Evelyn)
    • IGI data allows piped accessions in the 'with' column to capture the fact that two or more genes may be interacting simultaneously. IPI data also allows piped accessions in the 'with' column, but some GOC members use the pipe specifically to say that, in a given paper, proteins A, B and C precipitated together or form part of a complex. Others, I think, also use it for circumstances where two separate experiments in the same paper showed that protein A interacted with protein B and with protein C. GOA prefers using it like IGI, for a specific circumstance; otherwise information is lost? Others??
    • Related issue: GOA has decided for the moment not to pipe several protein binding interactions simply because they come from the same paper. We unwrap piped data from MODs because of inconsistency in usage and because this data is not normalised (which causes problems for the database and web services).
    • Karen C adds: I think the same issues apply to IGI, so whatever we do should apply to the with column when used for either IPI or IGI, or perhaps for any use of the with column.
  4. Discussion of 'anatomical processes' such as 'heart pumping' in the process ontology. Should we add terms like this, how are we going to do it? If we are not, can we express these anatomical processes in another way?
    1. Add these terms and then make non-anatomical processes part_of them. This will create a lot of true path violations if different anatomical structures in different organisms carry out the same process. We would also have to make specific children.
    2. Create a method for 'annotating' anatomical structures from other ontologies with GO biological processes.
  5. Overlap/connections between GO and SO?
    Emily Dimmer submitted a SF item asking if GO would want to have terms in the component ontology to represent situations such as the finding that human myosin 6 coimmunoprecipitates with RNA pol II at the promoters and/or intragenic regions of active genes. After an email discussion between Karen E and Karen C, the question boils down to whether/how to make such a connection between SO and GO.
    • On the one hand, it seems redundant to repeat the terms in both places. In general we are trying to avoid overlap between the ontologies.
    • On the other hand, it seems that SO is used for the annotation of the sequences with respect to what they are, while GO is used for the annotation of gene products with respect to where they are located for component terms. Thus, I don't want to start mixing my annotations of gene products with SO terms as well as GO terms. If we want to be able to annotate these types of sequence locations as places where gene products can be localized, I'd rather do it in a way where there is a term in GO that has some relationship to a term in SO.
    The consensus is that we should discuss this issue at the GOC meeting. The SF item is here: https://sourceforge.net/tracker/index.php?func=detail&aid=1587313&group_id=36855&atid=440764
    This may also help with a question from Michelle about the provirus and viral genome terms: https://sourceforge.net/tracker/index.php?func=detail&aid=1571666&group_id=36855&atid=440764
  6. Do we want all groups to be able to provide structured notes, or do we want to proliferate GO terms for things like cell types? See https://sourceforge.net/tracker/index.php?func=detail&aid=1598448&group_id=36855&atid=440764 and https://sourceforge.net/tracker/index.php?func=detail&aid=1587269&group_id=36855&atid=440764
  7. Change in interpretation of the database identifier in DB column of association files (Emily). Change suggested so that the combination of the DB (column 1) and DB_Object_ID (column 2) fields provide a globally unique and resolvable identifier, rather than naming database submitting file (as currently defined). The ASSIGNED_BY column will still state from where the annotation originated.
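
As a rough illustration of the file-level points in items 1, 3 and 7 above, the following sketch reads one tab-separated gene association line, joins DB (column 1) and DB_Object_ID (column 2) into a single identifier, splits a piped 'with' value, and filters by taxon. It assumes the 15-column gene association layout with 'with' in column 8, taxon in column 13 and ASSIGNED_BY in column 15; the function names and the sample values in the usage note are hypothetical and are not taken from any GOC script.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    global_id: str       # DB (column 1) + ":" + DB_Object_ID (column 2)
    go_id: str           # column 5
    evidence: str        # column 7
    with_ids: list[str]  # column 8, split on the pipe character
    taxon: str           # column 13 (first value only, an assumption)
    assigned_by: str     # column 15, where the annotation originated


def parse_line(line: str) -> Annotation:
    """Parse one gene association line (assumed 15 tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 15:
        raise ValueError(f"expected 15 columns, got {len(cols)}")
    return Annotation(
        global_id=f"{cols[0]}:{cols[1]}",
        go_id=cols[4],
        evidence=cols[6],
        with_ids=[w for w in cols[7].split("|") if w],
        taxon=cols[12].split("|")[0],
        assigned_by=cols[14],
    )


def unwrap_with(ann: Annotation) -> list[Annotation]:
    """Return one annotation per 'with' accession (piped data 'unwrapped')."""
    if not ann.with_ids:
        return [ann]
    return [ann._replace(with_ids=[w]) for w in ann.with_ids]


def strip_taxa(anns: list[Annotation], taxa: set[str]) -> list[Annotation]:
    """Drop annotations whose taxon is in the set being filtered out."""
    return [a for a in anns if a.taxon not in taxa]
```

For example, a line with `UniProtKB:Q1|UniProtKB:Q2` in the 'with' column would unwrap into two annotations, each keeping the same combined identifier and ASSIGNED_BY value. Whether a pipe means "together in one experiment" or "separately in the same paper" is exactly the policy question raised in item 3, and the code cannot decide it.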

Things that have been agreed, just need to do

  1. All MODs should provide a file with all protein sequences. Also the known UniProt or NCBI accessions should be included in the gp2protein file.
    • Each MOD has the goal of annotating all the gene products within their genome of interest. Thus each MOD has a dataset of proteins, even those that have not yet been annotated. This dataset should be provided from the MOD site, and from the GOC site. The dataset should include the UniProt or RefSeq accession if known.
    • The gp2protein file should include all the accession numbers even the accessions for proteins that have not yet been annotated.
    • The international sequence databases have an ownership system in place that limits who can make changes to the sequence or its annotations. Sometimes the MOD has newer information than is available from GenBank/EMBL/DDBJ because the authors are slow depositing updates. (Mike)
  2. Make our choice for on-line meeting support software (John D-R)
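
A minimal sketch of the completeness check implied by item 1 above: given a MOD's full list of gene product IDs and its gp2protein file, report the gene products that have no line at all. It assumes a two-column, tab-separated layout (gene product ID, then zero or more accessions separated by ';') with '!' marking comment lines; that layout, the function names and the sample IDs in the usage note are assumptions for illustration only.

```python
def read_gp2protein(lines):
    """Map each gene product ID to its list of protein accessions."""
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comments
            continue
        gp_id, _, accessions = line.partition("\t")
        # A gene product with no known accession keeps an empty list.
        mapping[gp_id] = [a for a in accessions.split(";") if a]
    return mapping


def missing_entries(all_gene_products, mapping):
    """Return gene products absent from the file, annotated or not."""
    return sorted(gp for gp in all_gene_products if gp not in mapping)
```

For instance, if the file lists `MOD:0001` and `MOD:0002` but the MOD's protein set also contains `MOD:0003`, `missing_entries` returns `["MOD:0003"]`.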

Put in Reports Session

  1. Do we want to have time to update other GOC members on GO-related grants that have been submitted or are to be submitted, or do we leave this info for project reports? (Evelyn)
  2. Perhaps part of Outreach Grp: would like to discuss experiences of GOC members with getting feedback from the community on annotations, and what works best (wiki, face-to-face chats, e-mail, online forms, etc.). (Evelyn)
  3. I would like an update on complex GO annotations (nomenclature, when and when not to request a term), GO collaborations with IntAct and ChEBI, etc. (Evelyn)

Need proposals for the new evidence code definitions

  1. Resolution of several Evidence Code issues from Annotation Camp (Karen & Evidence Code documentation committee)
    • What evidence code to use for profile HMM based annotations.(Michelle)
    At the annotation camp a proposal was raised to use RCA for profile HMMs, while Michelle has argued that these should remain ISS. There is agreement that the models used for things like TMHMM and SignalP might better belong as RCA. However, there is disagreement about the HMMs in the TIGRFAM and Pfam sets. The proposal says RCA; others argue it should be ISS.
    • (Note added by Val.) The original proposal was that ISS should only be used when transferring annotations to orthologs. This isn't always practical (or possible), as for some domains (e.g. F-box) we know they all act as substrate-specific adaptors for ubiquitin ligases, but we cannot unambiguously assign them to a characterised ortholog. However, the protein is clearly a family member (judged by assessing the alignment - ISS) and has been named as an F-box by the laboratories studying these proteins (although this is currently unpublished). I could leave this as IEA, but I want to show that this has been manually assessed. This is the only way we can weed out false positives from the electronic mappings (I have reported ~260 so far; see https://sourceforge.net/tracker/?group_id=36855&atid=605890). Also, using our protocols, manual assignment overrides other, possibly less granular, redundant IEAs.
    The same would apply to many zf-fungal Zn(2)-Cys(6) binuclear cluster domain proteins. All proteins with this domain are transcription factors, and they can be annotated based on the fact that they are members of this family (based on the multiple alignment - ISS). Sometimes the orthologs cannot be unambiguously identified (because of multiple deletions and duplications); for others the S. cerevisiae orthologs are not studied or annotated. However, every single one characterised so far is a transcription factor. I don't see a problem with annotations ISS to the Pfam alignment for the functions which apply to ALL family members. In fact, with an ISS to a multiple alignment (as previously pointed out by Michelle) you can have greater confidence than with an ISS to only a pairwise alignment. I see far more problems with ISS annotations which are not supported by anything in the 'with' column (too many to even provide feedback on). Converting IEA to ISS involves many things (selecting the correct degree of granularity, checking the alignment, checking that all proteins with the domain studied so far have this function, community feedback). But essentially these are ISS, not RCA.
    • (Karen C adds) At the recent Annotation Camp, we also agreed to use RCA for things like tRNA scan and the snoRNAs, but the more I think about it, I really think this is purely sequence based and thus should be given ISS, not RCA. We would also need to resolve what, if anything, could appropriately be put in the with column.
    • Flip side of the issue: What should RCA cover?
    At the Annotation Camp, we proposed to use RCA for a number of purely ISS-based methods where it was difficult/impossible to fill in the with column. Firstly, Michelle Gwinn has objected to disallowing use of ISS for purely sequence based methods. Secondly, RCA was initially proposed for computational methods that combined multiple data types and then performed some analysis that could be used to make predictions for GO terms. At the St. Croix GOC meeting, it was mentioned that the docs currently state that RCA should be for non-sequence based, but that it should probably be expanded to allow inclusion of sequence based data, provided that the computational method was not purely sequence based.
    • Boundary between ISS/RCA/IEA
    Once the above issues on what ISS and RCA should cover are resolved, we may also want to make sure we are clear on what the policy is for promoting an IEA to the appropriate curator-reviewed code. The Annotation Camp minutes note that "There seems to be a lack of clarity on the proposed new boundaries between ISS, RCA, and IEA, particularly RCA and IEA. Even just the above two paragraphs leave me confused as to where one would use IEA versus RCA for an HMM-based method. The group as a whole may need to discuss this further." I'll also add that while the original boundary between IEA and ISS made a statement about curatorial review of that particular annotation, the guidelines for use of RCA stated only that the method have been reviewed and validated, not that each individual annotation be validated by a curator.
    • Clarification of TAS and NAS
    1. TAS - At the Annotation Camp, we agreed to limit use of TAS to situations where you can say "Paper A that I was annotating referred to paper B as the source of this statement". This would exclude the historical usage of TAS for common knowledge statements. Basically, this code would only be for cases where you can go to the paper cited for the annotation and trace the statement to a cited reference. To use TAS, there is no requirement to go to the cited paper and confirm that it contains experimental characterization of the species of interest, because that defeats the purpose of the TAS code. However, recognizing that authors are not always precise with respect to species when citing references, Reference Genomes have agreed to avoid use of this code whenever possible. We should probably add documentation about this issue, recommending that curators track down the cited reference and annotate from it when possible.
    2. NAS - At the Annotation Camp, we agreed that NAS should be used in all cases where the author makes a statement that a curator wants to capture but cannot be traced to a specific publication and this should apply to both peer reviewed papers and information from textbooks.
    NAS and proposed use of with column - An example of when to use NAS and what to put in the WITH column was provided by David H at the 2006 annotation camp as follows: "If I draw the conclusion that a transcription factor is in the nucleus then it is IC; if the author draws that conclusion then it is NAS. The WITH field would contain the GOID for 'transcription factor activity' in each of these cases. Note that this is an expansion of the use of the WITH field for the NAS evidence code."
    • IEP - may be some need to clarify usage of this code (note that this comes from Evidence Code Group discussion, not from Annotation Camp per se, will check with group and add to/remove this particular point as appropriate).
    • ND - (this wasn't part of the annotation camp discussions, just tagging it on the end!) Most of the annotations that were formerly to the 'unknown' terms but are now to the root nodes have the evidence code ND. The use of ND is useful for identifying these annotations, but it seems that there are some 'unknown' annotations that have other evidence codes (e.g. TAS and NAS where an author has stated in a paper that there is no data available). Should we standardize all of these to use ND? There are about 50-60 in total from all groups (Emily and Jane).
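
The ND question above amounts to a simple consistency check, sketched here under stated assumptions: annotations are given as (gene product ID, GO ID, evidence code) tuples, and an 'unknown' annotation is one made directly to a root node. The three root IDs are the standard GO roots; the function name and the tuple layout are hypothetical.

```python
# Root nodes of the three ontologies.
ROOT_TERMS = {
    "GO:0008150",  # biological_process
    "GO:0003674",  # molecular_function
    "GO:0005575",  # cellular_component
}


def non_nd_root_annotations(rows):
    """Return root-node annotations whose evidence code is not ND.

    rows: iterable of (gene_product_id, go_id, evidence_code) tuples.
    """
    return [r for r in rows if r[1] in ROOT_TERMS and r[2] != "ND"]
```

Run over all groups' files, a check like this would list the annotations (the 50-60 mentioned above, if that estimate holds) that would need to change should the consortium decide to standardize on ND.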

Need more detail on the proposal

  1. Response to drug
    Erika Feltrin has a proposal to overhaul the area of the ontology under 'response to drug', and the plan will also affect the 'drug transport' and 'xenobiotic' terms. The ontology working group have held an online content discussion meeting and agreed that this material should be presented to the consortium meeting if time allows.
For a summary, see http://gocwiki.geneontology.org/index.php/Response_to_drug

No Discussion Needed

  1. Evaluation of project tracking methods
    • Not sure what this would be? This needs more definition. (Mike)
  2. Handling multiple identifiers for gene products and sequences
  3. The issue of using the GO_REF vs extension of the evidence codes to amplify upon the method that is used.
    • (Question from Val) Does this include the proposal for introduction of a code to distinguish HTP experiments discussed at the curation meeting? if not can it be included?
    • Need more specifics about this item. I do not believe the intent was to discuss HTP but this needs to be stated. (Mike)
  4. Hide comments in AmiGO. There is a conflict between the AmiGO browser as a tool for biologist users and the AmiGO browser as a tool for annotators. The 'comments' are often directed to annotators and can thus be considered either irrelevant or confusing to biologist users. In the case of obsoletes, one should just be directed to suggested terms. Annotators might better use OBO-Edit to see comments. So, should we suppress display of comments on AmiGO?
    • Suggest that this be a topic for the AmiGO working group. (Mike)
  5. GO Consortium Tools (Evelyn, Emily)
    • GOA feels that GOC should not have tools on the GO tool page unless they are maintained, or should at least highlight which ones are not. We also feel that we should perhaps consider a top 10 GOC-reviewed set of tools that we can recommend and liaise with on a regular basis. GOA can do that independently of GOC if GOC does not want to take such a position. Most users want advice on GO tools, and presenting them with over 100 is not overly helpful. We also need to consider how to modify the next GO users/tool meeting (already discussed on GO management, I think?)
    • This is a resource issue. It would certainly be a good idea to have a small number of selected tools. However, who has the time, or wants to take the time, to handle this? (Mike)
  6. Since Alex is unable to attend the meeting, perhaps we can arrange a time to have a web conference with him. This will show the group how we have been working from distributed sites, and we can get an update on the immunology stuff. I suggest we use whatever technology we have found to be the best by the time of the meeting. Then we can discuss whether it is good enough to buy, etc. [submitted by dph].

New proposals

Tuesday, January 9th after lunch

  1. Protein Family based annotation tool - Suzi
  2. Term history tracking capability - John/Chris/and OBO-Edit group
  3. Incorporation of all gene product sequences and IDs into the GO database and fasta files. How are we to accomplish this?
  4. New set of high-level terms for cellular component: fixes the problems of terms not being 'cellular components', allows alignment with CARO - Jane (in collaboration with Melissa)
  5. GO development "training": At the October 11 managers' conference call, David, Midori and Jen proposed an informal training session for ontology development, so that more GO annotators will be able to work directly on the ontologies. We would cover using OBO-Edit and CVS in the GO context. David plans to stay on an extra day to work with the GO editors, and other annotators who want to do ontology development would be welcome.
  6. Future users meetings - Jane and Eurie.

Wednesday, January 10th morning

  1. Unfinished topics from previous afternoon
  2. Summary and wrap-up
  3. Next consortium meeting


Proposed Discussion Topics

  1. 'response to drug' SF 1242405
  2. difference between function and process