Difference between revisions of "GO Consortium Meeting 2007"


Revision as of 07:07, 8 November 2006

Topics

Please add items below that you think need to be presented. We are not yet putting these in any particular order or time slot; we are just collecting the topics we need to address while we are together.

GO Team and other Status Reports

Listed with potential people to provide the summaries

  1. Reference Genomes - Rex&Karen
  2. Ontology Content - David&Midori
    • IS_A complete
    • regulates
    • Cell Ontology links
  3. Ontology & Software - Chris&Ben/Mike
    • Includes OBO-Edit working group report
  4. Annotation outreach - Jen&Michelle
  5. User Advocacy - Eurie&Jane
    • Includes AmiGO working group report
    • Includes 'Hub' report?
  6. Operations - Suzi
  7. Publications/Presentations/Tutorials/Posters

Issues to be addressed

  1. The issue of using GO_REF references versus extending the evidence codes to describe in more detail the method that was used.
    • (Question from Val) Does this include the proposal, discussed at the curation meeting, to introduce a code that distinguishes HTP experiments? If not, can it be included?
  2. gp2protein file:
    • The lag time between when the protein sequences of a newly sequenced and annotated organism are published and when they make it into UniProt. For example, even now only ~10% of the 27,855 Arabidopsis protein sequences are contained in Swiss-Prot. For the final release (version 5) of Arabidopsis, 306 proteins (~1%) are available in Swiss-Prot and 374 in TrEMBL. Older Arabidopsis sequences are found in TrEMBL, but fully 1/3 of the sequences found in the first release have changed over the life of the project.
    • (Answer from Evelyn): This problem stems from the fact that corrections to the original genome sequence have been submitted only to TAIR, not to EMBL. Paul Kersey at EBI is responsible for importing sequences from TAIR to UniProtKB (Evelyn is querying this). Why is this data or annotation not submitted to the EMBL/DDBJ/GenBank international nucleotide sequence databank? Or is it?

  1. GOST needs to use the precise sequences, as supplied by the submitting group
  2. Handling multiple identifiers for gene products and sequences
  3. Evaluation of project tracking methods
  4. GO policy on incorporating GOA annotations into MOD annotations (Evelyn and Mike/Judy?)

GO annotations have been stripped out of the GOA-UniProt (all species) file on the GO site for taxon IDs represented by other GOC members. The idea was that the other GOC members would integrate annotations from GOA. Experimentally verified data from GOA was being lost (around 6,000 annotations for mouse alone), although this is now working properly. The GOA group has been receiving a lot of questions about how to get complete annotation datasets from GOC. The unstripped GOA-UniProt file is available on the EBI ftp site. GOA is now integrating all experimental data from all other GOC members on a monthly basis. Can we have a GO policy on this? If a GOC member can't integrate GOA manual annotation, should that taxon ID continue to be stripped from the GOA-UniProt file? That is the current decision. In practice, the MOD groups identified need to be contacted to find out how they are doing in incorporating appropriate annotations in their files.
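As a rough illustration of the stripping step described above, the sketch below splits gene association rows by taxon and counts how many experimental annotations fall into the stripped set. It is not the GOC or GOA pipeline. The column positions (evidence code in column 7, taxon in column 13 of the tab-delimited gene association file), the set of codes treated as experimental, and the `EXAMPLE` identifiers are all assumptions made for the example.

```python
# Assumed set of experimental evidence codes (an assumption for this sketch).
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}

EVIDENCE_COLUMN = 6   # assumed: column 7 of the gene association file
TAXON_COLUMN = 12     # assumed: column 13 of the gene association file


def split_by_taxon(gaf_lines, stripped_taxa):
    """Partition annotation rows into (kept, stripped) by their taxon ID."""
    kept, stripped = [], []
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip comment and blank lines
        cols = line.rstrip("\n").split("\t")
        taxon = cols[TAXON_COLUMN].split("|")[0]  # first taxon if several are piped
        (stripped if taxon in stripped_taxa else kept).append(cols)
    return kept, stripped


def count_experimental(rows):
    """Count rows whose evidence code is in the assumed experimental set."""
    return sum(1 for cols in rows if cols[EVIDENCE_COLUMN] in EXPERIMENTAL_CODES)


def make_row(object_id, evidence, taxon):
    """Build a hypothetical 15-column row; all identifiers are placeholders."""
    return "\t".join([
        "EXAMPLE", object_id, "symbol", "", "GO:0005515", "PMID:0000000",
        evidence, "", "F", "", "", "protein", taxon, "20061108", "EXAMPLE",
    ])


SAMPLE = [
    "!hypothetical sample data, not real annotations",
    make_row("0001", "IDA", "taxon:10090"),
    make_row("0002", "IEA", "taxon:10090"),
    make_row("0003", "IMP", "taxon:9606"),
]

if __name__ == "__main__":
    kept, stripped = split_by_taxon(SAMPLE, {"taxon:10090"})
    print(len(kept), "rows kept;", len(stripped), "rows stripped;",
          count_experimental(stripped), "of the stripped rows are experimental")
```

Counting the experimental rows in the stripped set is one simple way to see how much manually verified data would be missing if the responsible group had not yet integrated it.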

  1. Hide comments in AmiGO. There is a conflict between the AmiGO browser as a tool for biologist users and the AmiGO browser as a tool for annotators. The 'comments' are often directed at annotators and can thus be considered either irrelevant or confusing to biologist users. In the case of obsoletes, one should just be directed to suggested terms. Annotators might do better to use OBO-Edit to see comments. So, should we suppress the display of comments in AmiGO?
  2. Prioritize the list for the next ontology development meetings. We need to do these in sequence since the same ontology developers are always involved {David, Midori, Jen, Jane}. Some prioritization may come from the GO-engineering collaboration with MIT. At present, the sorted list is as follows.
    • is_a complete {hopefully done by GOC meeting}
    • some component of development and physiology of cardiovascular system {May}
    • muscle development {suggested by Erika and colleagues}
    • peripheral nervous system {continuation of early CNS work}
    • DNA repair? Eurie?
  3. Piped data for IPI, need consistency in usage (Evelyn)

IGI data allows piped accessions in the 'with' column to capture the fact that two or more genes may be interacting simultaneously. IPI data also allows piped accessions in the 'with' column, but usage differs. Some GOC members use the pipe specifically to say that, in a given paper, proteins A, B and C precipitated together or form part of a complex. Others, I think, also use it where two separate experiments in the same paper showed that protein A interacted with protein B and with protein C. GOA prefers using it as for IGI, for one specific circumstance; otherwise information is lost? Others? Related issue: GOA has decided for the moment not to pipe several protein binding interactions simply because they come from the same paper. We unwrap piped data from MODs because of the inconsistency in usage and because this data is not normalised (which causes problems for the database and web services).
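A minimal sketch of what "unwrapping" piped data could look like is given below: one annotation row with several pipe-separated accessions in the 'with' column becomes one row per accession. This is an illustration only, not GOA's actual code. The position of the 'with' column (column 8 of the tab-delimited gene association file) is an assumption, and the sample row uses placeholder `EXAMPLE` identifiers.

```python
WITH_COLUMN = 7  # assumed: the 'with' column is column 8 of the gene association file


def unwrap_piped_with(line):
    """Return one row per accession found in a piped 'with' column.

    Rows with zero or one accession are returned unchanged (as a one-item list).
    """
    cols = line.rstrip("\n").split("\t")
    partners = [p for p in cols[WITH_COLUMN].split("|") if p]
    if len(partners) < 2:
        return ["\t".join(cols)]
    rows = []
    for partner in partners:
        new_cols = list(cols)          # copy so each output row is independent
        new_cols[WITH_COLUMN] = partner
        rows.append("\t".join(new_cols))
    return rows


# Hypothetical IPI row: placeholder protein A annotated 'with' B and C.
SAMPLE_ROW = "\t".join([
    "EXAMPLE", "A", "geneA", "", "GO:0005515", "PMID:0000000",
    "IPI", "EXAMPLE:B|EXAMPLE:C", "F", "", "", "protein",
    "taxon:9606", "20061108", "EXAMPLE",
])

if __name__ == "__main__":
    for row in unwrap_piped_with(SAMPLE_ROW):
        print(row.split("\t")[WITH_COLUMN])
```

Note the trade-off this makes visible: after unwrapping, the output no longer records whether B and C were found together in one experiment or in two separate ones, which is exactly why consistent usage of the pipe matters.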

  1. GO Consortium Tools (Evelyn, Emily)

GOA feels that GOC should not list tools on the GO tools page unless they are maintained, or should at least highlight which ones are not. We also feel that we should consider a GOC-reviewed set of perhaps the top 10 tools that we can recommend and liaise with on a regular basis. GOA can do that independently of GOC if GOC does not want to take such a position. Most users want advice on GO tools, and presenting them with over 100 is not very helpful. We also need to consider how to modify the next GO users/tools meeting (already discussed on GO management, I think?).

  1. What evidence code to use for profile HMM-based annotations (Michelle)

At the annotation camp a proposal was raised to use RCA for profile HMMs, while Michelle has argued that these should remain ISS. There is agreement that the models used for things like TMHMM and SignalP might better belong under RCA. However, there is disagreement about the HMMs in the TIGRFAM and Pfam sets: the proposal says RCA, while others argue they should be ISS.

  1. Response to drug

Erika Feltrin has a proposal to overhaul the area of the ontology under 'response to drug', and the plan will also affect the 'drug transport' and 'xenobiotic' terms. The ontology working group has held an online content discussion meeting and agreed that this material should be presented to the consortium meeting if time allows.

New proposals

  1. Protein Family based annotation tool - Suzi
  2. Term history tracking capability - John/Chris/and OBO-Edit group
  3. Incorporation of all gene product sequences and IDs into the GO database and fasta files. How are we to accomplish this?
  4. New set of high-level terms for cellular component: fixes the problems of terms not being 'cellular components', allows alignment with CARO - Jane (in collaboration with Melissa)

Venue

 Click here to register for this meeting. Please do this by October 31st
 Jesus College
 Jesus Lane, Cambridge, CB5 8BL, UK