GO Consortium Meeting 2007

From GO Wiki
Revision as of 16:55, 21 November 2006 by Kchris (talk | contribs) (Issues to be addressed)



Please add items below that you think need to be presented. We are not -yet- putting these in any particular order or time, we're just collecting the topics we need to address while we are together.

GO Team and other Status Reports

Listed with potential people to provide the summaries

  1. Reference Genomes - Rex&Karen
  2. Ontology Content - David&Midori
    • IS_A complete
    • regulates
    • Cell Ontology links
  3. Ontology & Software - Chris&Ben/Mike
    • Includes OBO-Edit working group report
  4. Annotation outreach - Jen&Michelle
  5. User Advocacy - Eurie&Jane
    • Includes AmiGO working group report
    • Includes 'Hub' report?
  6. Operations - Suzi
  7. Publications/Presentations/Tutorials/Posters

Issues to be addressed

  1. The issue of using the GO_REF vs extension of the evidence codes to amplify upon the method that is used.
    • (Question from Val) Does this include the proposal to introduce a code to distinguish HTP experiments, as discussed at the curation meeting? If not, can it be included?
  2. gp2protein file:
    • The lag time between when the protein sequences of a newly sequenced and annotated organism are published and when they make it into UniProt. For example, even now only ~10% of the 27,855 Arabidopsis protein sequences are contained in Swiss-Prot. For the final release (version 5) of Arabidopsis, 306 proteins (~1%) are available in Swiss-Prot and 374 in TrEMBL. Older Arabidopsis sequences are found in TrEMBL, but fully one third of the sequences found in the first release have changed over the life of the project.
    • (Answer from Evelyn): This problem stems from the fact that corrections to the original genome sequence have not been submitted to EMBL but only to TAIR. Paul Kersey at EBI is responsible for importing sequences from TAIR to UniProtKB (Evelyn is querying this). Why are these data or annotations not submitted to the EMBL/DDBJ/GenBank international nucleotide sequence databank? Or are they?
  3. GOST needs to use the precise sequences, as supplied by the submitting group
  4. Handling multiple identifiers for gene products and sequences
  5. Evaluation of project tracking methods
  6. GO policy on incorporating GOA annotations into MOD annotations (Evelyn and Mike/Judy?)
    GO annotations have been stripped out of the GOA-UniProt (all species) file on the GO site for taxon IDs represented by other GOC members. The idea was that the other GOC members would integrate annotations from GOA. Experimentally verified data from GOA was being lost (6000-ish for mouse alone), although this is now working properly. The GOA group has been receiving a lot of questions about how to get complete annotation datasets from GOC. The unstripped GOA-UniProt file is available on the EBI ftp site. GOA is now integrating all experimental data from all other GOC members on a monthly basis. Can we have a GO policy on this? If a GOC member can't integrate GOA manual annotation, should that taxon ID continue to be stripped from the GOA-UniProt file? That is the current decision. In practice, the MOD groups identified need to be contacted to find out how they are doing in incorporating appropriate annotations in their files.
  7. Hide comments in AmiGO. There is a conflict between the AmiGO browser as a tool for biologist users and the AmiGO browser as a tool for annotators. The 'comments' are often directed to annotators and can thus be considered either irrelevant or confusing to biologist users. In the case of obsoletes, one should just be directed to suggested terms. Annotators might better use OBO-Edit to see comments. So, should we suppress display of comments in AmiGO?
  8. Prioritize list for next ontology development meetings. We need to do these in sequence since the same ontology developers are always involved {David, Midori, Jen, Jane}. Some prioritization may come from the GO-engineering collaboration with MIT. At present, the sorted list is as follows.
    1. is_a complete {hopefully done by GOC meeting}
    2. some component of development and physiology of cardiovascular system {May}
    3. muscle development {suggested by Erika and colleagues}
    4. peripheral nervous system {continuation of early CNS work}
    5. DNA repair???? Eurie???
    6. Transport (suggested by Val)
  9. Piped data for IPI, need consistency in usage (Evelyn)
    • IGI data allows piped accessions in the 'with' column to capture the fact that two or more genes may be interacting simultaneously. IPI data also allows piped accessions in the 'with' column, but usage differs. Some GOC members use the pipe specifically to say that, in a given paper, proteins A, B and C precipitated together or form part of a complex. Others, I think, also use it where two separate experiments in the same paper showed that protein A interacted with protein B and with protein C. GOA prefers using it as for IGI, for that specific circumstance only, since otherwise information is lost. Others?
    • Related issue: GOA has decided for the moment not to pipe several protein binding interactions simply because they come from the same paper. We unwrap piped data from MODs because of inconsistency in usage and because these data are not normalised (which causes problems for the database and web services).
    • Karen C adds: I think the same issues apply to IGI, so whatever we do should apply to the with column when used for either IPI or IGI, or perhaps for any use of the with column.
  10. GO Consortium Tools (Evelyn, Emily)
    GOA feels that GOC should not list tools on the GO tools page unless they are maintained, or should at least highlight when they are not. We also feel that we should consider a GOC-reviewed set of perhaps the top 10 tools that we can recommend and liaise with on a regular basis. GOA can do that independently of GOC if GOC does not want to take such a position. Most users want advice on GO tools, and presenting them with over 100 is not overly helpful. We also need to consider how to modify the next GO users/tool meeting (already discussed on GO management, I think?).
  11. Resolution of several Evidence Code issues from Annotation Camp
    • What evidence code to use for profile HMM based annotations (Michelle)
    At the Annotation Camp a proposal was raised to use RCA for profile HMMs, while Michelle has argued that these should remain ISS. There is agreement that the models used for things like TMHMM and SignalP might better belong under RCA. However, there is disagreement about the HMMs in the TIGRFAM and Pfam sets. The proposal says RCA; others argue it should be ISS.
    Note added by Val: The original proposal was that ISS should only be used when transferring annotations to orthologs. This isn't always practical (or possible). For some domains (e.g. F-box), we know they all act as substrate-specific adaptors for ubiquitin ligases, but we cannot unambiguously assign them to a characterised ortholog. However, the protein is clearly a family member (judged by assessing the alignment, hence ISS) and has been named as an F-box by the laboratories studying these proteins (though this is currently unpublished). I could leave this as IEA, but I want to show that it has been manually assessed. This is the only way we can weed out false positives from the electronic mappings (I have reported ~260 so far; see https://sourceforge.net/tracker/?group_id=36855&atid=605890). Also, under our protocols, manual assignment overrides other, possibly less granular, redundant IEAs.
    The same would apply to many proteins with the zf-fungal Zn(2)-Cys(6) binuclear cluster domain. All proteins with this domain are transcription factors, and the annotation rests on the fact that they are members of this family (judged from the multiple alignment, hence ISS). Sometimes the orthologs cannot be unambiguously identified (because of multiple deletions and duplications); for others, the S. cerevisiae orthologs are not studied or annotated. However, every single one characterised so far is a transcription factor. I don't see a problem with ISS annotations to the Pfam alignment for the functions which apply to ALL family members. In fact, with an ISS to a multiple alignment (as previously pointed out by Michelle) you can have greater confidence than with an ISS to only a pairwise alignment. I see far more problems with ISS annotations which are not supported by anything in the 'with' column (too many to even provide feedback on). Converting IEA to ISS involves many things (selecting the correct degree of granularity, checking the alignment, checking that all proteins with the domain studied so far have this function, community feedback). But essentially these are ISS, not RCA.
    Karen C adds: At the recent Annotation Camp, we also agreed to use RCA for things like tRNA scan and the snoRNAs, but the more I think about it, I really think this is purely sequence based and thus should be given ISS, not RCA. We would also need to resolve what, if anything, could appropriately be put in the with column.
    • flip side of the issue: What should RCA cover?
    At the Annotation Camp, we proposed to use RCA for a number of purely ISS-based methods where it was difficult/impossible to fill the with column. Firstly, Michelle Gwinn has objected to disallowing use of ISS for purely sequence based methods. Secondly, RCA was initially proposed for computational methods that combined multiple data types and then performed some analysis that could be used to make predictions for GO terms. At the St. Croix GO meeting, it was mentioned that the docs currently state that RCA should be for non-sequence based, but that it should probably be expanded to allow inclusion of sequence based data, provided that the computational method was not purely sequence based.
    • Boundary between ISS/RCA/IEA
    Once the above issues on what ISS and RCA should cover are resolved, we may also want to make sure we are clear on the policy for promoting an IEA to the appropriate curator-reviewed code. The Annotation Camp minutes note that "There seems to be a lack of clarity on the proposed new boundaries between ISS, RCA, and IEA, particularly RCA and IEA. Even just the above two paragraphs leave me confused as to where one would use IEA versus RCA for an HMM-based method. The group as a whole may need to discuss this further." I'll also add that while the original boundary between IEA and ISS made a statement about curatorial review of that particular annotation, the guidelines for use of RCA stated only that the method have been reviewed and validated, not that each individual annotation be validated by a curator.
    • Clarification of TAS and NAS
    1. TAS - At the Annotation Camp, I believe we agreed to limit use of TAS to situations where you can say "Paper A that I was annotating from referred to paper B as the source of this statement". This would exclude the historical usage of TAS for common knowledge statements. Basically, this code would only be for cases where you can go to the paper cited for the annotation and trace the statement to a cited reference. To use TAS, there is no requirement to go to the cited paper and confirm that it contains experimental characterization of the species of interest, because that defeats the purpose of the TAS code. However, recognizing that authors are not always precise with respect to species when citing references, Reference Genomes have agreed to avoid use of this code whenever possible. We should probably add documentation about this issue, recommending that curators track down the cited reference and annotate from it when possible.
    2. NAS - At the Annotation Camp, we agreed that NAS should be used in all cases where an author makes a statement that a curator wants to capture but that cannot be traced to a specific publication, and that this should apply both to peer-reviewed papers and to information from textbooks.
    NAS and proposed use of with column - An example of when to use NAS and what to put in the WITH column was provided by David H at the 2006 annotation camp as follows: "If I draw the conclusion that a transcription factor is in the nucleus then it is IC; if the author draws that conclusion then it is NAS. The with field would contain the GOid for 'transcription factor activity' in each of these cases. Note that this is an expansion of the use of the with field for the NAS evidence code."
    • IEP - may be some need to clarify usage of this code (note that this comes from Evidence Code Group discussion, not from Annotation Camp per se, will check with group and add to/remove this particular point as appropriate).
  12. Response to drug
    Erika Feltrin has a proposal to overhaul the area of the ontology under 'response to drug', and the plan will also affect the 'drug transport' and 'xenobiotic' terms. The ontology working group has held an online content discussion meeting and agreed that this material should be presented to the consortium meeting if time allows.
  13. Discuss if we are going to put 'anatomical processes' such as 'heart pumping' in the process ontology. If we are, how are we going to do it? If we are not, can we express these anatomical processes in another way?
    1. Add these terms and then make non-anatomical processes part_of them. This will create a lot of true path violations if different anatomical structures in different organisms carry out the same process. We would also have to make specific children.
    2. Create a method for 'annotating' anatomical structures from other ontologies with GO biological processes.
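
As an illustration of the 'with' column question in item 9 above, the following is a minimal sketch of what "unwrapping" piped accessions could look like. It is not GOA's actual procedure. It assumes a tab-delimited annotation line in which the 'with' column is the eighth field; the function name and the example identifiers (P00001, P00002, P00003, PMID:0000000) are hypothetical.

```python
# Sketch only: the column position is an assumption about the file layout.
WITH_COL = 7  # 0-based index of the 'with' column (eighth field, assumed)


def unwrap_with(line):
    """Return one annotation line per piped accession in the 'with' column.

    Lines with zero or one accession, or with too few columns,
    are returned unchanged as a single-element list.
    """
    stripped = line.rstrip("\n")
    fields = stripped.split("\t")
    if len(fields) <= WITH_COL:
        return [stripped]
    accessions = [a for a in fields[WITH_COL].split("|") if a]
    if len(accessions) <= 1:
        return [stripped]
    rows = []
    for acc in accessions:
        row = list(fields)       # copy so each row is independent
        row[WITH_COL] = acc      # keep a single accession per row
        rows.append("\t".join(row))
    return rows


# Hypothetical IPI annotation with two piped accessions.
example = "\t".join([
    "UniProtKB", "P00001", "geneA", "", "GO:0005515",
    "PMID:0000000", "IPI", "UniProtKB:P00002|UniProtKB:P00003", "F",
])

for row in unwrap_with(example):
    print(row)
```

Note that unwrapping in this way discards whether the piped accessions were meant to be simultaneous (for example, a co-precipitated complex) or came from separate experiments in the same paper, which is exactly the inconsistency raised in item 9.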

New proposals

  1. Protein Family based annotation tool - Suzi
  2. Term history tracking capability - John/Chris/and OBO-Edit group
  3. Incorporation of all gene product sequences and IDs into the GO database and fasta files. How are we to accomplish this?
  4. New set of high-level terms for cellular component: fixes the problems of terms not being 'cellular components', allows alignment with CARO - Jane (in collaboration with Melissa)
  5. GO development "training": At the October 11 managers' conference call, David, Midori and Jen proposed an informal training session for ontology development, so that more GO annotators will be able to work directly on the ontologies. We would cover using OBO-Edit and CVS in the GO context. David plans to stay on an extra day to work with the GO editors, and other annotators who want to do ontology development would be welcome.


 Click here to register for this meeting. Please do this by October 31st.
 Jesus College
 Jesus Lane, Cambridge, CB5 8BL, UK