Annotation Conf. Call 2020-04-21

From GO Wiki

Agenda and Minutes

GOC Meeting - May 2020

  • The Paris consortium meeting, scheduled for May 12th - 14th, will be held virtually
  • Please keep the dates open
  • We are planning shorter days to accommodate all time zones as well as possible
  • Please suggest agenda items
  • Users meeting - May 11th

File Formats

Proposed GAF Update, GAF 2.2

  • Started documentation/FAQ for community
  • GAF 2.1 will continue to be produced and supported by the GOC into the foreseeable future
  • However, we are proposing an incremental update to the GAF to allow for use of the full set of gp2term relations
  • Proposed set of allowed relations is in this github ticket: https://github.com/geneontology/go-annotation/issues/2917
    • This would be the same set of gp2term relations used in the GPAD; the main difference is that in the GAF they share a column with negation (NOT), and when both apply they are pipe-separated
    • Default gp2term relations:
      • Molecular Function: enables
      • Cellular Component: located_in
      • Biological Process: individual groups decide based on annotation practice
  • Why do this?
    • The expanded set of gp2term relations is available in annotation tools, e.g. Noctua and Protein2GO, but because the GAF does not include them, neither the GOC nor users have any way to filter specific sets of annotations in GAFs or in downstream analyses. Making this change will allow GO and contributing groups to do so.
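To show how a consumer might read the proposed combined column, here is a minimal Python sketch. It assumes GAF column 4 (index 3) holds the qualifier, e.g. `NOT|located_in`, and column 9 (index 8) holds the aspect (F, P, C). It also assumes that a row with no explicit relation falls back to the per-aspect default listed above, with no default for Biological Process. The names `parse_qualifier`, `filter_by_relation` and `DEFAULT_RELATION` are hypothetical and do not come from any GOC tool.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, Set

# Defaults from the minutes; BP is left to each group, so no default (None).
DEFAULT_RELATION = {"F": "enables", "C": "located_in", "P": None}


@dataclass
class Qualifier:
    negated: bool
    relation: Optional[str]


def parse_qualifier(field: str, aspect: str) -> Qualifier:
    """Split a qualifier value such as 'NOT|enables' into negation and relation."""
    parts = [p for p in field.split("|") if p]
    negated = "NOT" in parts
    relations = [p for p in parts if p != "NOT"]
    if len(relations) > 1:
        raise ValueError(f"expected at most one relation, got {relations}")
    # Assumption: an empty relation means the aspect's default relation.
    relation = relations[0] if relations else DEFAULT_RELATION.get(aspect)
    return Qualifier(negated, relation)


def filter_by_relation(rows: Iterable[List[str]], wanted: Set[str]) -> Iterator[List[str]]:
    """Yield GAF rows whose relation is in `wanted`, skipping negated rows."""
    for row in rows:
        # Index 3 = Qualifier column, index 8 = Aspect column (0-based).
        q = parse_qualifier(row[3], row[8])
        if not q.negated and q.relation in wanted:
            yield row
```

This kind of filter is what the expanded relation set would make possible. For example, a user could keep only `involved_in` BP annotations for an enrichment analysis and drop the others.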

GPAD/GPI 2.0 Specifications

  • Questions/Comments
    • We need guidelines for how to represent protein IDs or accessions shared by multiple gene products, e.g. histones. [Stacia]
  • Response:
    • Groups should work with UniProt or internally to disambiguate protein records that refer to multiple genes, i.e. generate gene-centric protein entries in UniProt
    • In the interim, we are proposing that protein entries that correspond to the product of multiple genes be included as a single entry in the gpi, with one name chosen as a symbol, and other names listed as synonyms. All parent gene ids will be listed in the 'Encoded_By' field.
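The interim proposal can be sketched in Python as collapsing one shared protein accession into a single entry. This is an illustration under stated assumptions. Only the `Encoded_By` field name comes from the minutes. The other field names, the pipe separator for multiple values, and the rule of picking the alphabetically first name when no symbol is specified are all hypothetical. The `GpiEntry` class and `merge_shared_protein` function are also illustrative and are not part of any GPI tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GpiEntry:
    # Illustrative fields; only Encoded_By is named in the minutes.
    db_object_id: str
    symbol: str
    synonyms: List[str] = field(default_factory=list)
    encoded_by: List[str] = field(default_factory=list)

    def encoded_by_field(self) -> str:
        # Assumption: multiple values in one GPI column are pipe-separated.
        return "|".join(self.encoded_by)


def merge_shared_protein(protein_id: str,
                         gene_names: Dict[str, str],
                         chosen_symbol: Optional[str] = None) -> GpiEntry:
    """Collapse a protein encoded by several genes into one GPI-style entry.

    gene_names maps each parent gene ID to that gene's name.
    """
    if not gene_names:
        raise ValueError("need at least one parent gene")
    names = sorted(set(gene_names.values()))
    # Hypothetical rule: default to the alphabetically first name.
    symbol = chosen_symbol if chosen_symbol is not None else names[0]
    if symbol not in names:
        raise ValueError(f"{symbol!r} is not one of {names}")
    synonyms = [n for n in names if n != symbol]
    # All parent gene IDs go into Encoded_By.
    return GpiEntry(protein_id, symbol, synonyms, sorted(gene_names))
```

In practice the curating group would choose the symbol. The alphabetical default only keeps the sketch deterministic.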

Deadlines

  • Groups have asked by what date they need to produce these new files.
    • Alliance members that will be migrating to Noctua will need to use the gpad/gpi 2.0 file format for data ingest.
    • Otherwise, GO will continue to consume and produce GAF for the foreseeable future, but note that gpad/gpi is a richer and more robust file format and is preferred for data exchange within the GOC.
    • For GAF 2.2, the software group will need some time to prepare the pipeline and other software (e.g. AmiGO), and we will also want to announce this change to the community, so there will be some lead time here (exact date TBD).

Use of Relations in Annotation

  • Consistency in use of gene-product-to-term relations is an ongoing concern
  • On the previous conference call, we discussed soliciting ideas from groups about how to leverage expertise in specific areas of biology or curation to help achieve consistent representation in our annotations
  • Follow-up
    • In what areas of the BP branch do curators have expertise?
      • Also look at Annual Reviews? Organism-specific 'books', e.g. WormBook?
    • Discuss in more detail at May meeting to begin formulating a plan and more concrete deliverables, e.g. GO-CAM templates, curation guidelines?

Attendance

  • On call: