Difference between revisions of "Annotation Conf. Call 2020-04-07"

From GO Wiki

Revision as of 08:00, 6 April 2020

Agenda and Minutes

GOC Meeting - May 2020

  • The Paris meeting scheduled for May 11-14 will be held virtually
  • Please keep the dates open
  • More details will be forthcoming, but note that we are planning shorter days to accommodate all time zones as well as possible
  • Agenda

File Formats

Proposed GAF Update, GAF 2.2

  • GAFs will continue to be produced and supported by the GOC into the foreseeable future
  • However, we are proposing an incremental update to the GAF to allow use of the full set of gene-product-to-term (gp2term) relations
  • The proposed set of allowed relations is in this GitHub ticket: https://github.com/geneontology/go-annotation/issues/2917
    • This would be the same set of gp2term relations used in the GPAD; the main difference is that in the GAF they are in the same column as negation and, when both apply, they are pipe-separated
    • Default gp2term relations:
      • Molecular Function: enables
      • Cellular Component: located_in
      • Biological Process: individual groups decide based on annotation practice
  • Why do this?
    • The expanded set of gp2term relations is available in annotation tools, e.g. Noctua and Protein2GO, but because the GAF does not include them, neither the GOC nor users have any mechanism to filter specific sets of annotations in GAFs for subsequent analyses. Making this change will allow GO and groups to do this.
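
To make the proposed column layout concrete, here is a minimal Python sketch of how a consumer might split a pipe-separated qualifier value into negation and relation, and then filter annotations by relation. The function names, the tuple layout of the rows, and the fallback to a default relation when none is given are assumptions for illustration only; they are not part of the proposal or of any GOC software.

```python
from typing import NamedTuple, Optional

# Default relations by GAF aspect code (F = Molecular Function,
# C = Cellular Component). Biological Process (P) has no single default
# in the proposal, so it is deliberately absent here.
DEFAULT_RELATION = {"F": "enables", "C": "located_in"}


class Qualifier(NamedTuple):
    negated: bool
    relation: Optional[str]


def parse_qualifier(field: str, aspect: str) -> Qualifier:
    """Split a pipe-separated qualifier value into negation and relation."""
    parts = [p.strip() for p in field.split("|") if p.strip()]
    negated = any(p.upper() == "NOT" for p in parts)
    relations = [p for p in parts if p.upper() != "NOT"]
    if len(relations) > 1:
        raise ValueError(f"more than one relation in qualifier: {field!r}")
    # Assumption: fall back to the aspect default when no relation is given.
    relation = relations[0] if relations else DEFAULT_RELATION.get(aspect)
    return Qualifier(negated, relation)


def filter_by_relation(rows, relation):
    """Keep non-negated (qualifier, aspect) rows carrying the given relation."""
    return [r for r in rows if parse_qualifier(*r) == Qualifier(False, relation)]
```

For example, `parse_qualifier("NOT|enables", "F")` yields a negated `enables` qualifier, and `filter_by_relation([("enables", "F"), ("NOT|enables", "F"), ("contributes_to", "F")], "enables")` keeps only the first row.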


GPAD/GPI 2.0 Specifications

  • Questions/Comments
    • We need guidelines for how to represent protein IDs or accessions shared by multiple gene products, e.g. histones. [Stacia]
  • Response:
    • Groups should work with UniProt or internally to disambiguate protein records that refer to multiple genes, i.e. generate gene-centric protein entries in UniProt
    • In the interim, we are proposing that a protein entry that corresponds to the product of multiple genes be included as a single entry in the gpi, with one name chosen as the symbol and the other names listed as synonyms. All parent gene IDs will be listed in the 'Encoded_By' field.
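
The interim proposal can be sketched in Python as follows. The `GpiEntry` class, its attribute names, the helper function, and the identifiers in the usage note are hypothetical and chosen only for illustration; the sketch also assumes that the first gene supplied provides the symbol, which the proposal does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GpiEntry:
    """Hypothetical in-memory view of one gpi entry (not the file layout)."""
    object_id: str
    symbol: str
    synonyms: List[str] = field(default_factory=list)
    encoded_by: List[str] = field(default_factory=list)


def merge_shared_protein(protein_id: str, genes: List[Tuple[str, str]]) -> GpiEntry:
    """Build a single entry for a protein that is the product of several genes.

    `genes` is a list of (gene_id, gene_symbol) pairs. Assumption: the first
    pair supplies the symbol; the remaining symbols become synonyms.
    """
    if not genes:
        raise ValueError("at least one parent gene is required")
    symbols = [symbol for _, symbol in genes]
    return GpiEntry(
        object_id=protein_id,
        symbol=symbols[0],
        synonyms=symbols[1:],
        # Every parent gene ID is kept, matching the 'Encoded_By' proposal.
        encoded_by=[gene_id for gene_id, _ in genes],
    )
```

Calling `merge_shared_protein("EX:P1", [("EX:G1", "hisA"), ("EX:G2", "hisB")])` with these made-up identifiers produces one entry with symbol `hisA`, synonym `hisB`, and both gene IDs in `encoded_by`.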

Deadlines

  • Groups have asked by what date they need to produce these new files.
    • Alliance members that will be migrating to Noctua will need to use the gpad/gpi 2.0 file format for the data ingest.
    • Otherwise, GO will continue to consume and produce GAF for the foreseeable future, but note that gpad/gpi is a richer and more robust file format and is preferred for data exchange within the GOC.
    • For GAF 2.2, the software group will need some time to prepare the pipeline and other software (e.g. AmiGO), and we will also want to announce this change to the community, so there will be some lead time here (exact timing TBD).

Use of Relations in Annotation

  • Consistency in use of gene-product-to-term relations is an ongoing concern
  • On the previous conference call, we discussed soliciting ideas from groups about how to leverage expertise in specific areas of biology or curation to help achieve consistent representation in our annotations
  • Follow-up
    • Discuss at May meeting?

Attendance

  • On call: