2018 Montreal GOC Meeting Agenda

From GO Wiki
* https://github.com/geneontology/go-site/issues/756

Revision as of 10:58, 23 September 2018

Montreal meeting Milestones

New Pipeline & new pipeline documentation

Kimberly, Pascale, Chris, Seth, etc. Must be completed (as much as possible) and announced.

Information points: Manage migration of consortium members to use explicit snapshot PURLs:

GO rules update, pipeline and error reports

Eric & Pascale

PAINT pipeline update

PAINT GAF file generation QC

Huaiyu and Pascale https://github.com/orgs/geneontology/projects/23

Noctua 1.1

Kimberly, Seth, etc. https://github.com/orgs/geneontology/projects/19

AmiGO update

https://github.com/orgs/geneontology/projects/21

GO website migration

Laurent-Philippe, Suzanna, Suzi, etc: https://github.com/orgs/geneontology/projects/22

Ontology and Annotation documentation update

David, Kimberly and Pascale. Random thoughts:

  • Ontology: mention the creation of 'projects' in GH where we moved old projects, so that it's easier to find old discussions

GO subsets update

Pascale:

  • Deprecated a number of unused/unmaintained subsets
  • Show subset yaml files and how they are used
  • Each subset needs a maintainer

PAINT tickets

Marc & Pascale https://github.com/geneontology/go-annotation/labels/PAINT%20annotation

Annotation

Suggestions:

  • Signaling 2017 update
  • Transcription reviews
  • ECM reviews

Ontology

Suggestions:

Handling redundant information

  • Define redundant information:

In AmiGO, we should be able to improve the display by removing redundant information. That information may be useful for certain purposes, so we should provide it in files. We could also provide the 'core set' ('stringent set') in some version of files.
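As a starting point for discussion, here is a minimal sketch of how a 'core set' could be split out. It assumes one possible definition of "redundant" (not an agreed one): a non-experimental annotation is redundant when the same gene product already has an experimental annotation to exactly the same term. The function name, the tuple layout and the example IDs are hypothetical, and the sketch ignores the ontology hierarchy, qualifiers and the high-throughput evidence codes.

```python
# Manually curated experimental evidence codes (high-throughput codes are left out here).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}

def split_redundant(annotations):
    """Split (gene_product_id, go_id, evidence_code) tuples into (core, redundant).

    Assumption: an annotation is redundant if it is non-experimental and the same
    gene product has an experimental annotation to the same GO term.
    """
    annotations = list(annotations)
    experimental_pairs = {
        (gp, term) for gp, term, ev in annotations if ev in EXPERIMENTAL
    }
    core, redundant = [], []
    for gp, term, ev in annotations:
        if ev not in EXPERIMENTAL and (gp, term) in experimental_pairs:
            redundant.append((gp, term, ev))
        else:
            core.append((gp, term, ev))
    return core, redundant

# Hypothetical example data, not real annotations.
example = [
    ("GP:1", "GO:term_a", "IDA"),
    ("GP:1", "GO:term_a", "IBA"),
    ("GP:2", "GO:term_a", "IEA"),
]
core, redundant = split_redundant(example)
```

Keeping both lists, rather than discarding the redundant one, would fit the idea of providing the full data in files while showing only the core set in the display.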

Discussion points: Is redundant, non-experimental annotation ever useful?

  • Are there any use cases where people have used these annotations for some type of analysis?
  • Some pipelines (InterPro2GO, SPKW, PAINT, F-P links) sometimes provide data that is already captured experimentally, and some groups would like the redundancy reduced.
  • Should all GOC members be handling redundancy in the same way?
  • If redundant, non-experimental annotations are present and are going to be removed, at what point in the pipeline should they be filtered, e.g. annotation file production by GOC, annotation file processing by MODs, website display?
  • If we filter annotation files, should we then also provide two annotation files for users, one complete and one filtered?
  • Doubled-up IBA+EXP annotations (from Karen Christie)
  • Issue with GOC inference file: (i) incorrect aspect reported
  • Proposal:

New topics

Representing complete proteomes in GO (added by Val)

Datasets (obtaining/maintaining complete datasets with unique identifiers)

Overview. Nobody seems really sure what happens. I'll document what I think happens here and then run it by others to confirm.

1. GO uses https://www.ebi.ac.uk/reference_proteomes to define the set of human IDs uniquely. This is also used by Panther. This reference proteome set represents each HGNC ID uniquely. This causes issues when two proteins are encoded by a locus described by a single HGNC name.

2. UniProt has other versions of the reference proteome (I asked the UniProt helpdesk about this).

Questions

How do changes in the reference proteome between releases affect GO? I.e., what happens to new or revised IDs if they are used in GO annotations but are not represented in the reference proteome?
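One way to put numbers on this question would be to compare the IDs used in annotations against two releases of the reference proteome. The sketch below is only an illustration: the function name and the idea of holding each release as a plain collection of IDs are assumptions, and it says nothing about how the pipeline actually treats such IDs.

```python
def audit_annotated_ids(annotated_ids, previous_release, current_release):
    """Report annotated IDs that are not covered by the current reference proteome.

    All three arguments are iterables of ID strings (hypothetical inputs).
    """
    annotated = set(annotated_ids)
    previous = set(previous_release)
    current = set(current_release)
    return {
        # Annotated IDs that were in the previous release but are gone now.
        "dropped": sorted(annotated & (previous - current)),
        # Annotated IDs that appear in neither release (e.g. new or revised IDs).
        "never_in_proteome": sorted(annotated - previous - current),
    }

# Hypothetical example with placeholder IDs.
report = audit_annotated_ids(["A", "B", "C"], ["A", "B"], ["A"])
```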


Annotation

  • Which organisms other than cerevisiae and pombe have looked at all protein coding genes for the availability/possibility of GO annotation? Establish the difference between:
  • (1) "not in the GO database" (not found);
  • (2) "unknown" (ND);
  • (3) "unannotated" (no ND, and no annotation in the Aspect of interest).

(difference can be established using the complete known protein ID set for your organism and GO term mapper https://go.princeton.edu/cgi-bin/GOTermMapper)
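The three categories above could also be assigned programmatically. This is a rough sketch assuming the annotations have already been loaded into a dictionary keyed by protein ID, with GAF-style aspect letters (P, F, C); the function name, the data layout and the example IDs are hypothetical.

```python
def classify(protein_id, aspect, annotations):
    """Classify one protein for one aspect.

    annotations: dict mapping protein ID -> list of (aspect, evidence_code),
    assumed to hold every annotation in the GO database for the organism.
    """
    if protein_id not in annotations:
        return "not found"        # (1) not in the GO database
    evidence = [ev for asp, ev in annotations[protein_id] if asp == aspect]
    if not evidence:
        return "unannotated"      # (3) no ND and no annotation in this aspect
    if all(ev == "ND" for ev in evidence):
        return "unknown"          # (2) only ND annotations in this aspect
    return "annotated"

# Hypothetical example data with placeholder IDs.
example = {
    "prot1": [("P", "IDA"), ("F", "ND")],
    "prot2": [("C", "IEA")],
}
status = {pid: classify(pid, "F", example) for pid in ["prot1", "prot2", "prot3"]}
```

Running this over the complete known protein ID set for an organism, once per aspect, would give the counts for each category.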

Breakout sessions topics

  1. Guidelines for submitting annotations to GO. For example, Ivan Erill had an idea to ask the organizers of the Phage Meeting to provide an option for abstract submissions to include author-generated GO annotations. What would our guidelines be?
  2. GO Slims - review Alliance slim with latest stats from Mary Dolan. Does the goslim_agr need any updates?
  3. 'Response to' workshop (similar to the signaling WS)
  4. Use cases: Should we add this to the agenda? What would be a productive way of discussing this topic? https://docs.google.com/document/d/104m4jUNjPH9pCpskg8E29Zm2pLHhPFmKyIFLa9l_EOQ/edit

Product owners/tech leads discussion - lunch Thursday

  • Small debrief session: lessons learned, suggestions for improvements, next face-to-face meeting