Manager Call 2015-12-16
1. Acceptable Gene Identifiers
Is this the rule?
- IDs used to represent genes:
- MOD gene identifiers (MGI:MGI:, WB:, ZFIN:ZDB-GENE-, TAIR:locus: etc)
- Generic UniProtKB IDs (UniProtKB:)
- ENSEMBL gene IDs (Ensembl:)
- NCBI gene IDs (NCBI_gene:)
- RNAcentral IDs (RNAcentral:)
- HGNC IDs (HGNC:)
This is what is loaded in the Noctua Entity Ontology (NEO):
- MOD IDs (MGI:MGI:, FBL, etc)
- We strictly follow the go-site xref metadata https://github.com/geneontology/go-site/blob/master/metadata/db-xrefs.yaml (browsable in AmiGO - http://amigo.geneontology.org/xrefs)
- **WE WILL BE CHANGING MGI** - we will drop the additional MGI: as this causes massive problems
- The MOD ID takes precedence over the UniProtKB ID in Noctua. Based on feedback from Geneva, curators will be allowed to enter the UniProtKB ID, but it will still be resolved to a MOD ID if there is a MOD for the product
- In NEO, we formally commit to the entity being uncommitted w.r.t. gene vs. product, i.e. we use MOD IDs and UniProtKB IDs interchangeably
- ENSEMBL IDs: not currently loaded -- but we may allow these for search
- NCBIGene: ditto
- HGNC: ditto
- RNAcentral: will be loaded soon
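The resolution rule above (a UniProtKB ID may be entered but is resolved to a MOD ID when a MOD covers the product) can be sketched roughly as follows. This is only an illustration, not how Noctua or NEO actually implements it: the function name `resolve_entity`, the lookup table `EXAMPLE_MAP`, and the IDs in it are hypothetical placeholders, not real accessions.

```python
from typing import Mapping


def resolve_entity(curie: str, uniprot_to_mod: Mapping[str, str]) -> str:
    """Return the MOD ID for a UniProtKB ID when a mapping exists.

    Any other ID, or a UniProtKB ID with no MOD counterpart,
    is returned unchanged.
    """
    if curie.startswith("UniProtKB:"):
        return uniprot_to_mod.get(curie, curie)
    return curie


# Hypothetical lookup table; these are placeholder IDs, not real accessions.
EXAMPLE_MAP = {"UniProtKB:EXAMPLE1": "WB:EXAMPLE1"}

resolved = resolve_entity("UniProtKB:EXAMPLE1", EXAMPLE_MAP)      # has a MOD counterpart
unresolved = resolve_entity("UniProtKB:EXAMPLE2", EXAMPLE_MAP)    # no MOD counterpart
```

In this sketch a UniProtKB ID with no MOD counterpart (as for human, which has no MOD) simply stays a UniProtKB ID.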
The BIG question is: Which gene IDs are valid for the GOC?
- There is a motion to make a final decision so that curators may use whichever one they want, choosing from a list of valid ID options.
- This will be the list of valid IDs that the GOC will recognize.
The consensus response we have reached so far is:
- We will use MOD gene IDs and UniProtKB IDs to represent genes and gene products, and we will not add semantics to that.
- Both sets of IDs are consistent with what we do in Column 2.
- We WILL NOT use Ensembl gene IDs, NCBI gene IDs, or HGNC IDs.
Special notes: RNAcentral IDs will also be OK. For human genes, we will use UniProt IDs.
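As a rough illustration of the consensus so far, a prefix check might look like the sketch below. The names `ACCEPTED_PREFIXES`, `prefix_of` and `is_accepted` are made up for this example, and the MOD prefixes listed are only the ones mentioned in these notes; the authoritative list is the go-site db-xrefs.yaml file. HGNC is treated as not accepted here, although that point was left open for further discussion.

```python
# Prefixes taken from these notes only; the real list lives in db-xrefs.yaml.
ACCEPTED_PREFIXES = {"MGI", "WB", "ZFIN", "TAIR", "UniProtKB", "RNAcentral"}


def prefix_of(curie: str) -> str:
    """Return the text before the first colon of a prefixed ID."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed ID: {curie!r}")
    return prefix


def is_accepted(curie: str) -> bool:
    """True if the ID's prefix is in the accepted set sketched above."""
    return prefix_of(curie) in ACCEPTED_PREFIXES
```

For example, `is_accepted("UniProtKB:xxxxx")` is true under this sketch, while `is_accepted("Ensembl:xxxxx")` and `is_accepted("NCBI_gene:xxxxx")` are false.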
MGI ID update:
- The prefix is MGI, then a colon, then the MGI number, e.g. MGI:xxxxx. Note: I don't know the number of digits after the colon.
- Resolving the issue of incorporating the new MGI ID will take a lot of coordination, as all IDs have to change at the same time.
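The planned change from the doubled form (MGI:MGI:xxxxx) to the single-prefix form (MGI:xxxxx) amounts to a simple string rewrite, sketched below. The function name `normalize_mgi` is hypothetical, and this does not address the coordination problem noted above; it only shows the shape of the conversion.

```python
DOUBLED_PREFIX = "MGI:MGI:"


def normalize_mgi(curie: str) -> str:
    """Rewrite 'MGI:MGI:<number>' as 'MGI:<number>'; leave other IDs alone."""
    if curie.startswith(DOUBLED_PREFIX):
        return "MGI:" + curie[len(DOUBLED_PREFIX):]
    return curie
```

IDs that are already in the single-prefix form, and IDs from other sources, pass through unchanged, so the function can be applied to a whole column without filtering first.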
During the 'meeting of the MODs', NHGRI put forward the following initiative: the common API needs standard IDs with which to work, and IDs have to be a component of that. ACTION: Judy will bring Chris into this conversation.
- 1) Is a UniProtKB ID a gene ID, or is it a protein ID?
Answer: we are using MOD IDs and UniProtKB IDs to mean the union of the gene and the product. For humans, there is no MOD ID.
- 2) Why are we removing NCBI, Ensembl, and HGNC?
Answer: Because we want one canonical ID for everything, and to ensure we do the same in Column 16 as we do in Column 2. We are standardizing on a restricted space of IDs.
- Judy expressed that there is a case to be made for HGNC IDs.
- Paul: we could consider it as the equivalent of the MOD ID for human. But why would we want to use it at this point if we don't already use it in Column 2?
- Judy: Until now, the human curation has come out of UniProt. If mouse were doing human and mouse curation, we would be requesting HGNC IDs.
- Paul: it's ok to extend that to one more identifier space. Let us leave it on the table for further discussion.
- When MGI (and also WormBase) receives a GAF with Ensembl or NCBI IDs, they translate them into their own IDs and output those (MGI IDs in MGI's case). MGI is not doing that for Column 16 UniProt IDs; WB is. MGI does translate Column 2 IDs into MGI IDs.
- Paul: we are proposing to be consistent between Col2 and Col16.
- DavidH: we still have generic UniProt IDs in Col16 and we will allow these for mouse genes at this time.
- Judy: in Col 16, as we get annotations to particular isoforms, we use UniProtIDs-ProIDs.
- DavidH: in Geneva the consensus was that we want to have proteoform annotations.
- Chris: There are constraints dictated at the moment of entering information about an isoform. You annotate at the gene level unless you know specifically that it is an isoform; then you use the isoform / proteoform ID.
- Curators may use MOD ID or UniProt generic ID to represent a gene
- We now have a Google document and will finalize the discussion there. The document can be found on the GO Drive, under the GO Annotation Directory at https://goo.gl/JxHiUN
2. LEGO Meeting Report
- LEGO curation docs: https://goo.gl/olzAUL
- Videos available here
- Held in Geneva, at SIB.
- Strategy: Paul gave a talk, then curators went on to curate a paper using the tool and later brainstormed about what they wanted to see. Seth made it all happen very quickly!
- Praise from the attendees:
- David Hill: Tried to write a biologist's interpretation of each of the relations - how to use the relations in the Noctua world. Reviewing before making it public.
- David Hill: We can take single annotations and link them together to tell a biological story. We can export into GAF files, hopefully also into GPAD files. Now we can import existing annotations into Noctua. This is the way of the future!
- Kimberly: LEGO makes GO annotation a lot easier for people. It reduces the number of intermediate stops people need to make when capturing information. It simplifies the process and makes the annotation more accurate.
- If you go through the curation documentation, you can see it. It is no longer necessary to choose the longest term first to reach the level of detail you'd like.
- Proposal to do a live presentation during an annotation call; there are also videos available here.
- Curators at the meeting gave very positive feedback; they were very happy with it. Their models are available on the Noctua site.
- Notes for the meeting are available at https://goo.gl/mVOcyI