Identifiers: Difference between revisions
Revision as of 17:13, 22 September 2008
All identifiers in GO should be composed from a binary key as follows:
GlobalID = Database ':' LocalID
The LocalID scheme is under the control of the Database. It should contain no whitespace and no non-ASCII characters.
Examples of well behaved IDs:
- GO:0008152
- Database=GO LocalID=0008152
- SGD:S000006435
- Database=SGD LocalID=S000006435
- ZFIN:ZDB-GENE-980526-166
- Database=ZFIN LocalID=ZDB-GENE-980526-166
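The binary key above can be sketched in Python roughly as follows. The function names `compose_global_id` and `split_global_id` are hypothetical and not part of any GO tooling. Splitting at the first colon is an assumption that follows from the composition rule, since a LocalID may itself contain a colon.

```python
from typing import Tuple


def compose_global_id(database: str, local_id: str) -> str:
    """Build GlobalID = Database ':' LocalID."""
    # The LocalID should contain no whitespace and no non-ASCII characters.
    if not local_id or any(ch.isspace() for ch in local_id):
        raise ValueError(f"LocalID is empty or contains whitespace: {local_id!r}")
    if not local_id.isascii():
        raise ValueError(f"LocalID contains non-ASCII characters: {local_id!r}")
    return f"{database}:{local_id}"


def split_global_id(global_id: str) -> Tuple[str, str]:
    """Split a GlobalID into (Database, LocalID) at the FIRST colon only."""
    database, sep, local_id = global_id.partition(":")
    if not sep or not database or not local_id:
        raise ValueError(f"Not a Database:LocalID pair: {global_id!r}")
    return database, local_id


print(compose_global_id("GO", "0008152"))
print(split_global_id("ZFIN:ZDB-GENE-980526-166"))
```

Splitting only at the first colon keeps the round trip lossless even when the LocalID carries its own prefix.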
In the gene association files, Database goes in column 1 and LocalID goes in column 2.
When filling in the WITH column, the Global ID should be used. This has to be the case; otherwise it would be difficult to tell which database the ID came from.
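As a minimal sketch of the column rule, the global ID for a gene association line can be derived from its first two columns. This assumes the line is tab-separated; `gaf_global_id` is a hypothetical helper name used only for illustration.

```python
def gaf_global_id(line: str) -> str:
    """Return col1:col2 for one tab-separated gene association line."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 2 or not cols[0] or not cols[1]:
        raise ValueError("expected at least Database (col 1) and LocalID (col 2)")
    # Column 1 is the Database, column 2 is the LocalID.
    return f"{cols[0]}:{cols[1]}"


print(gaf_global_id("SGD\tS000006435"))
```

The same string is what should appear in another group's WITH column when it refers to this entity.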
The Database should be registered in GO.xrf_abbs, available here:
Spec:
Also available in .obo format here:
It is strongly recommended that the primary abbreviation is always used in constructing the ID. However, in some contexts it is allowable to use a database abbreviation synonym.
For example, the following is allowed:
- UniProt:Q09212
However, the following is strongly preferred:
- UniProtKB:Q09212
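A consumer that wants to accept synonyms but store the preferred form could normalize IDs along these lines. The mapping below is a hand-written, hypothetical stand-in holding only the example from this page; in practice it would presumably be built from the abbreviations registered in GO.xrf_abbs.

```python
# Hypothetical synonym table; only the example from this page is included.
SYNONYM_TO_PRIMARY = {
    "UniProt": "UniProtKB",
}


def normalize_global_id(global_id: str) -> str:
    """Rewrite a database abbreviation synonym to its primary abbreviation."""
    database, sep, local_id = global_id.partition(":")
    if not sep:
        raise ValueError(f"Not a Database:LocalID pair: {global_id!r}")
    # Unknown abbreviations are passed through unchanged.
    primary = SYNONYM_TO_PRIMARY.get(database, database)
    return f"{primary}:{local_id}"


print(normalize_global_id("UniProt:Q09212"))
```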
Problems with existing usage
FLYBASE vs FB
We have both FB and FlyBase registered here. Also, in the fb gene_association files, col1 is FB but the assigned_by column is FlyBase. NCBI seems to use FLYBASE.
Josh has been alerted and is bringing this up with FlyBase.
MGI
MGI IDs are a major problem
GAF (cols 1-3):
MGI MGI:98297 Shh
Using the concatenation rule, this composes to the global ID
- MGI:MGI:98297
Here the MGI prefix is doubled up. Note that this is inconsistent with NCBI xrefs, which use MGI:98297; this inconsistency has yet to be resolved with NCBI.
Note also that people have typically used the local ID in their WITH columns; e.g. rat has:
RGD RGD:3673 Shh GO:0001525 RGD:1580654 ISS MGI:98297 P sonic hedgehog homolog (Drosophila) gene taxon:10116 20060820 RGD
(NOTE: RGD have fixed this. Thanks RGD!)
Compare with the (well-behaved) ZFIN GAFs and IDs:
ZFIN ZDB-GENE-980526-166 shha
col1:col2 =
- ZFIN:ZDB-GENE-980526-166
This is identical to what NCBI uses in their xref
MGI has confirmed that the global ID is MGI:MGI:nnnnn, and the local internal ID is MGI:nnnn
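Given that confirmation, a WITH-column value such as MGI:98297 is a local ID and needs the database prefix added to become a global ID. The sketch below shows one way to do that. The set `PREFIXED_LOCAL_ID_DATABASES` and the function `to_global_id` are hypothetical, and the approach assumes that any value starting with `MGI:` but not `MGI:MGI:` is a bare local ID.

```python
# Hypothetical list of databases whose LocalID itself starts with "<Database>:".
PREFIXED_LOCAL_ID_DATABASES = {"MGI"}


def to_global_id(value: str) -> str:
    """Expand a bare local ID such as MGI:98297 to the global ID MGI:MGI:98297."""
    database, sep, rest = value.partition(":")
    if not sep:
        raise ValueError(f"No database prefix in {value!r}")
    if database in PREFIXED_LOCAL_ID_DATABASES and not rest.startswith(database + ":"):
        # The value is only the LocalID, so prepend the Database once more.
        return f"{database}:{value}"
    # Already a global ID, or a database with a plain LocalID (e.g. ZFIN).
    return value


print(to_global_id("MGI:98297"))
```

Values that are already global IDs pass through unchanged, so the function can be applied repeatedly without doubling the prefix again.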
RGD
RGD previously used the same pattern as MGI. As of 2008/06/23 they have confirmed their policy and fixed their files. RGD:nnnn is the global ID. The local ID is purely a number (for both genes and references)
Recommendation
MGI and RGD should either:
- change col2 in their GAFs so that only the number is used, or
- coordinate with other databases, including NCBI, to make it clear that the global ID is MGI:MGI:nnnnn