Proposed GPI1.2 format
Revision as of 09:00, 29 September 2016
gp_information files (GPI)
N.B. The first line in the gp_information file should be: !gpi-version: 1.2
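A reader for these files could check that header before parsing anything else. The following is a minimal sketch only: the function name `read_gpi_version` is hypothetical, and it assumes the header sits on the very first line in exactly the form shown above.

```python
import io


def read_gpi_version(handle):
    """Return the version string from the first line, or None if the header is absent."""
    first = handle.readline().strip()
    prefix = "!gpi-version:"
    if not first.startswith(prefix):
        return None
    # Everything after the prefix is the version, e.g. "1.2".
    return first[len(prefix):].strip()


sample = io.StringIO("!gpi-version: 1.2\n")
print(read_gpi_version(sample))
```

Returning `None` rather than raising leaves it to the caller to decide whether a missing header is fatal.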
Proposed format (March 2014)
column | name | required? | cardinality | GAF column | Example for UniProt | Example for IntAct | Example for MGI Gene | MGI protein | MGI RNA |
---|---|---|---|---|---|---|---|---|---|
01 | DB | required | 1 | 1 | UniProtKB | IntAct | MGI | PR | ENSEMBL |
02 | DB_Object_ID | required | 1 | 2/17 | Q4VCS5-1 | EBI-9008420 | MGI:96175 | Q9Z172-1 | ENSMUST00000127454 |
03 | DB_Object_Symbol | required | 1 | 3 | AMOT | HBA1:HBB | Hoxa3 | mSumo3/iso:m1 | ENSMUST00000127454 |
04 | DB_Object_Name | optional | 0 or 1 | 10 | Angiomotin | Hemoglobin HbA complex | homeobox A3 | small ubiquitin-related modifier 3 isoform m1 (mouse) | |
05 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | AMOT_HUMAN\|KIAA1071\|AMOT | HBA-HBB complex\|HBA1-HBB complex\|HBA1-HBB heterotetramer | Hox-1.5\|Mo-10 | | |
06 | DB_Object_Type | required | 1 | 12 | protein | complex | gene | protein | transcript |
07 | Taxon | required | 1 | 13 | 9606 | 9606 | taxon:10090 | taxon:10090 | taxon:10090 |
08 | Parent_Object_ID | optional | 0 or 1 | | UniProtKB:Q4VCS5 | | | MGI:MGI:1336201 | MGI:MGI:1098592 |
09 | DB_Xref(s) | optional | 0 or greater | | UniProtKB:P38433 | PR:000025934 | UniProtKB:P02831 | UniProtKB:Q9Z172-1 | |
10 | Gene_Product_Properties | optional | 0 or greater | | See Note 4 below | | | | |
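To illustrate how the ten columns map onto a record, here is a minimal sketch of a row parser. It assumes tab-delimited columns, as in GAF files; the text above does not state the delimiter. The names `GPI_COLUMNS`, `REQUIRED`, `MULTI_VALUED` and `parse_gpi_line` are hypothetical. The sample row is assembled from the MGI gene examples in the table.

```python
# Column names in table order (columns 01 to 10).
GPI_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym(s)", "DB_Object_Type", "Taxon",
    "Parent_Object_ID", "DB_Xref(s)", "Gene_Product_Properties",
]
# Columns marked "required" in the table.
REQUIRED = {"DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Type", "Taxon"}
# Columns with cardinality "0 or greater"; values are pipe-separated.
MULTI_VALUED = {"DB_Object_Synonym(s)", "DB_Xref(s)", "Gene_Product_Properties"}


def parse_gpi_line(line):
    """Parse one data line into a dict keyed by column name.

    Assumption: columns are separated by tabs.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) > len(GPI_COLUMNS):
        raise ValueError(f"expected at most {len(GPI_COLUMNS)} columns, got {len(fields)}")
    # Treat omitted trailing columns as empty.
    fields += [""] * (len(GPI_COLUMNS) - len(fields))

    record = {}
    for name, value in zip(GPI_COLUMNS, fields):
        if name in REQUIRED and not value:
            raise ValueError(f"missing required column {name}")
        if name in MULTI_VALUED:
            record[name] = value.split("|") if value else []
        else:
            record[name] = value
    return record


row = "\t".join([
    "MGI", "MGI:96175", "Hoxa3", "homeobox A3", "Hox-1.5|Mo-10",
    "gene", "taxon:10090", "", "UniProtKB:P02831", "",
])
record = parse_gpi_line(row)
print(record["DB_Object_Synonym(s)"])
print(record["DB_Xref(s)"])
```

Multi-valued columns are always returned as lists, so an empty cell becomes an empty list rather than a list holding one empty string.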
Notes
1. Where a column is stated to allow more than one value, e.g. 'with', DB_Object_Synonym(s), DB_Xref(s), the values should be given as a pipe-separated list.
2. The DB_Xref(s) column will be useful for mapping MOD-specific identifiers/symbols/synonyms to UniProt accessions, to assist MOD curators moving to Protein2GO in searching for familiar IDs/gene names. In the case of IntAct complex IDs, it will be useful to include PRO IDs as an xref to enable a look-up function in Protein2GO. In the case where the value in column #2 represents a MOD gene identifier, the Xref should correspond to the UniProtKB identifier for the GCRP.
3. Identifiers in the Parent_Object_ID column must have a prefix, to avoid confusion in cases where the ID comes from a different database to the one specified in the header.
4. The Gene_Product_Properties column can be filled with a pipe-separated list of "property_name = property_value" pairs. The property names will come from a fixed vocabulary, and this list can be extended when necessary. Supported properties will include: 'GO annotation complete' and 'Phenotype annotation complete' (the value for these two properties would be a date), 'Target set' (e.g. Reference Genome, Kidney etc.), 'Database subset' (e.g. Swiss-Prot, TrEMBL), and 'go_annotation_summary' (a textual summary of annotations for an entity).
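Notes 3 and 4 can be sketched in code as follows. This is an illustration under stated assumptions, not a reference implementation: `has_db_prefix` and `parse_properties` are hypothetical helper names, a prefix is taken to be any non-empty text before the first colon, and a property name is allowed to repeat (the notes do not say whether it may).

```python
def has_db_prefix(identifier):
    """Return True if the identifier looks like 'PREFIX:local_id' (Note 3).

    Assumption: the prefix is whatever precedes the first colon.
    """
    prefix, sep, local_id = identifier.partition(":")
    return bool(sep) and bool(prefix) and bool(local_id)


def parse_properties(cell):
    """Split a Gene_Product_Properties cell into {name: [values]} (Note 4)."""
    properties = {}
    if not cell:
        return properties
    for item in cell.split("|"):
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected 'property_name = property_value', got {item!r}")
        # Collect values in a list so a repeated property name is not lost.
        properties.setdefault(name.strip(), []).append(value.strip())
    return properties


print(has_db_prefix("UniProtKB:Q4VCS5"))
print(has_db_prefix("Q4VCS5"))
print(parse_properties("Target set = Kidney|Database subset = Swiss-Prot"))
```

A stricter version would also check each property name against the fixed vocabulary once that list is settled.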