Proposed GPI1.2 format


Revision as of 15:53, 21 July 2016

gp_information files (GPI)

N.B. The first line in the gp_information file should be:

!gpi-version: 1.2

Proposed format (March 2014)

column  name                     required?  cardinality   GAF column  Example for UniProt  Example for IntAct
01      DB                       required   1             1           UniProtKB            IntAct
02      DB_Object_ID             required   1             2/17        Q4VCS5-1             EBI-9008420
03      DB_Object_Symbol         required   1             3           AMOT                 HBA1:HBB
04      DB_Object_Name           optional   0 or 1        10          Angiomotin           Hemoglobin HbA complex
05      DB_Object_Synonym(s)     optional   0 or greater  11          KIAA1071|AMOT        HBA1-HBB complex|HBA1-HBB heterotetramer
06      DB_Object_Type           required   1             12          protein              complex
07      Taxon                    required   1             13          9606                 9606
08      Parent_Object_ID         optional   0 or 1                    UniProtKB:Q4VCS5
09      DB_Xref(s)               optional   0 or greater              UniProtKB:P38433     PR:000025934
10      Gene_Product_Properties  optional   0 or greater              See Note 4 below
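
As a rough illustration of how the columns above might be read, here is a minimal Python sketch. It assumes the file is tab-delimited and that trailing optional columns may be left off a line; neither point is stated on this page. The function name parse_gpi_line and the constant names are hypothetical.

```python
from typing import Dict, List, Union

# Column names in file order, taken from the table above.
GPI_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Parent_Object_ID", "DB_Xref", "Gene_Product_Properties",
]
# Columns marked "required" in the table.
REQUIRED = {"DB", "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Type", "Taxon"}
# Columns with cardinality "0 or greater" (pipe-separated values).
MULTI_VALUED = {"DB_Object_Synonym", "DB_Xref", "Gene_Product_Properties"}


def parse_gpi_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one data line into a dict keyed by column name."""
    # Assumption: columns are separated by tabs.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > len(GPI_COLUMNS):
        raise ValueError(f"expected at most {len(GPI_COLUMNS)} columns, got {len(fields)}")
    # Assumption: a short line simply omits trailing optional columns.
    fields += [""] * (len(GPI_COLUMNS) - len(fields))

    record: Dict[str, Union[str, List[str]]] = {}
    for name, value in zip(GPI_COLUMNS, fields):
        if name in REQUIRED and not value:
            raise ValueError(f"missing required column: {name}")
        if name in MULTI_VALUED:
            record[name] = value.split("|") if value else []
        else:
            record[name] = value
    return record


if __name__ == "__main__":
    example = "UniProtKB\tQ4VCS5-1\tAMOT\tAngiomotin\tKIAA1071|AMOT\tprotein\t9606\tUniProtKB:Q4VCS5\t\t"
    print(parse_gpi_line(example))
```

Lines beginning with "!" (such as the version header) would need to be skipped before calling this function.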


Notes

1. Where a column is stated to allow more than one value, e.g. 'with', DB_Object_Synonym(s), DB_Xref(s), the values should be given as a pipe-separated list.


2. The DB_Xref(s) column will be useful for mapping MOD-specific identifiers/symbols/synonyms to UniProt accessions, to assist MOD curators moving to Protein2GO in searching for familiar IDs/gene names. In the case of IntAct complex IDs, it will be useful to include PRO IDs as an xref to enable a look-up function in Protein2GO. Where the value in column 2 is a MOD gene identifier, the xref should correspond to the UniProtKB identifier for the GCRP.

3. Identifiers in the Parent_Object_ID column must have a prefix, to avoid confusion in cases where an ID from a different database to the one specified in the header is included.
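
A minimal Python sketch of the prefix check in Note 3 follows. It assumes the prefix is separated from the local ID by a colon, as in the table example UniProtKB:Q4VCS5; the function name has_db_prefix is hypothetical, and the check does not verify that the prefix names a known database.

```python
def has_db_prefix(identifier: str) -> bool:
    """Return True when the identifier looks like 'PREFIX:local_id'."""
    # Assumption: prefix and local ID are separated by the first colon.
    prefix, sep, local_id = identifier.partition(":")
    return bool(sep) and bool(prefix) and bool(local_id)


if __name__ == "__main__":
    for candidate in ("UniProtKB:Q4VCS5", "Q4VCS5"):
        print(candidate, has_db_prefix(candidate))
```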

4. The Gene_Product_Properties column can be filled with a pipe-separated list of "property_name = property_value" pairs. There will be a fixed vocabulary for the property names, and this list can be extended when necessary. Supported properties will include: 'GO annotation complete' and 'Phenotype annotation complete' (the value for these two properties would be a date), 'Target set' (e.g. Reference Genome, Kidney etc.), 'Database subset' (e.g. Swiss-Prot, TrEMBL), and 'go_annotation_summary' (textual summary of annotations for an entity).
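
The "property_name = property_value" layout in Note 4 could be parsed along the following lines. This is a sketch only: the function name parse_properties is hypothetical, whitespace around "=" is assumed to be insignificant, a property name is assumed to be allowed more than once, and the date format in the example is an assumption because this page does not specify one.

```python
from typing import Dict, List


def parse_properties(column: str) -> Dict[str, List[str]]:
    """Parse a pipe-separated 'name = value' list into a dict of value lists."""
    properties: Dict[str, List[str]] = {}
    if not column.strip():
        return properties  # empty column: no properties
    for item in column.split("|"):
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = item.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"not a 'name = value' pair: {item!r}")
        # Assumption: repeated names are allowed, so values are collected in a list.
        properties.setdefault(name.strip(), []).append(value.strip())
    return properties


if __name__ == "__main__":
    # The date format below is illustrative only.
    example = "GO annotation complete = 2016-07-21|Database subset = Swiss-Prot"
    print(parse_properties(example))
```

Checking names against the fixed vocabulary is left out here, since that list is expected to grow.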