Final GPAD and GPI file format: Difference between revisions

From GO Wiki

Revision as of 07:16, 9 January 2013

gp_association files (GPAD)

N.B. The first line in the gp_association file should be:

!gpa-version: 1.1


Final format (09 Jan 2013)

column  name                                                 required?  cardinality   old column #  extra info
1       DB                                                   required   1             1             must be in xrf_abbs
2       DB_Object_ID                                         required   1             2             canonical or spliceform ID
3       Qualifier                                            required   0 or greater  4             qualifiers to be confirmed
4       GO ID                                                required   1             5             must be extant GO ID
5       DB:Reference(s)                                      required   1 or greater  6             DB must be in xrf_abbs
6       Evidence code                                        required   1             7             from ECO
7       With (or) From                                       optional   0 or greater  8
8       Interacting taxon ID (for multi-organism processes)  optional   0 or 1        13            NCBI taxon ID
9       Date                                                 required   1             14            YYYYMMDD
10      Assigned_by                                          required   1             15            from xrf_abbs
11      Annotation Extension                                 optional   0 or greater  16
12      Annotation Properties                                optional   0 or greater                See Note 1 below
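
As a rough illustration of the column layout above, the following sketch splits one gp_association line into its twelve named columns. It assumes the columns are tab-separated, which this page does not state, and it pads missing trailing optional columns with empty strings. The function name `parse_gpad_line`, the `MUST_HAVE_VALUE` list and the example row are hypothetical and not part of the format.

```python
from typing import Dict

# Column names in the order given in the table above.
GPAD_COLUMNS = [
    "DB", "DB_Object_ID", "Qualifier", "GO ID", "DB:Reference(s)",
    "Evidence code", "With (or) From", "Interacting taxon ID", "Date",
    "Assigned_by", "Annotation Extension", "Annotation Properties",
]

# Columns whose cardinality in the table is at least 1. Qualifier is listed
# as required but with cardinality "0 or greater", so this sketch does not
# insist on a value for it (an interpretation, not a rule from the page).
MUST_HAVE_VALUE = [
    "DB", "DB_Object_ID", "GO ID", "DB:Reference(s)",
    "Evidence code", "Date", "Assigned_by",
]


def parse_gpad_line(line: str) -> Dict[str, str]:
    """Split one annotation line into a dict keyed by column name."""
    # Assumption: columns are separated by tab characters.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > len(GPAD_COLUMNS):
        raise ValueError(f"expected at most 12 columns, got {len(fields)}")
    # Assumption: trailing optional columns may be omitted entirely.
    fields += [""] * (len(GPAD_COLUMNS) - len(fields))
    record = dict(zip(GPAD_COLUMNS, fields))
    empty = [name for name in MUST_HAVE_VALUE if not record[name]]
    if empty:
        raise ValueError(f"missing value for: {', '.join(empty)}")
    return record


# Invented example row, for illustration only.
example = "\t".join([
    "UniProtKB", "P12345", "", "GO:0005515", "PMID:1234567",
    "ECO:0000353", "", "", "20130109", "UniProt", "", "",
])
record = parse_gpad_line(example)
print(record["GO ID"], record["Date"])
```

Keying the record by column name rather than position keeps later checks (for example, validating the date against YYYYMMDD) readable.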

Notes

1. The Annotation Properties column can hold a pipe-separated list of "property_name = property_value" pairs. Property names come from a fixed vocabulary, which can be extended when necessary. The initially supported properties would be curator_name and annotation_identifier*, but the list can be extended to include, e.g., curator_ID, modification_date, creation_date and annotation_notes.

* curator_name and annotation_identifier will be useful for groups that use Protein2GO for protein annotation but wish to maintain their annotations in their own database. These values can be used to keep track of individual annotations.
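
The property list described in Note 1 could be read along these lines. This is a minimal sketch under stated assumptions: whitespace around "=" is treated as optional and stripped, and each property name is assumed to appear at most once per annotation, neither of which the note specifies. The function name and the example values are hypothetical.

```python
from typing import Dict


def parse_annotation_properties(column: str) -> Dict[str, str]:
    """Turn 'name = value|name = value' into a dict of property values."""
    properties: Dict[str, str] = {}
    if not column.strip():
        # The column is optional, so an empty cell means no properties.
        return properties
    for item in column.split("|"):
        name, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"not a 'property_name = property_value' pair: {item!r}")
        # Assumption: surrounding whitespace is not significant.
        properties[name.strip()] = value.strip()
    return properties


# Invented values, for illustration only.
cell = "curator_name = Jane Doe|annotation_identifier = 12345"
print(parse_annotation_properties(cell))
```

A real reader would probably also check each name against the fixed vocabulary mentioned in the note; that check is left out here because the vocabulary is not listed on this page.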

gp_information files (GPI)

N.B. The first two lines in the gp_information file should be:

!gpi-version: 1.1

!namespace: <database>

The !namespace header line specifies the namespace of the annotating group's identifiers, e.g. WB:, UniProtKB:
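
A reader of gp_information files might check the two header lines before parsing any rows. The sketch below is one way to do that; the function name `read_gpi_header` is hypothetical, and the namespace value is returned exactly as written (apart from surrounding whitespace) because the page does not say whether a trailing colon belongs to it.

```python
from typing import Iterable, Tuple


def read_gpi_header(lines: Iterable[str]) -> Tuple[str, str]:
    """Return (version, namespace) taken from the first two lines."""
    iterator = iter(lines)
    version_line = next(iterator, "").strip()
    namespace_line = next(iterator, "").strip()
    if not version_line.startswith("!gpi-version:"):
        raise ValueError("first line must start with '!gpi-version:'")
    if not namespace_line.startswith("!namespace:"):
        raise ValueError("second line must start with '!namespace:'")
    # Everything after the first colon is the value.
    version = version_line.split(":", 1)[1].strip()
    namespace = namespace_line.split(":", 1)[1].strip()
    return version, namespace


version, namespace = read_gpi_header(["!gpi-version: 1.1\n", "!namespace: UniProtKB\n"])
print(version, namespace)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly.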

Final format (09 Jan 2013)

column  name                  required?  cardinality   GAF column  Example for UniProt  Example for WormBase
01      DB_Object_ID          required   1             2/17        Q4VCS5-1             WBGene00000035
02      DB_Object_Symbol      required   1             3           AMOT                 ace-1
03      DB_Object_Name        optional   0 or 1        10          Angiomotin
04      DB_Object_Synonym(s)  optional   0 or greater  11          KIAA1071|AMOT        ACE1
05      DB_Object_Type        required   1             12          protein              gene
06      Taxon                 required   1             13          taxon:9606           taxon:6239
07      Annotation_Completed  optional   1             -           20120614 (YYYYMMDD)  20100405
08      Parent_Object_ID      optional   0 or 1        -           Q4VCS5               WBGene00000035
09      DB_Xref(s)            optional   0 or greater  -           -                    UniProtKB:P38433


Notes

1. Where a column can hold more than one value (cardinality "0 or greater" or "1 or greater"), e.g. 'with', DB_Object_Synonym(s), DB_Xref(s), the values should be given as a pipe-separated list.


2. The DB_Xref(s) column will be useful for mapping MOD-specific identifiers, symbols and synonyms to UniProt accessions, so that MOD curators moving to Protein2GO can search for familiar IDs and gene names.
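
To make the pipe-separated convention from Note 1 concrete, here is a sketch that parses one gp_information row and splits the two multi-valued columns into lists. It assumes tab-separated columns (not stated on this page) and treats the "-" shown in the UniProt DB_Xref(s) example as "no value", so that cell is left empty in the example row. The names `parse_gpi_line` and `split_values` are hypothetical.

```python
from typing import Dict, List, Union

# Column names in the order given in the gp_information table.
GPI_COLUMNS = [
    "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name",
    "DB_Object_Synonym(s)", "DB_Object_Type", "Taxon",
    "Annotation_Completed", "Parent_Object_ID", "DB_Xref(s)",
]

# Columns with cardinality "0 or greater" hold pipe-separated lists.
MULTI_VALUED = {"DB_Object_Synonym(s)", "DB_Xref(s)"}


def split_values(cell: str) -> List[str]:
    """Split a pipe-separated cell, dropping empty entries."""
    return [value for value in cell.split("|") if value]


def parse_gpi_line(line: str) -> Dict[str, Union[str, List[str]]]:
    """Split one row into named columns; multi-valued cells become lists."""
    # Assumption: columns are separated by tab characters.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) > len(GPI_COLUMNS):
        raise ValueError(f"expected at most 9 columns, got {len(fields)}")
    fields += [""] * (len(GPI_COLUMNS) - len(fields))
    record: Dict[str, Union[str, List[str]]] = {}
    for name, cell in zip(GPI_COLUMNS, fields):
        record[name] = split_values(cell) if name in MULTI_VALUED else cell
    return record


# Row built from the UniProt example column of the table above.
row = "\t".join([
    "Q4VCS5-1", "AMOT", "Angiomotin", "KIAA1071|AMOT", "protein",
    "taxon:9606", "20120614", "Q4VCS5", "",
])
entry = parse_gpi_line(row)
print(entry["DB_Object_Synonym(s)"], entry["DB_Xref(s)"])
```

Returning lists for the multi-valued columns means callers never need to remember which cells use the pipe separator.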