Gene Product Association Data (GPAD) Format (Archived)
A proposal to split the information in GAF files into two sets: association data and gene product data.
Current Association File Format
column | required? | contents | cardinality |
---|---|---|---|
1 | required | DB | 1 |
2 | required | DB_Object_ID | 1 |
3 | required | DB_Object_Symbol | 1 |
4 | optional | Qualifier | 0 or greater |
5 | required | GO ID | 1 |
6 | required | DB:Reference(s) | 1 or greater |
7 | required | Evidence code | 1 |
8 | optional | With (or) From | 0 or greater |
9 | required | Aspect | 1 |
10 | optional | DB_Object_Name | 0 or 1 |
11 | optional | DB_Object_Synonym(s) | 0 or greater |
12 | required | DB_Object_Type (refers to col 17 if present) | 1 |
13 | required | taxon | 1 or 2 (for multi-org processes) |
14 | required | Date | 1 |
15 | required | Assigned_by | 1 |
16 | optional | Annotation cross products | ? |
17 | optional | Spliceform | 1 |
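The table above can be read as a Python sketch that splits one GAF line into named columns. The sketch makes three assumptions:

- the file is tab-separated;
- multi-valued cells are joined with `|`;
- trailing optional columns may be omitted.

The column names and the `parse_gaf_line` helper are illustrative only. They are not part of any GO tool.

```python
# Illustrative names for the 17 GAF columns, in the order of the table above.
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_code", "With_or_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "Taxon",
    "Date", "Assigned_by", "Annotation_cross_products", "Spliceform",
]

# Columns whose cardinality can exceed 1 (assumed to be '|'-separated).
MULTI_VALUED = {"Qualifier", "DB_Reference", "With_or_From",
                "DB_Object_Synonym", "Taxon"}


def parse_gaf_line(line: str) -> dict:
    """Split one tab-separated GAF line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad omitted trailing optional columns with empty strings.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected at most {len(GAF_COLUMNS)} columns, got {len(fields)}")
    record = {}
    for name, value in zip(GAF_COLUMNS, fields):
        if name in MULTI_VALUED:
            # Drop empty pieces so an empty cell becomes an empty list.
            record[name] = [v for v in value.split("|") if v]
        else:
            record[name] = value
    return record
```

For example, for the PHO3 line in the example below, `parse_gaf_line` would return `DB_Reference` as a two-item list holding the SGD_REF and PMID identifiers.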
Proposal: remove the gene product data from the association file, leaving only an identifier for each gene product.
Proposed association file format:
old column # | required? | contents | cardinality |
---|---|---|---|
1 | required | DB | 1 |
17 if present; else 2 | required | Spliceform ID OR DB_Object_ID | 1 |
4 | optional | Qualifier | 0 or greater |
5 | required | GO ID | 1 |
6 | required | DB:Reference(s) | 1 or greater |
7 | required | Evidence code | 1 |
8 | optional | With (or) From | 0 or greater |
14 | required | Date | 1 |
15 | required | Assigned_by | 1 |
16 | optional | Annotation cross products | ? |
13 | optional | Interacting taxon ID (for multi-organism processes) | 0 or 1 |
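Here is a Python sketch of the mapping from an old GAF line to the proposed association columns. The `gaf_to_association` name is hypothetical. The sketch assumes two things:

- the line has already been split on tabs;
- the taxa in col 13 are `|`-separated, with the annotated organism first.

The sketch follows the column list above literally, so Aspect (col 9) is not carried over.

```python
def gaf_to_association(fields: list[str]) -> list[str]:
    """Map one tab-split GAF line to the proposed association columns.

    Comments use the old 1-based GAF column numbers from the tables above.
    """
    # Pad omitted trailing optional columns so cols 1-17 can all be indexed.
    padded = fields + [""] * (17 - len(fields))

    def col(n: int) -> str:
        return padded[n - 1]

    # Identify the annotated entity by spliceform (col 17) if given, else col 2.
    object_id = col(17) or col(2)
    # Col 13 holds one or two taxa. The first (the organism's own taxon) moves
    # to the gene product file; only a second, interacting taxon stays here.
    taxa = [t for t in col(13).split("|") if t]
    interacting_taxon = taxa[1] if len(taxa) > 1 else ""
    # Output order: cols 1, 17/2, 4, 5, 6, 7, 8, 14, 15, 16, then 13.
    return [col(1), object_id, col(4), col(5), col(6), col(7), col(8),
            col(14), col(15), col(16), interacting_taxon]
```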
Gene product data would be stored in a separate file. It would consist of the following pieces of information:
old column # | required? | contents | cardinality |
---|---|---|---|
1 | required | DB | 1 |
2 | required | DB_Object_ID | 1 |
3 | required | DB_Object_Symbol | 1 |
10 | optional | DB_Object_Name | 0 or 1 |
11 | optional | DB_Object_Synonym(s) | 0 or greater |
12 | required | DB_Object_Type | 1 |
13 | required | taxon | 1 |
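In a GAF file, gene product data is repeated on every annotation line for the same product. Building this file therefore amounts to deduplicating on (DB, DB_Object_ID). Below is a minimal Python sketch with a hypothetical `extract_gene_products` helper.

- It keeps the first line seen for each product and does not check that later lines agree.
- It assumes those lines carry no spliceform in col 17. When col 17 is present, col 12 describes the spliceform rather than the gene product.

```python
def extract_gene_products(gaf_rows: list[list[str]]) -> list[list[str]]:
    """Collect one gene product record per (DB, DB_Object_ID) from tab-split GAF rows.

    Output columns are old GAF cols 1, 2, 3, 10, 11, 12, 13:
    DB, DB_Object_ID, DB_Object_Symbol, DB_Object_Name,
    DB_Object_Synonym(s), DB_Object_Type, taxon.
    """
    seen: dict[tuple[str, str], list[str]] = {}
    for fields in gaf_rows:
        f = fields + [""] * (17 - len(fields))
        key = (f[0], f[1])
        if key in seen:
            continue  # same product already recorded from an earlier line
        # Keep only the organism's own taxon (the first entry in col 13).
        own_taxon = f[12].split("|")[0]
        seen[key] = [f[0], f[1], f[2], f[9], f[10], f[11], own_taxon]
    # Dicts preserve insertion order, so output follows first appearance.
    return list(seen.values())
```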
Gene products (GPs) with more than one spliceform would also have the following data for each spliceform:
old column # | required? | contents | cardinality |
---|---|---|---|
17 | required | Spliceform ID | 1 |
12 | required | Spliceform object type | 1 |
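Spliceform records can be collected the same way, keyed on the spliceform ID. In the hypothetical `extract_spliceforms` sketch below, col 12 is read as the spliceform's object type. This follows the GAF table above: DB_Object_Type refers to col 17 when that column is present.

```python
def extract_spliceforms(gaf_rows: list[list[str]]) -> list[list[str]]:
    """Collect one [Spliceform ID, object type] record per distinct spliceform."""
    seen: dict[str, list[str]] = {}
    for fields in gaf_rows:
        f = fields + [""] * (17 - len(fields))
        spliceform_id = f[16]  # old col 17
        if spliceform_id and spliceform_id not in seen:
            # With col 17 present, col 12 (index 11) describes the spliceform.
            seen[spliceform_id] = [spliceform_id, f[11]]
    return list(seen.values())
```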
Ideally, the gene product file would also include the gp2protein data, adding one more field to each record: an xref to a UniProt or NCBI ID.
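Adding the gp2protein xref would then be a lookup keyed on the gene product ID. The sketch below assumes gp2protein lines are tab-separated pairs: a `DB:DB_Object_ID` followed by a protein ID, with `!` marking comment lines. That layout should be checked against the real files. Both function names are hypothetical.

```python
def load_gp2protein(lines: list[str]) -> dict[str, str]:
    """Map 'DB:DB_Object_ID' to a protein xref, from gp2protein-style lines.

    Assumed layout: '!' starts a comment line; otherwise two tab-separated
    columns, gene product ID then UniProt or NCBI protein ID.
    """
    mapping: dict[str, str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip malformed lines rather than guess
        mapping[parts[0]] = parts[1]
    return mapping


def add_protein_xref(gp_record: list[str], mapping: dict[str, str]) -> list[str]:
    """Append the protein xref column to a gene product record.

    gp_record must start with DB and DB_Object_ID (old cols 1 and 2);
    products with no mapping get an empty xref.
    """
    key = f"{gp_record[0]}:{gp_record[1]}"
    return gp_record + [mapping.get(key, "")]
```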
Example
The following appears on the page http://geneontology.org/GO.annotation.fields.shtml and is an example of the current GAF file structure:
SGD S000000296 PHO3 GO:0003993 SGD_REF:S000047763|PMID:2407294 IMP F acid phosphatase YBR092C gene taxon:4932 20010118 SGD
SGD S000000296 PHO3 GO:0006796 SGD_REF:S000047115|PMID:2407294 TAS P acid phosphatase YBR092C gene taxon:4932 20041220 SGD
SGD S000005370 RCL1 NOT GO:0003963 SGD_REF:S000039255|PMID:10790377 IDA F YOL010W gene taxon:4932 20020530 SGD
SGD S000005197 TEX1 GO:0006406 SGD_REF:S000069956|PMID:11979277 IC GO:0000346 P YNL253W gene taxon:4932 20030221 SGD
SGD S000005316 ABZ1 GO:0046820 SGD_REF:S000057703|PMID:8346682 ISS CGSC:pabA|CGSC:pabB F aminodeoxychorismate synthase YNR033W gene taxon:4932 20030106 SGD
1: DB | 2: DB Object ID | 3: DB Object Symbol | 4: Qualifier | 5: GO ID | 6: DB:Reference(s) | 7: Evidence code | 8: With (or) From | 9: Aspect | 10: DB Object Name | 11: DB Object Synonym(s) | 12: DB Object Type (refers to col 17 if present) | 13: taxon | 14: Date | 15: Assigned by | 16: Annotation cross products | 17: Spliceform |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SGD | S000000296 | PHO3 |  | GO:0003993 | SGD_REF:S000047763\|PMID:2407294 | IMP |  | F | acid phosphatase | YBR092C | gene | taxon:4932 | 20010118 | SGD |  |  |
SGD | S000000296 | PHO3 |  | GO:0006796 | SGD_REF:S000047115\|PMID:2407294 | TAS |  | P | acid phosphatase | YBR092C | gene | taxon:4932 | 20041220 | SGD |  |  |
SGD | S000005370 | RCL1 | NOT | GO:0003963 | SGD_REF:S000039255\|PMID:10790377 | IDA |  | F |  | YOL010W | gene | taxon:4932 | 20020530 | SGD |  |  |
SGD | S000005197 | TEX1 |  | GO:0006406 | SGD_REF:S000069956\|PMID:11979277 | IC | GO:0000346 | P |  | YNL253W | gene | taxon:4932 | 20030221 | SGD |  |  |
SGD | S000005316 | ABZ1 |  | GO:0046820 | SGD_REF:S000057703\|PMID:8346682 | ISS | CGSC:pabA\|CGSC:pabB | F | aminodeoxychorismate synthase | YNR033W | gene | taxon:4932 | 20030106 | SGD |  |  |
This is how it could look in the proposed new format.
Association data:
SGD S000000296 GO:0003993 F SGD_REF:S000047763|PMID:2407294 IMP 20010118 SGD
SGD S000000296 GO:0006796 P SGD_REF:S000047115|PMID:2407294 TAS 20041220 SGD
SGD S000005370 NOT GO:0003963 F SGD_REF:S000039255|PMID:10790377 IDA 20020530 SGD
SGD S000005197 GO:0006406 P SGD_REF:S000069956|PMID:11979277 IC GO:0000346 20030221 SGD
SGD S000005316 GO:0046820 F SGD_REF:S000057703|PMID:8346682 ISS CGSC:pabA|CGSC:pabB 20030106 SGD
1: DB | 17 or 2: Spliceform ID OR DB_Object_ID | 4: Qualifier | 5: GO ID | 9: Aspect | 6: DB:Reference(s) | 7: Evidence code | 8: With (or) From | 14: Date | 15: Assigned_by | 16: Annotation cross products | 13: Interacting taxon ID (for multi-organism processes) |
---|---|---|---|---|---|---|---|---|---|---|---|
SGD | S000000296 |  | GO:0003993 | F | SGD_REF:S000047763\|PMID:2407294 | IMP |  | 20010118 | SGD |  |  |
SGD | S000000296 |  | GO:0006796 | P | SGD_REF:S000047115\|PMID:2407294 | TAS |  | 20041220 | SGD |  |  |
SGD | S000005370 | NOT | GO:0003963 | F | SGD_REF:S000039255\|PMID:10790377 | IDA |  | 20020530 | SGD |  |  |
SGD | S000005197 |  | GO:0006406 | P | SGD_REF:S000069956\|PMID:11979277 | IC | GO:0000346 | 20030221 | SGD |  |  |
SGD | S000005316 |  | GO:0046820 | F | SGD_REF:S000057703\|PMID:8346682 | ISS | CGSC:pabA\|CGSC:pabB | 20030106 | SGD |  |  |
GP data:
1: DB | 2: DB_Object_ID | 3: DB_Object_Symbol | 10: DB_Object_Name | 11: DB_Object_Synonym(s) | 12: DB_Object_Type | 13: taxon | n/a: UniProt xref (from gp2protein file) |
---|---|---|---|---|---|---|---|
SGD | S000000296 | PHO3 | acid phosphatase | YBR092C | gene | taxon:4932 | UniProt:XXXXXX |
SGD | S000005370 | RCL1 |  | YOL010W | gene | taxon:4932 | UniProt:XXXXXX |
SGD | S000005197 | TEX1 |  | YNL253W | gene | taxon:4932 | UniProt:XXXXXX |
SGD | S000005316 | ABZ1 | aminodeoxychorismate synthase | YNR033W | gene | taxon:4932 | UniProt:XXXXXX |