Gene Product Association Data (GPAD) Format (Archived)
Revision as of 12:00, 19 October 2009
Proposal to split the information in the GAF files into two sets: association data and gene product data.
Current Association File Format
column | required? | contents | cardinality |
---|---|---|---|
1 | required | DB | 1 |
2 | required | DB_Object_ID | 1 |
3 | required | DB_Object_Symbol | 1 |
4 | optional | Qualifier | 0 or greater |
5 | required | GO ID | 1 |
6 | required | DB:Reference(s) | 1 or greater |
7 | required | Evidence code | 1 |
8 | optional | With (or) From | 0 or greater |
9 | required | Aspect | 1 |
10 | optional | DB_Object_Name | 0 or 1 |
11 | optional | DB_Object_Synonym(s) | 0 or greater |
12 | required | DB_Object_Type (refers to col 17 if present) | 1 |
13 | required | taxon | 1 or 2 (for multi-org processes) |
14 | required | Date | 1 |
15 | required | Assigned_by | 1 |
16 | optional | Annotation cross products | ? |
17 | optional | Spliceform | 0 or 1 |
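As a rough illustration of the column layout above, the following sketch reads one annotation line into named fields. It assumes the usual GAF conventions of tab-delimited columns and pipe-separated multiple values; the underscore field names and the `parse_gaf_line` helper are illustrative, not part of the format.

```python
# Column names follow the table above; the underscore spellings are illustrative.
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_code", "With_or_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "taxon",
    "Date", "Assigned_by", "Annotation_cross_products", "Spliceform",
]

# Columns whose cardinality allows several values (assumed pipe-separated).
MULTI_VALUED = {"Qualifier", "DB_Reference", "With_or_From",
                "DB_Object_Synonym", "taxon"}


def parse_gaf_line(line):
    """Return a dict for one annotation line (assumed tab-delimited)."""
    cells = line.rstrip("\n").split("\t")
    # Pad short lines so absent trailing optional columns become empty strings.
    cells += [""] * (len(GAF_COLUMNS) - len(cells))
    record = {}
    for name, cell in zip(GAF_COLUMNS, cells):
        if name in MULTI_VALUED:
            record[name] = cell.split("|") if cell else []
        else:
            record[name] = cell
    return record


# The TEX1 annotation from the example table on this page.
line = "\t".join([
    "SGD", "S000005197", "TEX1", "", "GO:0006406", "SGD_REF:S000069956",
    "IC", "GO:0000346", "P", "", "YNL253W", "gene",
    "taxon:4932|taxon:745953", "20030221", "SGD",
])
record = parse_gaf_line(line)
print(record["DB_Object_ID"], record["taxon"])
```

Empty optional columns come back as an empty string or an empty list, so callers can test them for truthiness without special cases.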
Proposal: remove gene product data from the association file, leaving just an identifier.
New format for storing annotations:
old column # | required? | contents | cardinality |
---|---|---|---|
1 | required | DB | 1 |
17 if present; else 2 | required | Spliceform ID OR DB_Object_ID | 1 |
4 | optional | Qualifier | 0 or greater |
5 | required | GO ID | 1 |
6 | required | DB:Reference(s) | 1 or greater |
7 | required | Evidence code | 1 |
8 | optional | With (or) From | 0 or greater |
9 | required | Aspect | 1 |
13 | optional | Interacting taxon ID (for multi-organism processes) | 0 or 1 |
14 | required | Date | 1 |
15 | required | Assigned_by | 1 |
16 | optional | Annotation cross products | ? |
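The mapping from the current columns to the proposed association columns could be sketched as follows. This is only one reading of the table above: the row layout (a dict keyed by old column number), the output key names, and the helper names `to_association` and `make_row` are assumptions made for the example.

```python
def to_association(row):
    """Build a new-format association record from an old-format row.

    `row` maps old column numbers (1 to 17) to cell strings; this layout
    and the output key names are assumptions made for the example.
    """
    taxa = row[13].split("|")
    return {
        "DB": row[1],
        # Use the spliceform ID (column 17) when present, else column 2.
        "ID": row.get(17) or row[2],
        "Qualifier": row[4],
        "GO_ID": row[5],
        "DB_Reference": row[6],
        "Evidence_code": row[7],
        "With_or_From": row[8],
        "Aspect": row[9],
        # Keep only the second taxon; the first belongs to the gene product.
        "Interacting_taxon": taxa[1] if len(taxa) > 1 else "",
        "Date": row[14],
        "Assigned_by": row[15],
        "Annotation_cross_products": row.get(16, ""),
    }


def make_row(cells):
    """Number a list of cells from 1, padding to 17 columns."""
    cells = list(cells) + [""] * (17 - len(cells))
    return dict(enumerate(cells, start=1))


# Two rows taken from the example table on this page.
abz1 = make_row([
    "SGD", "S000005316", "ABZ1", "", "GO:0046820", "SGD_REF:S000057703",
    "ISS", "CGSC:pabA", "F", "aminodeoxychorismate synthase", "YNR033W",
    "gene", "taxon:4932|taxon:2861", "20030106", "SGD",
])
amot = make_row([
    "UniProtKB", "Q4VCS5", "AMOT_HUMAN", "", "GO:0043116", "PMID:16043488",
    "IDA", "", "P", "AMOT, KIAA1071: Angiomotin", "IPI00163085", "snoRNA",
    "taxon:9606", "20051207", "UniProtKB", "", "Q4VCS5-1",
])

abz1_assoc = to_association(abz1)
amot_assoc = to_association(amot)
print(abz1_assoc["ID"], abz1_assoc["Interacting_taxon"])
print(amot_assoc["ID"], amot_assoc["Interacting_taxon"])
```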
Gene product data would be stored in a separate file. It would consist of the following pieces of information:
old column # | required? | contents | cardinality |
---|---|---|---|
1 | required | DB | 1 |
2 | required | DB_Object_ID | 1 |
3 | required | DB_Object_Symbol | 1 |
10 | optional | DB_Object_Name | 0 or 1 |
11 | optional | DB_Object_Synonym(s) | 0 or greater |
12 | required | DB_Object_Type | 1 |
13 | required | taxon | 1 |
n/a | required | UniProt xref (from gp2protein file) | 1 |
Any GPs with different spliceforms would also have the following data:
old column # | required? | contents | cardinality |
---|---|---|---|
17 | required | Spliceform ID | 1 |
12 | required | Spliceform object type | 1 |
Ideally, the gene product files would also include the gp2protein data, so we would have an additional piece of data, an xref to a UniProt or NCBI ID.
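One way the gene product file might be derived from existing annotation rows is sketched below. It makes several assumptions that the proposal does not fix: rows are dicts keyed by old column number, the gp2protein data is available as a plain dict from object ID to xref, and the gene product's own type is taken only from rows that have no spliceform (because column 12 describes the spliceform when column 17 is filled). The name `build_gene_products` is hypothetical.

```python
def build_gene_products(rows, xrefs):
    """Collapse annotation rows into one record per (DB, DB_Object_ID)."""
    products = {}
    for row in rows:
        key = (row[1], row[2])
        gp = products.setdefault(key, {
            "DB": row[1],
            "DB_Object_ID": row[2],
            "DB_Object_Symbol": row[3],
            "DB_Object_Name": row[10],
            "DB_Object_Synonym": row[11],
            "DB_Object_Type": "",
            # The first taxon is the organism of the gene product itself.
            "taxon": row[13].split("|")[0],
            "xref": xrefs.get(row[2], ""),
            "spliceforms": [],
        })
        spliceform = row.get(17, "")
        if spliceform:
            # Column 12 describes the spliceform when column 17 is filled.
            pair = (spliceform, row[12])
            if pair not in gp["spliceforms"]:
                gp["spliceforms"].append(pair)
        else:
            gp["DB_Object_Type"] = row[12]
    return products


# Only the gene product columns of the AMOT rows from this page are needed.
base = {
    1: "UniProtKB", 2: "Q4VCS5", 3: "AMOT_HUMAN",
    10: "AMOT, KIAA1071: Angiomotin", 11: "IPI00163085",
    12: "protein", 13: "taxon:9606", 17: "",
}
rows = [
    base,
    {**base, 12: "snoRNA", 17: "Q4VCS5-1"},
    {**base, 12: "snoRNA", 17: "Q4VCS5-1"},
    {**base, 12: "protein", 17: "Q4VCS5-2"},
]
# Stand-in for the gp2protein mapping (assumed shape: object ID to xref).
xrefs = {"Q4VCS5": "UniProt:Q4VCS5"}

products = build_gene_products(rows, xrefs)
amot = products[("UniProtKB", "Q4VCS5")]
print(amot["DB_Object_Type"], amot["spliceforms"])
```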
Example
The following appears on the page http://geneontology.org/GO.annotation.fields.shtml and is an example of the current GAF file structure (on the original page, a shaded background marks annotation data and blue text marks gene product data):
1: DB | 2: DB Object ID | 3: DB Object Symbol | 4: Qualifier | 5: GO ID | 6: DB:Reference(s) | 7: Evidence code | 8: With (or) From | 9: Aspect | 10: DB Object Name | 11: DB Object Synonym(s) | 12: DB Object Type (refers to col 17 if present) | 13: taxon | 14: Date | 15: Assigned by | 16: Annotation cross products | 17: Spliceform |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SGD | S000000296 | PHO3 | | GO:0003993 | SGD_REF:S000047763 | IMP | | F | acid phosphatase | YBR092C | gene | taxon:4932 | 20010118 | SGD | | |
SGD | S000000296 | PHO3 | | GO:0006796 | SGD_REF:S000047115 | TAS | | P | acid phosphatase | YBR092C | gene | taxon:4932 | 20041220 | SGD | | |
SGD | S000005370 | RCL1 | NOT | GO:0003963 | SGD_REF:S000039255 | IDA | | F | | YOL010W | gene | taxon:4932 | 20020530 | SGD | | |
SGD | S000005197 | TEX1 | | GO:0006406 | SGD_REF:S000069956 | IC | GO:0000346 | P | | YNL253W | gene | taxon:4932|taxon:745953 | 20030221 | SGD | | |
SGD | S000005316 | ABZ1 | | GO:0046820 | SGD_REF:S000057703 | ISS | CGSC:pabA | F | aminodeoxychorismate synthase | YNR033W | gene | taxon:4932|taxon:2861 | 20030106 | SGD | | |
UniProtKB | Q4VCS5 | AMOT_HUMAN | | GO:0031410 | PMID:11257124 | IDA | | C | AMOT, KIAA1071: Angiomotin | IPI00163085 | protein | taxon:9606 | 20051207 | UniProtKB | | |
UniProtKB | Q4VCS5 | AMOT_HUMAN | | GO:0043532 | PMID:11257124 | IDA | | F | AMOT, KIAA1071: Angiomotin | IPI00163085 | protein | taxon:9606 | 20051207 | UniProtKB | | |
UniProtKB | Q4VCS5 | AMOT_HUMAN | | GO:0043116 | PMID:16043488 | IDA | | P | AMOT, KIAA1071: Angiomotin | IPI00163085 | snoRNA | taxon:9606 | 20051207 | UniProtKB | | Q4VCS5-1 |
UniProtKB | Q4VCS5 | AMOT_HUMAN | | GO:0005515 | PMID:16043488 | IPI | UniProtKB:Q6RHR9-2 | F | AMOT, KIAA1071: Angiomotin | IPI00163085 | snoRNA | taxon:9606 | 20051207 | UniProtKB | | Q4VCS5-1 |
UniProtKB | Q4VCS5 | AMOT_HUMAN | | GO:0043532 | PMID:16043488 | IDA | | F | AMOT, KIAA1071: Angiomotin | IPI00163085 | protein | taxon:9606 | 20051207 | UniProtKB | | Q4VCS5-2 |
This is how it could look in the proposed new format.
Association data:
1: DB | 17 or 2: Spliceform ID OR DB Object ID | 4: Qualifier | 5: GO ID | 6: DB:Reference(s) | 7: Evidence code | 8: With (or) From | 9: Aspect | 13: Interacting taxon ID (for multi-organism processes) | 14: Date | 15: Assigned_by | 16: Annotation cross products |
---|---|---|---|---|---|---|---|---|---|---|---|
SGD | S000000296 | | GO:0003993 | SGD_REF:S000047763 | IMP | | F | | 20010118 | SGD | |
SGD | S000000296 | | GO:0006796 | SGD_REF:S000047115 | TAS | | P | | 20041220 | SGD | |
SGD | S000005370 | NOT | GO:0003963 | SGD_REF:S000039255 | IDA | | F | | 20020530 | SGD | |
SGD | S000005197 | | GO:0006406 | SGD_REF:S000069956 | IC | GO:0000346 | P | taxon:745953 | 20030221 | SGD | |
SGD | S000005316 | | GO:0046820 | SGD_REF:S000057703 | ISS | CGSC:pabA | F | taxon:2861 | 20030106 | SGD | |
UniProtKB | Q4VCS5 | | GO:0031410 | PMID:11257124 | IDA | | C | | 20051207 | UniProtKB | |
UniProtKB | Q4VCS5 | | GO:0043532 | PMID:11257124 | IDA | | F | | 20051207 | UniProtKB | |
UniProtKB | Q4VCS5-1 | | GO:0043116 | PMID:16043488 | IDA | | P | | 20051207 | UniProtKB | |
UniProtKB | Q4VCS5-1 | | GO:0005515 | PMID:16043488 | IPI | UniProtKB:Q6RHR9-2 | F | | 20051207 | UniProtKB | |
UniProtKB | Q4VCS5-2 | | GO:0043532 | PMID:16043488 | IDA | | F | | 20051207 | UniProtKB | |
GP data:
1: DB | 2: DB_Object_ID | 3: DB_Object_Symbol | 10: DB_Object_Name | 11: DB_Object_Synonym(s) | 12: DB_Object_Type | 13: taxon | n/a: UniProt xref (from gp2protein file) | n/a: Spliceform ID, spliceform type |
---|---|---|---|---|---|---|---|---|
SGD | S000000296 | PHO3 | acid phosphatase | YBR092C | gene | taxon:4932 | UniProt:NE92D8 | |
SGD | S000005370 | RCL1 | | YOL010W | gene | taxon:4932 | UniProt:JN97D8 | |
SGD | S000005197 | TEX1 | | YNL253W | gene | taxon:4932 | UniProt:F9NO8X | |
SGD | S000005316 | ABZ1 | aminodeoxychorismate synthase | YNR033W | gene | taxon:4932 | UniProt:C2BF93 | |
UniProtKB | Q4VCS5 | AMOT_HUMAN | AMOT, KIAA1071: Angiomotin | IPI00163085 | protein | taxon:9606 | UniProt:Q4VCS5 | Q4VCS5-1, snoRNA; Q4VCS5-2, protein |
The representation of the spliceforms could be changed if it isn't clear enough.
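With the data split this way, a consumer has to resolve the identifier in an association row (either a DB object ID or a spliceform ID) back to its gene product record. A minimal sketch of such a lookup follows; the record layout, with spliceforms held as (ID, type) pairs, and the name `index_gene_products` are assumptions for illustration only.

```python
def index_gene_products(gp_records):
    """Map (DB, identifier) to (gene product record, object type).

    Both the DB object ID and every spliceform ID are indexed, so either
    kind of identifier found in an association row resolves to the same
    gene product record.
    """
    index = {}
    for gp in gp_records:
        index[(gp["DB"], gp["DB_Object_ID"])] = (gp, gp["DB_Object_Type"])
        for splice_id, splice_type in gp["spliceforms"]:
            index[(gp["DB"], splice_id)] = (gp, splice_type)
    return index


# Two records from the GP data example above (record layout is assumed).
gp_records = [
    {
        "DB": "SGD", "DB_Object_ID": "S000000296", "DB_Object_Symbol": "PHO3",
        "DB_Object_Type": "gene", "taxon": "taxon:4932", "spliceforms": [],
    },
    {
        "DB": "UniProtKB", "DB_Object_ID": "Q4VCS5",
        "DB_Object_Symbol": "AMOT_HUMAN", "DB_Object_Type": "protein",
        "taxon": "taxon:9606",
        "spliceforms": [("Q4VCS5-1", "snoRNA"), ("Q4VCS5-2", "protein")],
    },
]

index = index_gene_products(gp_records)

# Resolve the identifier used by an association row for spliceform Q4VCS5-1.
gp, object_type = index[("UniProtKB", "Q4VCS5-1")]
print(gp["DB_Object_Symbol"], gp["taxon"], object_type)
```

Keying the index on the (DB, identifier) pair rather than the identifier alone avoids collisions if two databases happen to reuse the same accession.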