Gene Product Association Data (GPAD) Format (Archived)

Revision as of 12:00, 19 October 2009

This page proposes splitting the information in the GAF files into two sets: association data and gene product data.

Current Association File Format

column required? contents cardinality
1 required DB 1
2 required DB_Object_ID 1
3 required DB_Object_Symbol 1
4 optional Qualifier 0 or greater
5 required GO ID 1
6 required DB:Reference(s) 1 or greater
7 required Evidence code 1
8 optional With (or) From 0 or greater
9 required Aspect 1
10 optional DB_Object_Name 0 or 1
11 optional DB_Object_Synonym(s) 0 or greater
12 required DB_Object_Type (refers to col 17 if present) 1
13 required taxon 1 or 2 (for multi-org processes)
14 required Date 1
15 required Assigned_by 1
16 optional Annotation cross products ?
17 optional Spliceform 1
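
The column table above can be sketched as a small reader that turns one line of the current format into named fields. This is only an illustration: it assumes the file is tab-delimited and that columns with a cardinality above 1 hold pipe-separated values (as in the `taxon:4932|taxon:745953` example further down). The names `GAF_COLUMNS`, `MULTI_VALUED` and `parse_gaf_line` are made up for this sketch.

```python
# Field names follow the 17-column table above (illustrative spellings).
GAF_COLUMNS = [
    "DB", "DB_Object_ID", "DB_Object_Symbol", "Qualifier", "GO_ID",
    "DB_Reference", "Evidence_code", "With_or_From", "Aspect",
    "DB_Object_Name", "DB_Object_Synonym", "DB_Object_Type", "taxon",
    "Date", "Assigned_by", "Annotation_cross_products", "Spliceform",
]

# Columns that may hold several values; assumed to be pipe-separated.
MULTI_VALUED = {"Qualifier", "DB_Reference", "With_or_From",
                "DB_Object_Synonym", "taxon"}


def parse_gaf_line(line):
    """Split one tab-delimited annotation line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad so that missing trailing optional columns become empty strings.
    fields += [""] * (len(GAF_COLUMNS) - len(fields))
    record = {}
    for name, value in zip(GAF_COLUMNS, fields):
        if name in MULTI_VALUED:
            record[name] = [v for v in value.split("|") if v]
        else:
            record[name] = value
    return record
```

Multi-valued columns come back as lists (empty when the column is blank); all other columns come back as plain strings.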


Proposal: remove gene product data from the association file, leaving just an identifier.

New format for storing annotations:

old column # required? contents cardinality
1 required DB 1
17 if present; else 2 required Spliceform ID OR DB_Object_ID 1
4 optional Qualifier 0 or greater
5 required GO ID 1
6 required DB:Reference(s) 1 or greater
7 required Evidence code 1
8 optional With (or) From 0 or greater
9 required Aspect 1
13 optional Interacting taxon ID (for multi-organism processes) 0 or 1
14 required Date 1
15 required Assigned_by 1
16 optional Annotation cross products ?
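
As a rough sketch of the mapping in the table above, the function below derives a row in the proposed association format from the 17 fields of a current-format row. It assumes the fields are already split into a list in old column order, and that when column 13 holds two taxa the second one is the interacting taxon (which matches the example rows on this page). The function name `to_association_row` is hypothetical.

```python
def to_association_row(gaf_fields):
    """Map a list of current-format fields (old columns 1-17) to the proposed
    association columns: 1, 17-or-2, 4, 5, 6, 7, 8, 9, 13, 14, 15, 16."""
    # Pad so that callers may omit empty trailing columns (16 and 17).
    f = list(gaf_fields) + [""] * (17 - len(gaf_fields))

    # Old column 17 (Spliceform) replaces old column 2 (DB_Object_ID) if present.
    object_id = f[16] or f[1]

    # Keep only the interacting taxon; assumed to be the second entry of column 13.
    taxa = [t for t in f[12].split("|") if t]
    interacting_taxon = taxa[1] if len(taxa) > 1 else ""

    return [f[0], object_id, f[3], f[4], f[5], f[6], f[7], f[8],
            interacting_taxon, f[13], f[14], f[15]]
```

The gene product's own taxon is dropped here on purpose, since under the proposal it would live in the gene product file.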


Gene product data would be stored in a separate file. It would consist of the following pieces of information:

old column # required? contents cardinality
1 required DB 1
2 required DB_Object_ID 1
3 required DB_Object_Symbol 1
10 optional DB_Object_Name 0 or 1
11 optional DB_Object_Synonym(s) 0 or greater
12 required DB_Object_Type 1
13 required taxon 1
n/a required UniProt xref (from gp2protein file) 1


Any gene products (GPs) with different spliceforms would also have the following data:

old column # required? contents cardinality
17 required Spliceform ID 1
12 required Spliceform object type 1

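To illustrate how the gene product file could be derived from existing annotation rows, the sketch below collects one record per (DB, DB_Object_ID) pair and gathers any spliceforms with their types. It rests on several assumptions: rows are lists in old column order; `xrefs` is a hypothetical lookup built from the gp2protein file; and, because column 12 refers to column 17 when a spliceform is present, the gene product's own type is taken only from rows without a spliceform.

```python
def build_gene_products(gaf_rows, xrefs):
    """Collect gene product records from current-format rows.

    gaf_rows: iterable of field lists in old column order (1-17).
    xrefs:    dict mapping DB_Object_ID to a UniProt xref (hypothetical,
              standing in for the gp2protein data).
    """
    products = {}
    for fields in gaf_rows:
        # Pad so that empty trailing columns may be omitted.
        f = list(fields) + [""] * (17 - len(fields))
        key = (f[0], f[1])
        gp = products.setdefault(key, {
            "DB": f[0],
            "DB_Object_ID": f[1],
            "DB_Object_Symbol": f[2],
            "DB_Object_Name": f[9],
            "DB_Object_Synonym": [s for s in f[10].split("|") if s],
            "DB_Object_Type": "",
            # The first taxon in column 13 is the gene product's own taxon.
            "taxon": f[12].split("|")[0],
            "xref": xrefs.get(f[1], ""),
            "spliceforms": {},
        })
        if f[16]:
            # Column 12 describes the spliceform when column 17 is filled in.
            gp["spliceforms"][f[16]] = f[11]
        else:
            gp["DB_Object_Type"] = f[11]
    return products
```

If a gene product only ever appears with a spliceform, its own type stays empty in this sketch and would need to come from another source.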

Ideally, the gene product files would also include the gp2protein data, so we would have an additional piece of data, an xref to a UniProt or NCBI ID.

Example

The following appears on the page http://geneontology.org/GO.annotation.fields.shtml and is an example of the current GAF file structure (shaded bg: annotation data; blue text: gp data):

column contents
1 DB
2 DB Object ID
3 DB Object Symbol
4 Qualifier
5 GO ID
6 DB:Reference(s)
7 Evidence code
8 With (or) From
9 Aspect
10 DB Object Name
11 DB Object Synonym(s)
12 DB Object Type (refers to col 17 if present)
13 taxon
14 Date
15 Assigned by
16 Annotation cross products
17 Spliceform

SGD S000000296 PHO3 GO:0003993 SGD_REF:S000047763 IMP F acid phosphatase YBR092C gene taxon:4932 20010118 SGD
SGD S000000296 PHO3 GO:0006796 SGD_REF:S000047115 TAS P acid phosphatase YBR092C gene taxon:4932 20041220 SGD
SGD S000005370 RCL1 NOT GO:0003963 SGD_REF:S000039255 IDA F YOL010W gene taxon:4932 20020530 SGD
SGD S000005197 TEX1 GO:0006406 SGD_REF:S000069956 IC GO:0000346 P YNL253W gene taxon:4932|taxon:745953 20030221 SGD
SGD S000005316 ABZ1 GO:0046820 SGD_REF:S000057703 ISS CGSC:pabA F aminodeoxychorismate synthase YNR033W gene taxon:4932|taxon:2861 20030106 SGD
UniProtKB Q4VCS5 AMOT_HUMAN GO:0031410 PMID:11257124 IDA C AMOT, KIAA1071: Angiomotin IPI00163085 protein taxon:9606 20051207 UniProtKB
UniProtKB Q4VCS5 AMOT_HUMAN GO:0043532 PMID:11257124 IDA F AMOT, KIAA1071: Angiomotin IPI00163085 protein taxon:9606 20051207 UniProtKB
UniProtKB Q4VCS5 AMOT_HUMAN GO:0043116 PMID:16043488 IDA P AMOT, KIAA1071:Angiomotin IPI00163085 snoRNA taxon:9606 20051207 UniProtKB Q4VCS5-1
UniProtKB Q4VCS5 AMOT_HUMAN GO:0005515 PMID:16043488 IPI UniProtKB:Q6RHR9-2 F AMOT, KIAA1071: Angiomotin IPI00163085 snoRNA taxon:9606 20051207 UniProtKB Q4VCS5-1
UniProtKB Q4VCS5 AMOT_HUMAN GO:0043532 PMID:16043488 IDA F AMOT, KIAA1071: Angiomotin IPI00163085 protein taxon:9606 20051207 UniProtKB Q4VCS5-2


This is how it could look in the proposed new format.

Association data:

old column # contents
1 DB
17 or 2 Spliceform ID OR DB Object ID
4 Qualifier
5 GO ID
6 DB:Reference(s)
7 Evidence code
8 With (or) From
9 Aspect
13 Interacting taxon ID (for multi-organism processes)
14 Date
15 Assigned_by
16 Annotation cross products

SGD S000000296 GO:0003993 SGD_REF:S000047763 IMP F 20010118 SGD
SGD S000000296 GO:0006796 SGD_REF:S000047115 TAS P 20041220 SGD
SGD S000005370 NOT GO:0003963 SGD_REF:S000039255 IDA F 20020530 SGD
SGD S000005197 GO:0006406 SGD_REF:S000069956 IC GO:0000346 P taxon:745953 20030221 SGD
SGD S000005316 GO:0046820 SGD_REF:S000057703 ISS CGSC:pabA F taxon:2861 20030106 SGD
UniProtKB Q4VCS5 GO:0031410 PMID:11257124 IDA C 20051207 UniProtKB
UniProtKB Q4VCS5 GO:0043532 PMID:11257124 IDA F 20051207 UniProtKB
UniProtKB Q4VCS5-1 GO:0043116 PMID:16043488 IDA P 20051207 UniProtKB
UniProtKB Q4VCS5-1 GO:0005515 PMID:16043488 IPI UniProtKB:Q6RHR9-2 F 20051207 UniProtKB
UniProtKB Q4VCS5-2 GO:0043532 PMID:16043488 IDA F 20051207 UniProtKB


GP data:

old column # contents
1 DB
2 DB_Object_ID
3 DB_Object_Symbol
10 DB_Object_Name
11 DB_Object_Synonym(s)
12 DB_Object_Type
13 taxon
n/a UniProt xref (from gp2protein file)
n/a Spliceform ID, spliceform type

SGD S000000296 PHO3 acid phosphatase YBR092C gene taxon:4932 UniProt:NE92D8
SGD S000005370 RCL1 YOL010W gene taxon:4932 UniProt:JN97D8
SGD S000005197 TEX1 YNL253W gene taxon:4932 UniProt:F9NO8X
SGD S000005316 ABZ1 aminodeoxychorismate synthase YNR033W gene taxon:4932 UniProt:C2BF93
UniProtKB Q4VCS5 AMOT_HUMAN AMOT, KIAA1071: Angiomotin IPI00163085 protein taxon:9606 UniProt:Q4VCS5 Q4VCS5-1, snoRNA | Q4VCS5-2, protein

The representation of the spliceforms could be changed if it isn't clear enough.
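
For reference, the spliceform representation used in the last GP data row above (`Q4VCS5-1, snoRNA | Q4VCS5-2, protein`) could be read and written along these lines. This is a sketch of that one layout only, assuming entries are separated by `|` and each entry is an ID and a type separated by a comma; the helper names are hypothetical.

```python
def parse_spliceforms(field):
    """Turn 'ID, type | ID, type' into a list of (spliceform_id, type) pairs."""
    pairs = []
    for entry in field.split("|"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate an empty field or stray separators
        spliceform_id, _, spliceform_type = entry.partition(",")
        pairs.append((spliceform_id.strip(), spliceform_type.strip()))
    return pairs


def format_spliceforms(pairs):
    """Inverse of parse_spliceforms: write pairs back as 'ID, type | ID, type'."""
    return " | ".join(f"{sid}, {stype}" for sid, stype in pairs)
```

A layout like this only works as long as spliceform IDs and type names never contain a comma or a pipe, which is one reason a different representation might be preferable.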