Gene Product Association Data (GPAD) Format (Archived)

This is a proposal to split the information in GAF files into two sets: association data and gene product data.

Current Association File Format

column	required?	contents	cardinality
1	required	DB	1
2	required	DB_Object_ID	1
3	required	DB_Object_Symbol	1
4	optional	Qualifier	0 or greater
5	required	GO ID	1
6	required	DB:Reference(s)	1 or greater
7	required	Evidence code	1
8	optional	With (or) From	0 or greater
9	required	Aspect	1
10	optional	DB_Object_Name	0 or 1
11	optional	DB_Object_Synonym(s)	0 or greater
12	required	DB_Object_Type (refers to col 17 if present)	1
13	required	taxon	1 or 2 (for multi-org processes)
14	required	Date	1
15	required	Assigned_by	1
16	optional	Annotation cross products	?
17	optional	Spliceform	1

Proposal: remove gene product data from the association file, leaving just an identifier. The new format would be:

old column #	required?	contents	cardinality
1	required	DB	1
17 if present; else 2	required	Spliceform ID OR DB_Object_ID	1
4	optional	Qualifier	0 or greater
5	required	GO ID	1
6	required	DB:Reference(s)	1 or greater
7	required	Evidence code	1
8	optional	With (or) From	0 or greater
14	required	Date	1
15	required	Assigned_by	1
16	optional	Annotation cross products	?
13	optional	Interacting taxon ID (for multi-organism processes)	0 or 1
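The column mapping above can be sketched as a small projection function. This is only an illustration under stated assumptions, not part of the proposal: the function name gaf_to_association is hypothetical, the input is assumed to be one tab-separated GAF line with 15 to 17 columns, and a second taxon in old column 13 is assumed to be pipe-separated and to identify the interacting organism.

```python
def gaf_to_association(line):
    """Project one tab-separated GAF line onto the proposed association columns."""
    cols = [c.strip() for c in line.rstrip("\n").split("\t")]
    # Pad to 17 columns so the optional trailing columns (16, 17) may be absent.
    cols += [""] * (17 - len(cols))

    def col(n):
        # Old column numbers in the tables are 1-based.
        return cols[n - 1]

    # Use the spliceform ID (col 17) if present; else the DB_Object_ID (col 2).
    object_id = col(17) or col(2)

    # Assumption: two taxa are written as "taxon:A|taxon:B", and the second
    # one is the interacting taxon for multi-organism processes.
    taxa = col(13).split("|")
    interacting_taxon = taxa[1] if len(taxa) > 1 else ""

    # Output order follows the proposed table: old columns
    # 1, 17-or-2, 4, 5, 6, 7, 8, 14, 15, 16, then the interacting taxon.
    return [col(1), object_id, col(4), col(5), col(6), col(7), col(8),
            col(14), col(15), col(16), interacting_taxon]
```

Fields that are empty in the input stay as empty strings, so the output always has eleven entries and can be written back out with "\t".join(...).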

Gene product data would be stored in a separate file. It would consist of the following pieces of information:

old column #	required?	contents	cardinality
1	required	DB	1
2	required	DB_Object_ID	1
3	required	DB_Object_Symbol	1
10	optional	DB_Object_Name	0 or 1
11	optional	DB_Object_Synonym(s)	0 or greater
12	required	DB_Object_Type	1
13	required	taxon	1

Any gene products (GPs) with different spliceforms would also have the following data:

old column #	required?	contents	cardinality
17	required	Spliceform ID	1
12	required	Spliceform object type	1

Ideally, the gene product files would also include the gp2protein data, which would add one more field: an xref to a UniProt or NCBI ID.
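Because a GAF file repeats the gene product fields on every annotation line, building the separate gene product file amounts to collecting one row per distinct identifier. The following is a minimal sketch of that step, assuming tab-separated GAF lines; the name collect_gene_products is hypothetical, and the spliceform rows and the gp2protein xref are left out because they cannot be derived from the 15 core columns alone.

```python
def collect_gene_products(gaf_lines):
    """Return one gene product row per distinct (DB, DB_Object_ID) pair."""
    products = {}
    for line in gaf_lines:
        cols = [c.strip() for c in line.rstrip("\n").split("\t")]
        if len(cols) < 15:
            continue  # skip comment lines and rows missing required columns
        key = (cols[0], cols[1])
        # The gene product table allows exactly one taxon, so keep the first
        # (assumption: a second, pipe-separated taxon is the interacting one).
        taxon = cols[12].split("|")[0]
        # Old columns 1, 2, 3, 10, 11, 12, 13; the first occurrence wins.
        products.setdefault(
            key,
            [cols[0], cols[1], cols[2], cols[9], cols[10], cols[11], taxon],
        )
    return list(products.values())
```

Rows come back in the order in which each gene product first appears in the input, since Python dictionaries preserve insertion order.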

Example

The following appears on the page http://geneontology.org/GO.annotation.fields.shtml and is an example of the current GAF file structure:

SGD	S000000296	PHO3		GO:0003993	SGD_REF:S000047763|PMID:2407294 	IMP		F	acid phosphatase	YBR092C	gene	taxon:4932	20010118	SGD
SGD	S000000296	PHO3		GO:0006796	SGD_REF:S000047115|PMID:2407294 	TAS		P	acid phosphatase	YBR092C	gene	taxon:4932	20041220	SGD
SGD	S000005370	RCL1	NOT	GO:0003963	SGD_REF:S000039255|PMID:10790377	IDA		F		YOL010W	gene	taxon:4932	20020530	SGD
SGD	S000005197	TEX1		GO:0006406	SGD_REF:S000069956|PMID:11979277	IC	GO:0000346	P		YNL253W	gene	taxon:4932	20030221	SGD
SGD	S000005316	ABZ1		GO:0046820	SGD_REF:S000057703|PMID:8346682 	ISS	CGSC:pabA|CGSC:pabB	F	aminodeoxychorismate synthase	YNR033W	gene	taxon:4932	20030106	SGD

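This is how it could look in the proposed new format. Note that the association rows in this example also carry the aspect (old column 9) after the GO ID, although the proposed column list above omits it.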

Association data:

SGD	S000000296		GO:0003993	F	SGD_REF:S000047763|PMID:2407294 	IMP		20010118	SGD
SGD	S000000296		GO:0006796	P	SGD_REF:S000047115|PMID:2407294 	TAS		20041220	SGD
SGD	S000005370	NOT	GO:0003963	F	SGD_REF:S000039255|PMID:10790377	IDA		20020530	SGD
SGD	S000005197		GO:0006406	P	SGD_REF:S000069956|PMID:11979277	IC	GO:0000346	20030221	SGD
SGD	S000005316		GO:0046820	F	SGD_REF:S000057703|PMID:8346682 	ISS	CGSC:pabA|CGSC:pabB	20030106	SGD

GP data:

SGD	S000000296	PHO3	acid phosphatase	YBR092C	gene	taxon:4932	UniProt:XXXXXX
SGD	S000005370	RCL1		YOL010W	gene	taxon:4932	UniProt:XXXXXX
SGD	S000005197	TEX1		YNL253W	gene	taxon:4932	UniProt:XXXXXX
SGD	S000005316	ABZ1	aminodeoxychorismate synthase	YNR033W	gene	taxon:4932	UniProt:XXXXXX
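With the data split this way, a consumer that needs the symbol or name next to an annotation has to look the gene product up by its identifier. A minimal sketch of that lookup follows, assuming both files have already been parsed into lists of column values whose first two entries are DB and the object ID; the name join_with_gene_products is hypothetical.

```python
def join_with_gene_products(association_rows, gp_rows):
    """Pair each association row with its gene product row, or None if missing."""
    # Index gene product rows by (DB, DB_Object_ID), their first two columns.
    gp_index = {(gp[0], gp[1]): gp for gp in gp_rows}
    # Association rows use the same two leading columns as their key.
    return [(row, gp_index.get((row[0], row[1]))) for row in association_rows]
```

An association row that carries a spliceform ID in its second column would not match a DB_Object_ID here and would be paired with None. Resolving it would need the spliceform rows of the gene product file, which this sketch does not model.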
