Gene Product Association Data (GPAD) Format (Archived): Difference between revisions

Revision as of 12:44, 16 October 2009

Proposal to split the information in the GAF files into two sets, association data and gene product data.

Current Association File Format

column	required?	contents	cardinality
1	required	DB	1
2	required	DB_Object_ID	1
3	required	DB_Object_Symbol	1
4	optional	Qualifier	0 or greater
5	required	GO ID	1
6	required	DB:Reference(s)	1 or greater
7	required	Evidence code	1
8	optional	With (or) From	0 or greater
9	required	Aspect	1
10	optional	DB_Object_Name	0 or 1
11	optional	DB_Object_Synonym(s)	0 or greater
12	required	DB_Object_Type (refers to col 17 if present)	1
13	required	taxon	1 or 2 (for multi-org processes)
14	required	Date	1
15	required	Assigned_by	1
16	optional	Annotation cross products	?
17	optional	Spliceform	1
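
As a rough illustration of the layout above, the following Python sketch splits one tab-delimited GAF line into named fields. The snake_case field names, the `parse_gaf_line` helper, and the choice of which columns to treat as pipe-separated lists are assumptions made for this example, not part of the proposal.

```python
# Field names follow the table above; the snake_case spellings are our own.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_or_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_cross_products", "spliceform",
]

# Columns whose cardinality allows several values (assumed pipe-separated).
MULTI_VALUED = {"qualifier", "db_reference", "with_or_from",
                "db_object_synonym", "taxon"}


def parse_gaf_line(line):
    """Split one tab-delimited GAF line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    # Optional trailing columns (16, 17) may be absent; pad with empty strings.
    values += [""] * (len(GAF_COLUMNS) - len(values))
    record = {}
    for name, value in zip(GAF_COLUMNS, values):
        value = value.strip()
        if name in MULTI_VALUED:
            record[name] = [v for v in value.split("|") if v]
        else:
            record[name] = value
    return record
```

Empty optional columns come back as an empty string or an empty list, so a caller can test for presence without special-casing short lines.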

Proposal: remove gene product data from the association file, leaving just an identifier. New format:

old column #	required?	contents	cardinality
1	required	DB	1
17 if present; else 2	required	Spliceform ID OR DB_Object_ID	1
4	optional	Qualifier	0 or greater
5	required	GO ID	1
6	required	DB:Reference(s)	1 or greater
7	required	Evidence code	1
8	optional	With (or) From	0 or greater
14	required	Date	1
15	required	Assigned_by	1
16	optional	Annotation cross products	?
13	optional	Interacting taxon ID (for multi-organism processes)	0 or 1
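
A minimal sketch of how one GAF row might be reduced to this layout is shown below. It follows the column order in the table above, which does not list the aspect column, although the worked example later on this page carries one. The function name `to_association_row` and the assumption that two taxon IDs are pipe-separated are ours.

```python
def to_association_row(gaf_fields):
    """Reorder one GAF row (a list of column strings) into the proposed layout."""
    # Pad so that the optional columns 16 and 17 can always be indexed.
    padded = list(gaf_fields) + [""] * (17 - len(gaf_fields))

    def col(n):
        # 1-based access, matching the "old column #" numbers in the table.
        return padded[n - 1].strip()

    # Column 17 (spliceform) takes the place of column 2 when it is present.
    object_id = col(17) or col(2)

    # Old column 13 holds one or two taxon IDs; only the second one, the
    # interacting taxon, stays with the association (assumed pipe-separated).
    taxa = col(13).split("|")
    interacting_taxon = taxa[1] if len(taxa) > 1 else ""

    return [col(1), object_id, col(4), col(5), col(6), col(7), col(8),
            col(14), col(15), col(16), interacting_taxon]
```

The gene product's own taxon (the first value in old column 13) is deliberately dropped here, since under the proposal it moves to the gene product file.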

Gene product data would be stored in a separate file. It would consist of the following pieces of information:

old column #	required?	contents	cardinality
1	required	DB	1
2	required	DB_Object_ID	1
3	required	DB_Object_Symbol	1
10	optional	DB_Object_Name	0 or 1
11	optional	DB_Object_Synonym(s)	0 or greater
12	required	DB_Object_Type	1
13	required	taxon	1

Any gene products (GPs) with different spliceforms would also have the following data:

old column #	required?	contents	cardinality
17	required	Spliceform ID	1
12	required	Spliceform object type	1

Ideally, the gene product files would also include the gp2protein data, which would add one more field: an xref to a UniProt or NCBI ID.
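
Because many annotations refer to the same gene product, building the gene product file amounts to de-duplicating GAF rows on (DB, DB_Object_ID). The sketch below illustrates this under a few assumptions: `gene_product_rows` is a hypothetical helper, the `xrefs` dictionary stands in for gp2protein data in whatever form it is supplied, and the spliceform columns are left out for brevity.

```python
def gene_product_rows(gaf_rows, xrefs=None):
    """Collect one gene product row per (DB, DB_Object_ID) pair.

    gaf_rows: iterable of GAF rows, each a list of column strings.
    xrefs:    optional dict mapping (DB, DB_Object_ID) to an xref string;
              a stand-in for gp2protein data (assumed shape).
    """
    xrefs = xrefs or {}
    seen = {}
    for fields in gaf_rows:
        f = [value.strip() for value in fields]
        key = (f[0], f[1])
        if key in seen:
            continue  # later annotations repeat the same gene product data
        # The first taxon ID is taken to be the gene product's own organism.
        taxon = f[12].split("|")[0]
        # Old columns 1, 2, 3, 10, 11, 12, 13, followed by the xref.
        seen[key] = [f[0], f[1], f[2], f[9], f[10], f[11], taxon,
                     xrefs.get(key, "")]
    # Dicts preserve insertion order, so rows come out in first-seen order.
    return list(seen.values())
```

This version keeps the first occurrence and ignores later ones; a stricter variant could instead check that repeated rows agree on symbol, name, and type.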


Example

The following appears on the page http://geneontology.org/GO.annotation.fields.shtml and is an example of the current GAF file structure:

SGD	S000000296	PHO3		GO:0003993	SGD_REF:S000047763|PMID:2407294 	IMP		F	acid phosphatase	YBR092C	gene	taxon:4932	20010118	SGD
SGD	S000000296	PHO3		GO:0006796	SGD_REF:S000047115|PMID:2407294 	TAS		P	acid phosphatase	YBR092C	gene	taxon:4932	20041220	SGD
SGD	S000005370	RCL1	NOT	GO:0003963	SGD_REF:S000039255|PMID:10790377	IDA		F		YOL010W	gene	taxon:4932	20020530	SGD
SGD	S000005197	TEX1		GO:0006406	SGD_REF:S000069956|PMID:11979277	IC	GO:0000346	P		YNL253W	gene	taxon:4932	20030221	SGD
SGD	S000005316	ABZ1		GO:0046820	SGD_REF:S000057703|PMID:8346682 	ISS	CGSC:pabA|CGSC:pabB	F	aminodeoxychorismate synthase	YNR033W	gene	taxon:4932	20030106	SGD

This is how it could look in the proposed new format.

Association data:

SGD	S000000296		GO:0003993	F	SGD_REF:S000047763|PMID:2407294 	IMP		20010118	SGD
SGD	S000000296		GO:0006796	P	SGD_REF:S000047115|PMID:2407294 	TAS		20041220	SGD
SGD	S000005370	NOT	GO:0003963	F	SGD_REF:S000039255|PMID:10790377	IDA		20020530	SGD
SGD	S000005197		GO:0006406	P	SGD_REF:S000069956|PMID:11979277	IC	GO:0000346	20030221	SGD
SGD	S000005316		GO:0046820	F	SGD_REF:S000057703|PMID:8346682 	ISS	CGSC:pabA|CGSC:pabB	20030106	SGD

GP data:

SGD	S000000296	PHO3	acid phosphatase	YBR092C	gene	taxon:4932	UniProt:XXXXXX
SGD	S000005370	RCL1		YOL010W	gene	taxon:4932	UniProt:XXXXXX
SGD	S000005197	TEX1		YNL253W	gene	taxon:4932	UniProt:XXXXXX
SGD	S000005316	ABZ1	aminodeoxychorismate synthase	YNR033W	gene	taxon:4932	UniProt:XXXXXX
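
To show that the split loses nothing a consumer needs, the sketch below joins two of the association rows above back to their gene product rows on (DB, ID). The column positions are read off the example rows on this page (ten association columns, eight gene product columns). The helper names and the output dictionary keys are assumptions, and the lookup assumes the association file uses plain DB_Object_IDs rather than spliceform IDs.

```python
ASSOCIATION_DATA = (
    "SGD\tS000000296\t\tGO:0003993\tF\tSGD_REF:S000047763|PMID:2407294\tIMP\t\t20010118\tSGD\n"
    "SGD\tS000005370\tNOT\tGO:0003963\tF\tSGD_REF:S000039255|PMID:10790377\tIDA\t\t20020530\tSGD\n"
)

GP_DATA = (
    "SGD\tS000000296\tPHO3\tacid phosphatase\tYBR092C\tgene\ttaxon:4932\tUniProt:XXXXXX\n"
    "SGD\tS000005370\tRCL1\t\tYOL010W\tgene\ttaxon:4932\tUniProt:XXXXXX\n"
)


def read_rows(text):
    """Split tab-delimited text into a list of column lists, skipping blanks."""
    return [line.split("\t") for line in text.splitlines() if line]


def join_associations(association_text, gp_text):
    """Attach gene product details to each association via (DB, ID)."""
    gp_by_id = {(row[0], row[1]): row for row in read_rows(gp_text)}
    joined = []
    for assoc in read_rows(association_text):
        gp = gp_by_id[(assoc[0], assoc[1])]  # KeyError if the GP row is missing
        joined.append({
            "symbol": gp[2],
            "qualifier": assoc[2],
            "go_id": assoc[3],
            "aspect": assoc[4],
            "evidence": assoc[6],
            "taxon": gp[6],
        })
    return joined


if __name__ == "__main__":
    for record in join_associations(ASSOCIATION_DATA, GP_DATA):
        print(record)
```

Letting a missing gene product raise an error is a deliberate choice in this sketch: an association whose identifier has no matching gene product row would indicate that the two files are out of sync.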