Oct 2012 Meeting to finalize GPAD specification (Archived)
Latest revision as of 11:38, 12 April 2019
Link to previously proposed GPI format: http://wiki.geneontology.org/index.php/Gene_Product_Data_File_Format
Link to previously proposed GPAD format: http://wiki.geneontology.org/index.php/Gene_Product_Association_Data_%28GPAD%29_Format
Link to final GPAD/GPI format: Final GPAD and GPI file format
gp_association files
Questions awaiting discussion with Chris:
- How to translate the existing qualifiers CONTRIBUTES_TO and COLOCALIZES_WITH into annotation relations (and should NOT be handled the same way?)
N.B. The first line of a gp_association file should be: !gpa-version: 1.1
Proposed format (21 Nov 2012)
column | name | required? | cardinality | old GAF column # | extra info |
---|---|---|---|---|---|
1 | DB | required | 1 | 1 | must be in xrf_abbs |
2 | DB_Object_ID | required | 1 | 2 | canonical or spliceform ID |
3 | Qualifier | required | 0 or greater | 4 | (NOT or integral_to)? (other_organism or colocalizes_with or contributes_to)? annotation_relation |
4 | GO ID | required | 1 | 5 | must be an extant GO ID |
5 | DB:Reference(s) | required | 1 or greater | 6 | DB must be in xrf_abbs |
6 | Evidence code | required | 1 | 7 | from ECO |
7 | With (or) From | optional | 0 or greater | 8 | |
8 | Interacting taxon ID (for multi-organism processes) | optional | 0 or 1 | 13 | NCBI taxon ID |
9 | Date | required | 1 | 14 | YYYYMMDD |
10 | Assigned_by | required | 1 | 15 | from xrf_abbs |
11 | Annotation Extension | optional | 0 or greater | 16 | |
12 | Annotation Properties | optional | 0 or greater | - | see Note 1 below |
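The column layout above can be sketched as a small reader. This is a minimal sketch, not a reference implementation: it assumes columns are tab-separated and that, apart from the mandatory `!gpa-version: 1.1` first line, any further `!` lines are comments (both borrowed from GAF conventions rather than stated here). It also reads the Qualifier grammar as pipe-separated tokens, which is an interpretation. `parse_gpad` and `check_qualifier` are hypothetical names, and the set of valid annotation relations is not checked.

```python
import re

# Column names follow the proposed 12-column layout (spaces replaced by underscores).
GPAD_COLUMNS = [
    "DB", "DB_Object_ID", "Qualifier", "GO_ID", "DB_Reference",
    "Evidence_code", "With_or_From", "Interacting_taxon_ID", "Date",
    "Assigned_by", "Annotation_Extension", "Annotation_Properties",
]
REQUIRED = ["DB", "DB_Object_ID", "Qualifier", "GO_ID", "DB_Reference",
            "Evidence_code", "Date", "Assigned_by"]

# Optional leading qualifiers, in the order given by
# (NOT or integral_to)? (other_organism or colocalizes_with or contributes_to)? annotation_relation
FIRST_SLOT = {"NOT", "integral_to"}
SECOND_SLOT = {"other_organism", "colocalizes_with", "contributes_to"}


def check_qualifier(value: str) -> bool:
    """Return True if value matches the qualifier grammar.

    Assumes pipe-separated tokens; the final annotation_relation is only
    required to be present, not checked against any vocabulary.
    """
    tokens = value.split("|") if value else []
    i = 0
    if i < len(tokens) and tokens[i] in FIRST_SLOT:
        i += 1
    if i < len(tokens) and tokens[i] in SECOND_SLOT:
        i += 1
    # Exactly one token (the annotation relation) must remain.
    return len(tokens) - i == 1 and tokens[i] not in FIRST_SLOT | SECOND_SLOT


def parse_gpad(text: str) -> list[dict[str, str]]:
    """Parse GPAD text into one dict per annotation row (tab-separated, assumed)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "!gpa-version: 1.1":
        raise ValueError("first line must be '!gpa-version: 1.1'")
    rows = []
    for lineno, line in enumerate(lines[1:], start=2):
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split("\t")
        if len(fields) != len(GPAD_COLUMNS):
            raise ValueError(f"line {lineno}: expected 12 columns, got {len(fields)}")
        row = dict(zip(GPAD_COLUMNS, fields))
        missing = [name for name in REQUIRED if not row[name]]
        if missing:
            raise ValueError(f"line {lineno}: missing required column(s) {missing}")
        if not check_qualifier(row["Qualifier"]):
            raise ValueError(f"line {lineno}: bad Qualifier {row['Qualifier']!r}")
        # Only the GO ID format is checked; whether it is extant needs the ontology.
        if not re.fullmatch(r"GO:\d{7}", row["GO_ID"]):
            raise ValueError(f"line {lineno}: bad GO ID {row['GO_ID']!r}")
        if not re.fullmatch(r"\d{8}", row["Date"]):
            raise ValueError(f"line {lineno}: Date must be YYYYMMDD")
        rows.append(row)
    return rows
```

This version stops at the first bad row; a curation-side checker would more likely collect every error before reporting.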
Note 1. The Annotation Properties column can hold a pipe-separated list of "property_name = property_value" pairs. Property names will come from a fixed vocabulary that can be extended when necessary. The initially supported properties would be curator_name and annotation_identifier*; the vocabulary could later be extended to include, e.g., curator_ID, modification_date, creation_date and annotation_notes.
* curator_name and annotation_identifier will be useful for groups that use Protein2GO for protein annotation but wish to maintain their annotations in their own databases: these values can be used to keep track of individual annotations.
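Note 1 can be read as a small key/value format. The sketch below assumes each pair is split on its first '=' with surrounding whitespace ignored, and that values never contain '|' (the note does not say how that would be escaped). `parse_annotation_properties` is a hypothetical helper, and its allowed set holds only the two initially proposed names.

```python
# Initial vocabulary from Note 1; extend this set as new property names are agreed.
ALLOWED_PROPERTIES = frozenset({"curator_name", "annotation_identifier"})


def parse_annotation_properties(
    value: str, allowed: frozenset[str] = ALLOWED_PROPERTIES
) -> list[tuple[str, str]]:
    """Split 'name = value|name = value' into a list of (name, value) pairs.

    A list (rather than a dict) keeps the original order and tolerates a
    property name appearing more than once.
    """
    pairs: list[tuple[str, str]] = []
    if not value:
        return pairs  # the column is optional, so an empty value is allowed
    for item in value.split("|"):
        name, sep, prop_value = item.partition("=")
        name, prop_value = name.strip(), prop_value.strip()
        if not sep or not name:
            raise ValueError(f"expected 'property_name = property_value', got {item!r}")
        if name not in allowed:
            raise ValueError(f"property name {name!r} is not in the vocabulary")
        pairs.append((name, prop_value))
    return pairs
```

Passing a larger `allowed` set is one way to model the vocabulary being extended later without changing the parser.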
gp_information files
1. We have decided that columns for DB_subset (e.g. Swiss-Prot, TrEMBL) and Annotation Target Set (e.g. Reference Genome) are not necessary.
2. The format includes a column for cross-references (Xrefs) to other databases (column 09). This would be useful for mapping MOD-specific identifiers, symbols and synonyms to UniProt accessions, helping MOD curators who are moving to Protein2GO to search for familiar IDs and gene names.
3. It is proposed to drop the 'DB' column (e.g. UniProt, WB) and replace it with a header line specifying the namespace of the annotating group's identifiers, e.g. 'WB:' or 'UniProtKB:'.
None of the gp2protein files in the GOC SVN repository refer to objects from more than one namespace in column 1, so repeating the namespace in every row of the GPI file seems redundant. Parent_Object_ID would only need to be qualified with a namespace if it referred to an object from a different namespace (this could happen if we start annotating protein complexes).
N.B. The first two lines of a gp_information file should be:
!gpi-version: 1.1
!namespace: <database>
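Because the namespace moves from a per-row column into a header line, a reader has to restore fully qualified IDs itself. A minimal sketch, assuming the header value may be written with or without a trailing colon, and that an ID already containing ':' counts as qualified. That second assumption fails for databases whose local IDs embed a colon (MGI accessions take the form MGI:nnnnn, for example), so a real implementation would need a per-database rule. `parse_gpi_header` and `qualify_id` are hypothetical names.

```python
def parse_gpi_header(lines: list[str]) -> tuple[str, str]:
    """Check the two mandatory header lines and return (version, namespace)."""
    if len(lines) < 2:
        raise ValueError("missing '!gpi-version' and/or '!namespace' header line")
    if lines[0].strip() != "!gpi-version: 1.1":
        raise ValueError("first line must be '!gpi-version: 1.1'")
    key, _, namespace = lines[1].strip().partition(":")
    namespace = namespace.strip().rstrip(":")  # accept both 'WB' and 'WB:'
    if key != "!namespace" or not namespace:
        raise ValueError("second line must be '!namespace: <database>'")
    return "1.1", namespace


def qualify_id(local_id: str, namespace: str) -> str:
    """Prefix an unqualified ID with the file's namespace.

    IDs that already contain ':' (e.g. a Parent_Object_ID from another
    namespace) are returned unchanged; this assumes local IDs never
    contain ':' themselves. Empty values (optional columns) pass through.
    """
    if not local_id or ":" in local_id:
        return local_id
    return f"{namespace}:{local_id}"
```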
Proposed format (21 Nov 2012)
column | name | required? | cardinality | GAF column | Example for UniProt | Example for WormBase |
---|---|---|---|---|---|---|
01 | DB_Object_ID | required | 1 | 2/17 | Q4VCS5-1 | WBGene00000035 |
02 | DB_Object_Symbol | required | 1 | 3 | AMOT | ace-1 |
03 | DB_Object_Name | optional | 0 or 1 | 10 | Angiomotin | |
04 | DB_Object_Synonym(s) | optional | 0 or greater | 11 | KIAA1071\|AMOT | ACE1 |
05 | DB_Object_Type | required | 1 | 12 | protein | gene |
06 | Taxon | required | 1 | 13 | taxon:9606 | taxon:6239 |
07 | Annotation_Completed | optional | 1 | - | 20120614 (YYYYMMDD) | 20100405 |
08 | Parent_Object_ID | optional | 0 or 1 | - | Q4VCS5 | WBGene00000035 |
09 | DB_Xref(s) | optional | 0 or greater | - | - | UniProtKB:P38433 |
N.B. Where a column can hold more than one value (e.g. 'With (or) From', DB_Object_Synonym(s), DB_Xref(s)), the values should be given as a pipe-separated list.
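Combining the GPI table with the pipe-separated-list rule, a row reader and the symbol-to-UniProt lookup motivated by the DB_Xref(s) column might look like the sketch below. It assumes tab-separated columns in the nine-column order of the table, treats only DB_Object_Synonym(s) and DB_Xref(s) as multi-valued, and expects xrefs in 'UniProtKB:<accession>' form as in the WormBase example. `parse_gpi_row` and `build_uniprot_index` are hypothetical names.

```python
from typing import Any

GPI_COLUMNS = [
    "DB_Object_ID", "DB_Object_Symbol", "DB_Object_Name", "DB_Object_Synonyms",
    "DB_Object_Type", "Taxon", "Annotation_Completed", "Parent_Object_ID", "DB_Xrefs",
]
# Columns whose values are given as pipe-separated lists.
MULTI_VALUED = {"DB_Object_Synonyms", "DB_Xrefs"}


def parse_gpi_row(line: str) -> dict[str, Any]:
    """Split one GPI data line into a dict; multi-valued columns become lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GPI_COLUMNS):
        raise ValueError(f"expected {len(GPI_COLUMNS)} columns, got {len(fields)}")
    row: dict[str, Any] = {}
    for name, value in zip(GPI_COLUMNS, fields):
        if name in MULTI_VALUED:
            row[name] = [v for v in value.split("|") if v]  # empty column -> []
        else:
            row[name] = value
    return row


def build_uniprot_index(rows: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each object symbol and synonym to the UniProtKB accessions in its xrefs."""
    index: dict[str, list[str]] = {}
    for row in rows:
        accessions = [x.split(":", 1)[1] for x in row["DB_Xrefs"]
                      if x.startswith("UniProtKB:")]
        if not accessions:
            continue  # nothing to map for rows without UniProtKB xrefs
        for key in [row["DB_Object_Symbol"], *row["DB_Object_Synonyms"]]:
            bucket = index.setdefault(key, [])
            for acc in accessions:
                if acc not in bucket:  # a symbol may also be listed as a synonym
                    bucket.append(acc)
    return index
```

With the WormBase example row, both 'ace-1' and its synonym 'ACE1' would map to P38433, which is the kind of familiar-name search the xref column is meant to support.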