Oct 2012 Meeting to finalize GPAD specification (Archived): Difference between revisions

From GO Wiki

Revision as of 07:08, 22 November 2012

Previously proposed GPI format: http://wiki.geneontology.org/index.php/Gene_Product_Data_File_Format

Previously proposed GPAD format: http://wiki.geneontology.org/index.php/Gene_Product_Association_Data_%28GPAD%29_Format

Issues for discussion:

gp_association files

Questions awaiting discussion with Chris:

  1. Translating existing qualifiers CONTRIBUTES_TO and COLOCALIZES_WITH into annotation relations (+ NOT?)
  2. Filling in relationships retrospectively - implicit or explicit values in qualifier column?

Proposed format (21 Nov 2012)

contents                                             required?  cardinality   old column #  extra info
DB                                                   required   1             1             must be in xrf_abbs
DB_Object_ID                                         required   1             2             canonical or spliceform ID
Qualifier                                            optional   0 or greater  4             (NOT or integral_to)? (other_organism or colocalizes_with or contributes_to)? annotation_relation
GO ID                                                required   1             5             must be extant GO ID
DB:Reference(s)                                      required   1 or greater  6             DB must be in xrf_abbs
Evidence code                                        required   1             7             from ECO
With (or) From                                       optional   0 or greater  8
Interacting taxon ID (for multi-organism processes)  optional   0 or 1        13            NCBI taxon ID
Date                                                 required   1             14            YYYYMMDD
Assigned_by                                          required   1             15            from xrf_abbs
Annotation XP (Annotation Cross Products)            optional   0 or greater  16
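
To make the ordering in the Qualifier row above concrete, here is a minimal Python sketch that splits one qualifier value into its optional modifiers and the annotation relation. The pipe separator, the function name parse_qualifier and the relation names used in the usage note (part_of, enables) are assumptions for illustration only; the proposal does not fix any of them.

```python
# Optional modifiers, in the order given in the Qualifier row of the table.
NEGATION_OR_INTEGRAL = {"NOT", "integral_to"}
CONTEXT_MODIFIERS = {"other_organism", "colocalizes_with", "contributes_to"}


def parse_qualifier(value, sep="|"):
    """Return (first_modifier, second_modifier, relation) for one value.

    An empty value yields (None, None, None) because the column is optional.
    The separator `sep` is an assumption, not part of the proposal.
    """
    tokens = [t.strip() for t in value.split(sep) if t.strip()]
    if not tokens:
        return (None, None, None)
    # Each optional modifier is consumed only if it appears in its position.
    first = tokens.pop(0) if tokens[0] in NEGATION_OR_INTEGRAL else None
    second = tokens.pop(0) if tokens and tokens[0] in CONTEXT_MODIFIERS else None
    # Exactly one token, the annotation relation, must remain.
    if len(tokens) != 1:
        raise ValueError(f"expected exactly one annotation relation in {value!r}")
    return (first, second, tokens[0])
```

Under these assumptions, parse_qualifier("NOT|contributes_to|enables") returns ("NOT", "contributes_to", "enables"), while a value with the modifiers out of order raises ValueError. Read literally, the grammar also rejects a bare colocalizes_with, which is one way of seeing why question 1 above needs an answer.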

gp_information files

1. gp_information.goa_uniprot (gpi-version: 1.0) currently has these columns, some of which (db_subset, annotation_target_set, annotation_completed) are not present in the proposed 1.1 format.

2. It does not contain xrefs to other databases. These would be useful for mapping MOD-specific identifiers, symbols and synonyms to UniProt accessions, so that MOD curators moving to Protein2GO can search for familiar IDs and gene names.

3. Is it possible to dispose of Col.1 and replace it with a header line specifying the namespace of the annotating groups' identifiers, e.g. WB:, UniProtKB:?

None of the gp2protein files in the GOC SVN repository refer to objects from more than one namespace in column 1, so repeating the namespace in every row of the gpi file seems redundant. Parent_Object_ID would only need to be qualified if it referred to an object from a different namespace (is this ever likely to happen in practice?)
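
As a rough illustration of how a consumer might handle a file-level namespace, the sketch below reads a default namespace from a header line and applies it only to identifiers that carry no prefix. The header syntax ("!namespace: UniProtKB"), both function names and the WormBase identifier in the usage note are hypothetical; nothing here has been agreed.

```python
def read_namespace(lines, key="!namespace:"):
    """Return the default namespace declared in a header line.

    The header key is an assumed syntax, used here only for illustration.
    """
    for line in lines:
        if line.startswith(key):
            # Accept both "UniProtKB" and "UniProtKB:" in the header value.
            return line[len(key):].strip().rstrip(":")
    raise ValueError("no namespace header found")


def qualify(identifier, default_ns):
    """Prefix an identifier with the default namespace unless it has one.

    This treats any colon as a namespace separator, which is a
    simplification: identifiers that contain a colon themselves would
    need explicit handling.
    """
    if ":" in identifier:
        return identifier
    return f"{default_ns}:{identifier}"
```

With a header of "!namespace: UniProtKB", qualify("Q4VCS5-1", ns) gives "UniProtKB:Q4VCS5-1", while an already qualified parent such as "WB:WBGene00000001" passes through unchanged.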

Proposed format (21 Nov 2012)

column  name                  required?  cardinality   GAF column  Example for UniProt                                            Example for WormBase
01      DB_Object_ID          required   1             2           Q4VCS5-1
02      DB_Object_Symbol      required   1             3           AMOT
03      DB_Object_Name        optional   0 or 1        10          Angiomotin
04      DB_Object_Synonym(s)  optional   0 or greater  11          KIAA1071|IPI:IPI00163085|IPI:IPI00644547|UniProtKB:AMOT_HUMAN
05      DB_Object_Type        required   1             12          protein
06      Taxon                 required   1             13          taxon:9606
07      Annotation_Completed  optional   1             -           timestamp (YYYYMMDD)
08      Parent_Object_ID      optional   0 or 1        -           UniProtKB:Q4VCS5
09      DB_Xref(s)            optional   0 or greater  -           -                                                              UniProtKB:Q4VCS5
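
The nine columns above could be read along the following lines. This is only a sketch: it assumes tab-separated columns and pipe-separated multi-valued fields (as in the synonym example), and the names GPI_COLUMNS and parse_gpi_row are made up for illustration.

```python
# (column name, required?, may hold several pipe-separated values?)
GPI_COLUMNS = [
    ("DB_Object_ID", True, False),
    ("DB_Object_Symbol", True, False),
    ("DB_Object_Name", False, False),
    ("DB_Object_Synonym(s)", False, True),
    ("DB_Object_Type", True, False),
    ("Taxon", True, False),
    ("Annotation_Completed", False, False),
    ("Parent_Object_ID", False, False),
    ("DB_Xref(s)", False, True),
]


def parse_gpi_row(line):
    """Parse one row into a dict keyed by column name.

    Single-valued columns map to a string or None; multi-valued columns
    map to a (possibly empty) list. Tab and pipe delimiters are assumed.
    """
    cells = line.rstrip("\n").split("\t")
    # Pad short rows so that trailing optional columns may be omitted.
    cells += [""] * (len(GPI_COLUMNS) - len(cells))
    record = {}
    for (name, required, multi), cell in zip(GPI_COLUMNS, cells):
        if required and not cell:
            raise ValueError(f"missing required column {name}")
        if multi:
            record[name] = [v for v in cell.split("|") if v]
        else:
            record[name] = cell or None
    return record
```

Fed the UniProt example values from the table, this yields a record whose DB_Object_Synonym(s) entry is a list of four strings and whose DB_Xref(s) entry is an empty list. A row with an empty DB_Object_Symbol raises ValueError.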