Annotation Conf. Call, February 14, 2012: Difference between revisions

From GO Wiki

Revision as of 00:57, 14 February 2012

==Agenda for Annotation Call==

* Update on communication mechanisms for changes to the GO taxon file (Jane)
* Quick review of the currently preferred mechanism for feedback on PAINT annotations (Kimberly)
* New QC checks (Amelia) - see below
* Col 17 entry ID hierarchy - see below

==Suggested QC Checks==

===Remove redundant GP info===

The GP synonyms column (col 11) must not repeat information already held in other columns (DB object ID, GP symbol, GP name), as this information is redundant.

Incorrect:

{| border="1" cellspacing="0" cellpadding="5"
! 1<br>DB
! 2<br>DB object ID
! 3<br>DB object symbol
! ...
! 10<br>DB object name
! 11<br>DB object synonym
! 12<br>DB object type
|-
| PomBase
| SPCC1884.02
| nic1
| ...
| NiCoT heavy metal ion transporter Nic1
| SPCC1884.02 &#124; nic1 &#124; SPCC757.01
| gene
|}


Correct:

{| border="1" cellspacing="0" cellpadding="5"
! 1<br>DB
! 2<br>DB object ID
! 3<br>DB object symbol
! ...
! 10<br>DB object name
! 11<br>DB object synonym
! 12<br>DB object type
|-
| PomBase
| SPCC1884.02
| nic1
| ...
| NiCoT heavy metal ion transporter Nic1
| SPCC757.01
| gene
|}
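
The check above could be sketched roughly as follows. This is a minimal illustration, not an agreed implementation: the function name `strip_redundant_synonyms` is hypothetical, a row is assumed to be a list of the tab-separated GAF fields, synonyms are assumed to be pipe-separated, and matching is assumed to be exact (case-sensitive).

```python
def strip_redundant_synonyms(cols):
    """Return a copy of one GAF row with redundant entries removed from col 11.

    `cols` holds the fields of a single annotation line. GAF columns are
    numbered from 1, so col N is cols[N - 1].
    """
    # Values that must not be repeated as synonyms:
    # col 2 (DB object ID), col 3 (DB object symbol), col 10 (DB object name).
    redundant = {cols[1], cols[2], cols[9]}
    synonyms = [s.strip() for s in cols[10].split("|") if s.strip()]
    kept = [s for s in synonyms if s not in redundant]
    fixed = list(cols)
    fixed[10] = "|".join(kept)
    return fixed


# The PomBase example, with cols 4-9 left empty for brevity.
row = ["PomBase", "SPCC1884.02", "nic1", "", "", "", "", "", "",
       "NiCoT heavy metal ion transporter Nic1",
       "SPCC1884.02|nic1|SPCC757.01", "gene"]
print(strip_redundant_synonyms(row)[10])  # prints SPCC757.01
```

A real check might only report the offending rows rather than rewrite them; that choice is left open here.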


===Col 17 ID format===

Col 17 may contain at most one ID. That ID must be correctly formatted and must come from a database listed in GO.xrf_abbs.
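
As a rough sketch of what such a check might look like: the function `check_col17` and the set `KNOWN_DBS` below are hypothetical. The set is a small hand-picked stand-in for the abbreviations that would really be read from GO.xrf_abbs, and "correctly formatted" is assumed to mean a `DB:identifier` string with no pipe or comma separators.

```python
import re

# Hypothetical stand-in for the database abbreviations listed in GO.xrf_abbs.
KNOWN_DBS = {"UniProtKB", "ENSEMBL", "RefSeq", "TAIR", "VEGA", "PR", "WB"}

# Assumed shape of a single ID: a prefix, a colon, then the identifier.
ID_PATTERN = re.compile(r"^([^:\s|,]+):([^\s|,]+)$")


def check_col17(value, known_dbs=KNOWN_DBS):
    """Return a list of problems found in a col 17 value (empty if none)."""
    if value == "":
        return []  # an empty col 17 is treated as valid here
    if "|" in value or "," in value:
        return ["more than one ID"]
    match = ID_PATTERN.match(value)
    if match is None:
        return ["not in DB:identifier form"]
    if match.group(1) not in known_dbs:
        return ["unknown database prefix: " + match.group(1)]
    return []


print(check_col17("UniProtKB:P12345-2"))  # prints []
print(check_col17("uniProtKB:P12345-2"))  # prints ['unknown database prefix: uniProtKB']
```

Because the prefix comparison is case-sensitive, miscapitalized database names are reported as unknown prefixes.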


===Col 17 entities should always be related to the same col 2 entry===

See the [http://www.geneontology.org/GO.format.gaf-2_0.shtml#gene_product_form_id docs on col 17] for a refresher on col 17 contents.

Where spliceforms exist, a given spliceform ID in col 17 must always appear with the same parent GP ID in col 2. Can anyone think of a case in which this would not hold?

Incorrect:

{| border="1" cellspacing="0" cellpadding="5"
! 1<br>DB
! 2<br>DB object ID
! ...
! 17<br>gene product form ID
|-
| MGI
| MGI:123456
| ...
| UniProt:P0217K-3
|-
| MGI
| MGI:654321
| ...
| UniProt:P0217K-3
|}

Correct:

{| border="1" cellspacing="0" cellpadding="5"
! 1<br>DB
! 2<br>DB object ID
! ...
! 17<br>gene product form ID
|-
| MGI
| MGI:123456
| ...
| UniProt:P0217K-3
|-
| MGI
| MGI:123456
| ...
| UniProt:P0217K-3
|}
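
One possible way to detect this, shown only as a sketch: collect the col 2 IDs seen for each col 17 ID and report any col 17 ID with more than one parent. The function name `find_inconsistent_forms` and the `(col 2, col 17)` pair input are assumptions made for the example.

```python
from collections import defaultdict


def find_inconsistent_forms(rows):
    """Return the col 17 IDs that appear with more than one col 2 ID.

    `rows` is an iterable of (col2_id, col17_id) pairs. Rows with an empty
    col 17 are skipped. The result maps each offending col 17 ID to the
    sorted list of col 2 IDs it was seen with.
    """
    parents = defaultdict(set)
    for col2_id, col17_id in rows:
        if col17_id:
            parents[col17_id].add(col2_id)
    return {form: sorted(ids) for form, ids in parents.items() if len(ids) > 1}


incorrect = [("MGI:123456", "UniProt:P0217K-3"),
             ("MGI:654321", "UniProt:P0217K-3")]
correct = [("MGI:123456", "UniProt:P0217K-3"),
           ("MGI:123456", "UniProt:P0217K-3")]
print(find_inconsistent_forms(incorrect))
print(find_inconsistent_forms(correct))  # prints {}
```

If a legitimate case turns up in which one spliceform belongs to two parent GP IDs, the report from this check would need an exception list.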


==Col 17 ID Hierarchy==

Identifiers in column 17 come from a range of databases. Proposal: create a list of preferred databases from which these IDs should be taken.

Databases used so far:

{| border="1" cellspacing="0" cellpadding="5"
! Database
! GP form types
! # distinct IDs
! Assigned by
|-
| ENSEMBL
| protein
| 2464
| BHF-UCL, DFLAT, GOC, HGNC, IntAct, MGI, RGD, RefGenome, UniProtKB
|-
| PR
| protein
| 3
| MGI
|-
| protein_id
| protein
| 31
| MGI
|-
| Protein_id [capitalization error]
| protein
| 1
| MGI
|-
| RefSeq
| gene, protein
| 3215
| BHF-UCL, GOC, IntAct, MGI, RGD, RefGenome, UniProtKB
|-
| TAIR
| RNA, gene_product, miRNA, protein, rRNA, snRNA, snoRNA, tRNA
| 45992
| GOC, IntAct, RefGenome, TAIR, TIGR, UniProtKB
|-
| UniProtKB
| protein
| 4601
| BHF-UCL, DFLAT, GOC, HGNC, IntAct, MGI, PINC, RGD, RefGenome, Roslin_Institute, UniProtKB
|-
| UniPRotKB [capitalization error]
| protein
| 1
| MGI
|-
| uniProtKB [capitalization error]
| protein
| 2
| MGI
|-
| VEGA
| protein
| 13706
| BHF-UCL, DFLAT, GOC, HGNC, IntAct, MGI, PINC, RGD, RefGenome, Roslin_Institute, UniProtKB
|-
| WB
| gene
| 4
| WB
|-
| WP
| gene
| 6
| WB
|}
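
The capitalization errors flagged in the table could be caught automatically. The sketch below is only illustrative: `find_case_variants` is a hypothetical helper, and the canonical spellings passed to it are assumed to come from whatever list of preferred databases is eventually agreed.

```python
def find_case_variants(prefixes, canonical):
    """Map each miscapitalized database prefix to its canonical spelling.

    `prefixes` are the database names found in col 17 IDs; `canonical` is
    the list of accepted spellings. Names that match no canonical entry,
    even ignoring case, are left out of the result.
    """
    by_lower = {name.lower(): name for name in canonical}
    variants = {}
    for prefix in prefixes:
        expected = by_lower.get(prefix.lower())
        if expected is not None and prefix != expected:
            variants[prefix] = expected
    return variants


seen = ["UniProtKB", "UniPRotKB", "uniProtKB", "protein_id", "Protein_id", "VEGA"]
accepted = ["UniProtKB", "protein_id", "VEGA"]
print(find_case_variants(seen, accepted))
```

For the prefixes in the table, this reports `UniPRotKB`, `uniProtKB` and `Protein_id` along with the spelling each should use.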