Annotation Conf. Call, February 14, 2012
Revision as of 00:57, 14 February 2012
Agenda for Annotation Call
- More evidence codes: new evidence code for inferences based on ontology links (http://gocwiki.geneontology.org/index.php/Evidence_for_Inferences_based_on_Ontology_links) (Rama)
- Update on protein binding obsoletions (Jane)
- Update on communication mechanisms for changes to the GO taxon file (Jane)
- Can we have a quick review of the currently preferred mechanism for feedback on PAINT annotations? (Kimberly)
- New QC checks (Amelia); see below
- Col 17 entry ID hierarchy; see below
Suggested QC Checks
Remove redundant GP info
The GP synonym column (col 11) must not repeat information already present in other columns (DB object ID, col 2; DB object symbol, col 3; DB object name, col 10), as this information is redundant.
e.g. incorrect:
| 1 DB | 2 DB object ID | 3 DB object symbol | ... | 10 DB object name | 11 DB object synonym | 12 DB object type |
|---|---|---|---|---|---|---|
| PomBase | SPCC1884.02 | nic1 | ... | NiCoT heavy metal ion transporter Nic1 | SPCC1884.02\|nic1\|SPCC757.01 | gene |
Correct:
| 1 DB | 2 DB object ID | 3 DB object symbol | ... | 10 DB object name | 11 DB object synonym | 12 DB object type |
|---|---|---|---|---|---|---|
| PomBase | SPCC1884.02 | nic1 | ... | NiCoT heavy metal ion transporter Nic1 | SPCC757.01 | gene |
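The redundant-synonym check could be automated roughly as below. This is a minimal sketch, not an existing GOC tool: it assumes a GAF row has already been split on tabs into a list of 17 strings, that col 11 synonyms are pipe-separated, and that "redundant" means an exact, case-sensitive match with col 2, 3 or 10. The function names (`redundant_synonyms`, `cleaned_synonyms`) are hypothetical.

```python
from typing import List, Set

# GAF columns are numbered from 1 in the format docs; Python lists start at 0.
COL_DB_OBJECT_ID = 1        # col 2
COL_DB_OBJECT_SYMBOL = 2    # col 3
COL_DB_OBJECT_NAME = 9      # col 10
COL_DB_OBJECT_SYNONYM = 10  # col 11


def _synonyms(fields: List[str]) -> List[str]:
    """Split the pipe-separated col 11 value, dropping empty entries."""
    return [s for s in fields[COL_DB_OBJECT_SYNONYM].split("|") if s]


def _other_columns(fields: List[str]) -> Set[str]:
    """Values of cols 2, 3 and 10, which col 11 must not repeat."""
    return {
        fields[COL_DB_OBJECT_ID],
        fields[COL_DB_OBJECT_SYMBOL],
        fields[COL_DB_OBJECT_NAME],
    }


def redundant_synonyms(fields: List[str]) -> List[str]:
    """Return the col 11 synonyms that exactly repeat col 2, 3 or 10."""
    others = _other_columns(fields)
    return [s for s in _synonyms(fields) if s in others]


def cleaned_synonyms(fields: List[str]) -> str:
    """Rebuild col 11 without the redundant entries."""
    others = _other_columns(fields)
    return "|".join(s for s in _synonyms(fields) if s not in others)


# The PomBase example above; elided columns are left empty.
row = [""] * 17
row[0] = "PomBase"
row[COL_DB_OBJECT_ID] = "SPCC1884.02"
row[COL_DB_OBJECT_SYMBOL] = "nic1"
row[COL_DB_OBJECT_NAME] = "NiCoT heavy metal ion transporter Nic1"
row[COL_DB_OBJECT_SYNONYM] = "SPCC1884.02|nic1|SPCC757.01"
row[11] = "gene"

print(redundant_synonyms(row))  # ['SPCC1884.02', 'nic1']
print(cleaned_synonyms(row))    # SPCC757.01
```

Exact matching keeps the check conservative; a looser, case-insensitive comparison would catch more cases but could also flag genuine synonyms that differ from the symbol only in case.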
Col 17 ID format
Only one ID is allowed in col 17; that ID must be correctly formatted (DB:ID) and come from a database listed in GO.xrf_abbs.
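A possible shape for the col 17 format check is sketched below, under stated assumptions: "correctly formatted" is taken to mean a single `DB:ID` token with no pipes, commas or whitespace, and the allowed prefixes are read from the `abbreviation:` lines of a GO.xrf_abbs-style stanza file (the parsing is an assumption about that file's layout). The helper names and the tiny xrf excerpt are hypothetical. Prefix matching is deliberately case-sensitive, so variants such as `UniPRotKB` are rejected.

```python
import re
from typing import Iterable, Optional, Set

# Exactly one DB:ID token: a prefix without colons or whitespace, a colon,
# then a local ID without whitespace, pipes or commas (so lists are rejected).
COL17_PATTERN = re.compile(r"^([^:\s|,]+):([^\s|,]+)$")


def load_xrf_prefixes(lines: Iterable[str]) -> Set[str]:
    """Collect 'abbreviation:' values, assuming GO.xrf_abbs-style stanzas."""
    prefixes = set()
    for line in lines:
        if line.startswith("abbreviation:"):
            prefixes.add(line.split(":", 1)[1].strip())
    return prefixes


def col17_problem(value: str, allowed_prefixes: Set[str]) -> Optional[str]:
    """Describe what is wrong with a col 17 value, or return None if it passes."""
    if value == "":
        return None  # col 17 is optional
    match = COL17_PATTERN.match(value)
    if match is None:
        return "not a single well-formed DB:ID"
    prefix = match.group(1)
    if prefix not in allowed_prefixes:  # case-sensitive on purpose
        return f"prefix {prefix!r} not listed in GO.xrf_abbs"
    return None


# Hypothetical excerpt standing in for the real GO.xrf_abbs file.
allowed = load_xrf_prefixes(["abbreviation: UniProtKB", "", "abbreviation: PR"])

print(col17_problem("UniProtKB:P0217K-3", allowed))  # None
print(col17_problem("UniPRotKB:P0217K-3", allowed))  # prefix 'UniPRotKB' not listed in GO.xrf_abbs
print(col17_problem("UniProtKB:P0217K-3|PR:P0217K-3", allowed))  # not a single well-formed DB:ID
```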
Col 17 entities should always be related to the same col 2 entry
See the docs on col 17 (http://www.geneontology.org/GO.format.gaf-2_0.shtml#gene_product_form_id) for a refresher on col 17 contents.
Where splice forms exist, each splice form ID must always have the same parent GP ID in col 2. Can anyone think of a case in which this would not hold?
e.g. incorrect:
| 1 DB | 2 DB object ID | ... | 17 gene product form ID |
|---|---|---|---|
| MGI | MGI:123456 | ... | UniProt:P0217K-3 |
| MGI | MGI:654321 | ... | UniProt:P0217K-3 |
Correct:
| 1 DB | 2 DB object ID | ... | 17 gene product form ID |
|---|---|---|---|
| MGI | MGI:123456 | ... | UniProt:P0217K-3 |
| MGI | MGI:123456 | ... | UniProt:P0217K-3 |
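The "same parent" rule amounts to grouping rows by their col 17 ID and flagging any ID that maps to more than one parent. The sketch below assumes rows are already split into 17-column lists and treats the parent as the (col 1, col 2) pair, since a col 2 ID is only unique within its database; `inconsistent_col17` and `make_row` are hypothetical helpers, and `make_row` exists only to rebuild the MGI examples above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Parent = Tuple[str, str]  # (col 1 DB, col 2 DB object ID)


def inconsistent_col17(rows: Iterable[List[str]]) -> Dict[str, Set[Parent]]:
    """Return col 17 IDs that are attached to more than one col 1/col 2 parent."""
    parents: Dict[str, Set[Parent]] = defaultdict(set)
    for fields in rows:
        form_id = fields[16] if len(fields) > 16 else ""  # col 17
        if form_id:
            parents[form_id].add((fields[0], fields[1]))
    return {form_id: found for form_id, found in parents.items() if len(found) > 1}


def make_row(db: str, object_id: str, form_id: str) -> List[str]:
    """Build a 17-column row with only cols 1, 2 and 17 filled in (illustration only)."""
    fields = [""] * 17
    fields[0], fields[1], fields[16] = db, object_id, form_id
    return fields


incorrect_rows = [
    make_row("MGI", "MGI:123456", "UniProt:P0217K-3"),
    make_row("MGI", "MGI:654321", "UniProt:P0217K-3"),
]
correct_rows = [
    make_row("MGI", "MGI:123456", "UniProt:P0217K-3"),
    make_row("MGI", "MGI:123456", "UniProt:P0217K-3"),
]

print(inconsistent_col17(incorrect_rows))  # flags UniProt:P0217K-3 with both MGI parents
print(inconsistent_col17(correct_rows))    # {}
```

Because the check only needs one pass and a dictionary of sets, it should scale to whole GAF files; any genuine exception raised on the call could be handled with an allow-list of col 17 IDs.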
Col 17 ID Hierarchy
Identifiers in column 17 come from a range of databases; we propose creating a list of preferred databases from which these IDs should be taken.
Databases used in col 17 so far:
| Database | GP form types | # distinct IDs | Assigned by |
|---|---|---|---|
| ENSEMBL | protein | 2464 | BHF-UCL, DFLAT, GOC, HGNC, IntAct, MGI, RGD, RefGenome, UniProtKB |
| PR | protein | 3 | MGI |
| protein_id | protein | 31 | MGI |
| Protein_id [capitalization error] | protein | 1 | MGI |
| RefSeq | gene, protein | 3215 | BHF-UCL, GOC, IntAct, MGI, RGD, RefGenome, UniProtKB |
| TAIR | RNA, gene_product, miRNA, protein, rRNA, snRNA, snoRNA, tRNA | 45992 | GOC, IntAct, RefGenome, TAIR, TIGR, UniProtKB |
| UniProtKB | protein | 4601 | BHF-UCL, DFLAT, GOC, HGNC, IntAct, MGI, PINC, RGD, RefGenome, Roslin_Institute, UniProtKB |
| UniPRotKB [capitalization error] | protein | 1 | MGI |
| uniProtKB [capitalization error] | protein | 2 | MGI |
| VEGA | protein | 13706 | BHF-UCL, DFLAT, GOC, HGNC, IntAct, MGI, PINC, RGD, RefGenome, Roslin_Institute, UniProtKB |
| WB | gene | 4 | WB |
| WP | gene | 6 | WB |
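A survey like the one in the table above could be regenerated from GAF files with something like the following. This is an illustrative sketch only; the table may well have been produced differently. It assumes tab-separated GAF 2.0 rows, with '!' header lines skipped, takes the DB prefix as the text before the first colon in col 17 and the "Assigned by" value from col 15, and also groups prefixes that differ only in letter case, which would surface the capitalization errors noted in the table. All function names and sample lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def iter_gaf_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield tab-split GAF rows, skipping blank lines and '!' header lines."""
    for line in lines:
        line = line.rstrip("\n")
        if line and not line.startswith("!"):
            yield line.split("\t")


def survey_col17(rows: Iterable[List[str]]) -> Dict[str, Tuple[int, List[str]]]:
    """Per col 17 DB prefix: (number of distinct IDs, sorted col 15 assigned_by values)."""
    ids: Dict[str, Set[str]] = defaultdict(set)
    assigners: Dict[str, Set[str]] = defaultdict(set)
    for fields in rows:
        form_id = fields[16] if len(fields) > 16 else ""  # col 17
        if ":" not in form_id:
            continue
        prefix = form_id.split(":", 1)[0]
        ids[prefix].add(form_id)
        assigners[prefix].add(fields[14])  # col 15
    return {p: (len(ids[p]), sorted(assigners[p])) for p in sorted(ids)}


def case_variants(prefixes: Iterable[str]) -> Dict[str, List[str]]:
    """Group prefixes that differ only in letter case, e.g. UniProtKB vs UniPRotKB."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for prefix in prefixes:
        groups[prefix.lower()].add(prefix)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


def gaf_line(col17: str, assigned_by: str) -> str:
    """Hypothetical GAF line with only cols 15 and 17 filled in."""
    fields = [""] * 17
    fields[14], fields[16] = assigned_by, col17
    return "\t".join(fields)


sample = [
    "!gaf-version: 2.0",
    gaf_line("UniProtKB:P0217K-3", "MGI"),
    gaf_line("UniProtKB:P0217K-3", "UniProtKB"),
    gaf_line("UniPRotKB:P0217K-3", "MGI"),
]
survey = survey_col17(iter_gaf_rows(sample))
print(survey)                 # {'UniPRotKB': (1, ['MGI']), 'UniProtKB': (1, ['MGI', 'UniProtKB'])}
print(case_variants(survey))  # {'uniprotkb': ['UniPRotKB', 'UniProtKB']}
```

Once a preferred-database list is agreed, the same survey output could be compared against it to report IDs taken from non-preferred sources.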