Column 16 discussion 12-12-09

Revision as of 10:37, 10 December 2009

Use of Cell Type as an Annotation_Cross_Product in column 16.

CL identifiers would be included in column 16 for a GO annotation whenever that information is present in a particular paper. No judgment is made as to whether a gene product is involved in a particular process in just a particular cell type or in all cell types. In other words, curators simply annotate all available data in a paper.

Therefore it is incorrect to assume that a gene product whose GO annotation has a CL identifier in column 16 is involved in the curated process only in that cell type. Similarly, it would be a mistake to conclude that the lack of a CL co-annotation indicates that a given gene product is involved in a process in all cell types where it is found. The only correct interpretation of a GO annotation with a CL co-annotation is that, in one particular experiment, a given gene product was found to be involved in a particular process in a particular cell type.

Annotation Format of column 16 for Cell Type:

• If CL is used to refine a CC annotation, then the relation (for now) must be part_of

• If CL is used to refine a BP or MF, then the relation (for now) must be occurs_in
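The two rules above can be sketched as a small check. This is only an illustration: the function name and the lookup table are hypothetical, and the one-letter aspect codes (C, P, F) are assumed to follow the aspect column of the annotation file.

```python
# Hypothetical helper, not part of any GO tool.
# Assumption: aspects are given as the one-letter codes C (CC), P (BP), F (MF).
REQUIRED_RELATION = {
    "C": "part_of",    # CL refines a cellular component annotation
    "P": "occurs_in",  # CL refines a biological process annotation
    "F": "occurs_in",  # CL refines a molecular function annotation
}


def relation_is_valid(aspect: str, relation: str) -> bool:
    """Return True if `relation` is the one currently required for `aspect`."""
    if aspect not in REQUIRED_RELATION:
        raise ValueError(f"unknown aspect: {aspect!r}")
    return REQUIRED_RELATION[aspect] == relation
```

Keeping the rules in a table rather than in branching code makes them easy to change, which matters because both relations are fixed only "for now".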

Simple annotation for Column 16

Gene (col 2/3)   Term (col 5)                   Ref (col 6)   Ext (col 16)
TLR4             cell surface (GO:0005887)      PMID:nnn      part_of(CL:0000576)
CREB             gluconeogenesis (GO:0006094)   PMID:nnnn     occurs_in(CL:0000182)
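As a rough sketch, a single column 16 entry of the form shown in these examples could be split into its relation and its CL identifier as follows. The pattern (a lowercase relation name followed by a seven-digit CL identifier in parentheses) is inferred from the examples only, and the function name is hypothetical.

```python
import re

# Assumed shape of one entry, inferred from the examples: relation(CL:nnnnnnn)
ENTRY_RE = re.compile(r"^(?P<relation>[a-z_]+)\((?P<cl_id>CL:\d{7})\)$")


def parse_entry(text: str) -> tuple[str, str]:
    """Split one column 16 entry into (relation, CL identifier)."""
    match = ENTRY_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a relation(CL:nnnnnnn) entry: {text!r}")
    return match["relation"], match["cl_id"]


relation, cl_id = parse_entry("occurs_in(CL:0000182)")
print(relation, cl_id)
```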

Where a protein has multiple cellular locations:

Gene (col 2/3)   Term (col 5)                          Ref (col 6)   Ext (col 16)
Gene1234         mitochondrial membrane (GO:0031966)   PMID:nnnn     part_of(CL:0000236)

N.B. no meaning is attached to the order in which the CL identifiers are listed in column 16.
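One consequence for anyone comparing annotations is that column 16 values should be compared as unordered collections. A minimal sketch follows, assuming a hypothetical `|` separator between entries (this page does not specify one) and a hypothetical function name.

```python
def same_entries(col16_a: str, col16_b: str, sep: str = "|") -> bool:
    """Compare two column 16 values while ignoring the order of their entries.

    Assumption: entries are separated by `sep`; the default "|" is
    illustrative only and is not defined on this page.
    """
    entries_a = {entry.strip() for entry in col16_a.split(sep) if entry.strip()}
    entries_b = {entry.strip() for entry in col16_b.split(sep) if entry.strip()}
    return entries_a == entries_b
```

Using sets also means a repeated entry is treated as a single one, which may or may not be what a given tool wants.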


From the GOA team:

1. In the proposed format for column 16, I can't see how to indicate the biological processes that depend on a particular protein-protein interaction. Perhaps we need a new relationship such as 'required_for', to be used in conjunction with BP terms, which could be added to column 16 of the annotation to the 'protein binding' term? Example: the biological processes GO:0008284, GO:0050870 and GO:0022409 occur when Q9P1W8 interacts with Q08722 (PMID:15383453).

2. We are unsure how to use 'has_input' and 'has_output'. There is not much information on these two relationships on this wiki page. In particular, could we have examples describing the use of the 'has_output' relationship?

3. What relationship should be used if we would like to indicate the target of a certain molecular function or biological process? For instance:

The target of protein Q9BRA2's protein-disulfide reductase activity is P63167

Q9BRA2 TXNDC17 GO:0047134 protein-disulfide reductase activity PMID:18579519 IDA F Thioredoxin domain-containing protein 17 TXNDC17|TXNL5|IPI00646689|TXD17_HUMAN protein taxon:9606 20080627 UniProtKB

4. Should there be annotation guidelines requiring that, when column 16 is used to link a GO MF term to a CC term, it be complemented by an annotation to the corresponding CC term using the same PMID?

5. Is it correct to assume that when a protein is the target of some transcription activity, we should indicate the target in column 16 in both the annotation to the process terms (e.g. positive regulation of transcription) and molecular function terms (e.g. transcription factor activity)?