Column 16 discussion 12-12-09
For reference page on column 16 Annotation_Cross-Products, see: http://wiki.geneontology.org/index.php/Annotation_Cross_Products
Use of Cell Type as an Annotation_Cross_Product in column 16.
CL identifiers would be included in column 16 for a GO annotation whenever that information is present in a particular paper. No judgment is made as to whether a gene product is involved in a particular process in just a particular cell type or in all cell types. In other words, curators simply annotate all available data in a paper.
Therefore it is incorrect to assume that a gene product in a GO annotation with a CL identifier in column 16 is involved in the curated process only in the annotated cell type. Similarly, it would be a mistake to conclude that the lack of a CL co-annotation indicates that a gene product is involved in a process in all cell types where it is found. The only correct interpretation of a GO annotation with a CL co-annotation is that, in one particular experiment, a given gene product was found to be involved in a particular process in a particular cell type.
Annotation Format of column 16 for Cell Type:
• If CL is used to refine a CC annotation, then the relation (for now) must be part_of
• If CL is used to refine a BP or MF, then the relation (for now) must be occurs_in
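The two rules above can be sketched as a small lookup. This is a minimal illustration only: the function name `cl_extension` and the table `REQUIRED_RELATION` are hypothetical, and the single-letter aspect codes (C, P, F) are assumed to follow the GAF aspect column.

```python
# Hypothetical helper, not part of any GO tool.
# Keys are assumed to be GAF aspect codes: C = CC, P = BP, F = MF.
REQUIRED_RELATION = {
    "C": "part_of",    # CL refines a cellular component annotation
    "P": "occurs_in",  # CL refines a biological process annotation
    "F": "occurs_in",  # CL refines a molecular function annotation
}


def cl_extension(aspect: str, cl_id: str) -> str:
    """Build a column 16 entry for a CL identifier, given the GO aspect."""
    if not cl_id.startswith("CL:"):
        raise ValueError(f"not a CL identifier: {cl_id!r}")
    try:
        relation = REQUIRED_RELATION[aspect]
    except KeyError:
        raise ValueError(f"unknown aspect: {aspect!r}") from None
    return f"{relation}({cl_id})"


print(cl_extension("C", "CL:0000576"))  # part_of(CL:0000576)
```

Keeping the rule in one table means that if the "for now" restriction is relaxed later, only the table needs to change.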
Simple annotation for Column 16
|Gene (col 2/3)|Term (col 5)|Ref (col 6)|Ext (col 16)|
|TLR4|cell surface (GO:0005887)|PMID:nnn|part_of(CL:0000576)|
Where a protein has multiple cellular locations, CL identifiers should be separated by a pipe (|):
e.g. part_of(CL:0000127) | part_of(CL:0000236)
N.B. No meaning is attached to the order in which the CL identifiers are listed in column 16.
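Because the order of entries carries no meaning, a consumer of column 16 could parse the pipe-separated value into an unordered set. The following is a minimal sketch under that assumption; `parse_column_16` and the `ENTRY` pattern are hypothetical names, and the pattern only covers the simple `relation(ID)` form shown above.

```python
import re

# Matches a single entry of the form relation(ID), e.g. part_of(CL:0000127).
ENTRY = re.compile(r"^(\w+)\(([^()\s]+)\)$")


def parse_column_16(value: str) -> frozenset:
    """Parse a pipe-separated column 16 value into a set of (relation, id) pairs."""
    entries = set()
    for part in value.split("|"):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields and stray separators
        match = ENTRY.match(part)
        if match is None:
            raise ValueError(f"malformed column 16 entry: {part!r}")
        entries.add((match.group(1), match.group(2)))
    return frozenset(entries)


a = parse_column_16("part_of(CL:0000127) | part_of(CL:0000236)")
b = parse_column_16("part_of(CL:0000236) | part_of(CL:0000127)")
print(a == b)  # True: the listing order does not matter
```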
Questions from the GOA team:
1. Is there any timeline for inclusion of additional cross-references to other OBO ontologies/details on annotation targets?
2. In the proposed format for column 16, I can't see how to indicate the biological processes that depend on a certain protein-protein interaction. Perhaps we need a new relationship such as 'required_for', to be used in conjunction with BP terms, which could be added to column 16 of the annotation to the 'protein binding' term? Example: The biological processes GO:0008284, GO:0050870 and GO:0022409 occur when Q9P1W8 interacts with Q08722 (PMID:15383453).
3. We were unsure how to use 'has_input' and 'has_output'. There is not much information on these two relationships on this wiki page. In particular, could we have examples describing the use of the 'has_output' relationship?
The idea here is that a process or function must have one or more participants - these are physical objects such as ions, molecules, proteins, RNAs, cell components, organs, etc. Participants can play different roles in a process, such as input, output or catalyst. The relation hierarchy is:
All participants must be present at some point during the process. If a participant is present at the beginning of the process and is changed in some way by it, then it is an input. If it is present at the end and has been changed in some way by the process, then it is an output.
This is very similar to the Reactome model.
The meanings become more specific when paired with a biological process.
- biosynthesis - the output is what is made from simpler parts during the process
- catabolism - the input is what is broken down during the process
However, this is not always clear cut. Consider the case of binding to a protein such as importin. Is this input or output? In the Reactome model, the input would be an importin in the 'unbound' state and the output would be importin in the 'bound' state. But this is harder to state within the confines of the GO model. What we need to do here is work out the core types of function and process we wish to use in compositions and come up with clear guidelines. For the binding case I suggest using has_input.
Note that in the majority of cases it is not wrong to state has_participant, any more than it is wrong to annotate higher up the GO DAG; it just doesn't communicate as much.
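The analogy with annotating higher up the GO DAG can be sketched as a relation hierarchy in which has_input and has_output are more specific forms of has_participant. This is only an illustration based on the discussion above: the `PARENT` table and the function `generalizes` are hypothetical, and the hierarchy shown is limited to the three relations named here.

```python
# Assumed from the discussion: has_input and has_output are
# more specific than has_participant. Other relations are omitted.
PARENT = {
    "has_input": "has_participant",
    "has_output": "has_participant",
}


def generalizes(general: str, specific: str) -> bool:
    """Return True if `general` is `specific` itself or one of its ancestors."""
    current = specific
    while current is not None:
        if current == general:
            return True
        current = PARENT.get(current)  # None once the root is passed
    return False


# Stating has_participant where has_input applies is less informative, not wrong.
print(generalizes("has_participant", "has_input"))  # True
# has_input and has_output are siblings; neither can stand in for the other.
print(generalizes("has_input", "has_output"))  # False
```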
4. What relationship should be used if we would like to indicate the target of a certain molecular function or biological process? For instance:
The target of protein Q9BRA2's protein-disulfide reductase activity is P63167
Q9BRA2 TXNDC17 GO:0047134 protein-disulfide reductase activity PMID:18579519 IDA F Thioredoxin domain-containing protein 17 TXNDC17|TXNL5|IPI00646689|TXD17_HUMAN protein taxon:9606 20080627 UniProtKB
5. Should there be annotation guidelines requiring that, when column 16 is used to link a GO MF to a CC term, the annotation is complemented by an annotation to the corresponding CC term using the same PMID?
6. Is it correct to assume that when a protein is the target of some transcription activity, we should indicate the target in column 16 in both the annotation to the process terms (e.g. positive regulation of transcription) and molecular function terms (e.g. transcription factor activity)?