Annotation consistency: Using IEP

From GO Wiki

Group Members

(anyone can add themselves)

  • Pascale Gaudet, Emily Dimmer, Ruth Lovering, Stacia Engel, Val Wood, Varsha Khodiyar

1. Issue

Some people have annotated heat shock proteins to 'response to heat' with IDA, when what was actually measured was the level of transcript or protein. There are 38 genes annotated to 'response to heat shock' with IDA in the GO database. Should those be checked? (We'll discuss; see below.)
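A list of annotations to check could be pulled from an annotation file rather than from the database. The following is a minimal sketch, assuming a GAF 2.x file (tab-separated, gene symbol in column 3, GO ID in column 5, evidence code in column 7). The function name and the three example rows are made up for illustration, including the placeholder gene symbols and PMID. GO:0009408 is used here as the ID for 'response to heat'; check it against the current ontology before relying on it.

```python
# 0-based column positions in a GAF 2.x line.
COL_SYMBOL, COL_GO_ID, COL_EVIDENCE = 2, 4, 6


def genes_with_evidence(gaf_lines, go_id, evidence):
    """Return sorted gene symbols annotated directly to go_id with this evidence code."""
    symbols = set()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines, which start with '!'.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= COL_EVIDENCE:
            continue  # malformed line, ignore
        if cols[COL_GO_ID] == go_id and cols[COL_EVIDENCE] == evidence:
            symbols.add(cols[COL_SYMBOL])
    return sorted(symbols)


# Illustrative rows only: the database, IDs, symbols and reference are placeholders.
EXAMPLE_ROWS = [
    ["ExDB", "X0001", "geneA", "", "GO:0009408", "PMID:0000000", "IDA", "", "P"],
    ["ExDB", "X0002", "geneB", "", "GO:0009408", "PMID:0000000", "IEP", "", "P"],
    ["ExDB", "X0003", "geneC", "", "GO:0006950", "PMID:0000000", "IDA", "", "P"],
]
EXAMPLE_GAF = ["!gaf-version: 2.2"] + ["\t".join(row) for row in EXAMPLE_ROWS]

to_check = genes_with_evidence(EXAMPLE_GAF, "GO:0009408", "IDA")
print(to_check)
```

This only finds direct annotations to the one term. Annotations to more specific child terms would need the ontology to be loaded as well, which is not attempted here.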

There seem to be 2 wiki pages here dealing with a similar issue. I have added comments to the wiki page for 'Annotation consistency: 'Response to' terms'[1].

GOC evidence code guidelines advise caution in the use of IEP.

There are only 2 examples in these guidelines of when IEP can be used (and many of when it can't).

Examples where the IEP evidence code should be used:

1. genes upregulated during a stress condition may be annotated to the process of stress response (for example, heat shock proteins).

2. genes selectively expressed at specific developmental stages in specific organs may be annotated to xxx development

I disagree with the first example and think this should be listed as an example when IEP should not be used (see wiki mentioned above). However, perhaps the agreed resolution of the 'Annotation consistency: 'Response to' terms' will lead to an agreed resolution about whether IEP should be used for stress condition annotations.

I am not sure how useful it is to present a case where I used IEP and am now unsure whether that was right, but I think specific cases can help focus discussion. I have looked at the few cases where I used IEP and now wonder whether these should be IC, although I would prefer to use a direct evidence code. For example, from PMID:12082081: 'During leukocyte adhesion in static or under flow conditions, VCAM-1, ICAM-1, and activated moesin and ezrin clustered in an endothelial actin-rich docking structure that anchored and partially embraced the leukocyte containing other cytoskeletal components such as alpha-actinin, vinculin, and VASP.'

Knowing that VCAM1 and ICAM1 are cell adhesion molecules, I associated the term 'GO:0007159 leukocyte adhesion' with these genes using an IEP code. These proteins are in the appropriate cellular compartment to participate in leukocyte adhesion. However, I assumed that 'location' in the IEP description referred to cellular compartment and not just to cell type. Any thoughts on the appropriate use of IEP in this case, and on whether 'location' refers to cell type/stage or to cellular component/compartment?

2. Proposed solution(s)

3. Comments/counter arguments


I have recently used IEP to annotate a set of genes involved in the core environmental stress response (CESR), which are upregulated in response to most stresses (heat, oxidative stress, DNA-damaging agents, etc.), to the term 'response to stress'. This seems to be a valid use of this term, although I agree there are not many processes you could annotate using microarray data in isolation.

I don't think that the fact that some annotations come from microarray data precludes their use in subsequent analyses of microarray data.

Microarray analyses, like any other genome-wide analyses, benefit from annotations of processes that are as comprehensive as possible. When analysing microarray data you are looking for enrichment or depletion of GO terms in the sets of genes which are differentially regulated in your experiment. The source of the annotation doesn't matter; from the analysis perspective, the important thing is that the annotation is as comprehensive and accurate as possible. If annotation derived from microarrays can provide some information about biological process that helps in subsequent analyses, so much the better.
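To make the enrichment point concrete, here is a minimal sketch of one common way to score enrichment of a single GO term, the one-sided hypergeometric test. This is an illustration only, not necessarily the method used by any particular analysis tool, and the function and parameter names are made up. Note that the calculation uses only annotation counts, so an IEP annotation contributes exactly as much as an IDA one.

```python
from math import comb  # available in Python 3.8 and later


def enrichment_p_value(population, annotated, study, study_annotated):
    """Upper-tail hypergeometric probability P(X >= study_annotated).

    population      -- number of genes in the background set
    annotated       -- background genes annotated to the GO term
    study           -- number of differentially regulated genes
    study_annotated -- differentially regulated genes annotated to the term
    """
    most_possible = min(annotated, study)
    all_draws = comb(population, study)
    # Count the ways of drawing at least study_annotated annotated genes.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, study - i)
        for i in range(study_annotated, most_possible + 1)
    )
    return favourable / all_draws


# Toy numbers only: 10 genes, 5 annotated to the term, and all 5 of the
# differentially regulated genes carry the annotation.
p = enrichment_p_value(population=10, annotated=5, study=5, study_annotated=5)
print(p)
```

A real analysis would test many terms at once and would therefore need a multiple-testing correction, which is left out of this sketch.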

However, taking sets of co-regulated genes from microarray experiments and inferring process because other genes in the set are co-regulated can be dangerous. Only a few regulons are so tightly conserved for process (ribosomal RNAs and ribosome biogenesis proteins, for example), and even these regulons can include some genes not involved in these processes. I would be wary of making annotations from 'functional predictions' based on these types of microarray experiments without additional evidence. These are 'predictions' based on statistical analysis and will have an associated false positive rate.

We need to make a distinction between these two types of data:

i) Microarray experiments intended to show that a set of genes is involved in the response to a certain condition (e.g. stress) or in a life cycle stage (e.g. meiosis), and

ii) Function predictions made from downstream analysis of microarray datasets to identify sets of genes which are co-regulated and infer specific processes. I am not aware of this working well for yeast except for the cases mentioned above, and this is only a small percentage of the uses of microarray data.

For pombe there are between 50 and 100 microarray publications so far; all are type i) and none are type ii).

SGD have curated some analyses of type ii) and used RCA. I recently queried whether this should in fact be IEP, and I think SGD are still discussing this. I can see now that RCA would probably be better here, for the reasons mentioned above.

Val [taken from a discussion on the evidence code e-mail list on 27-06-2008]

4. Proposed resolution
