Electronically curated flag

From GO Wiki
Revision as of 06:24, 12 April 2019 by Pascale (talk | contribs)

Kara has proposed a new flag in the GAF to indicate whether a curation was reviewed by a warm-blooded curator or inferred purely electronically.

Kara's proposal:

...
But, I really think that we are going to continue to have problems with ISS (and other codes) vs. IEA as long as we continue to capture
the annotation/curation method (IEA) in the same "bit" as  the experimental method (the rest of the evidence codes).   The suggestion I made
 previously is to keep all the other evidence codes but remove IEA. We'd then capture the annotation method (manual vs. automated/electronic)
 separately.    So, a given annotation would have one of the existing experimental evidence codes.  Then, in addition, it would be flagged as
 electronically curated or manually curated.  Again, this will cause a little bit of anguish to switch to this, but it shouldn't be too bad,
 and I think it'd be well worth it because some problem related to this issue seems to come up at least quarterly.

To make the change, we'd have to:

1 convert the backlog:  should be easy.  Everything with IEA currently would get ISS with the new "electronically curated" flag.
2 create a place (in files, in databases) to capture the curation method.  Could this simply be a new qualifier?
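As a rough illustration of step 1, the backlog conversion could be sketched as below. This assumes the standard tab-delimited GAF layout (Qualifier in column 4, Evidence Code in column 7, comment lines starting with "!") and, following the question in step 2, stores the flag as a qualifier. The qualifier value `electronically_curated` and the function name `convert_line` are hypothetical; no such qualifier has been agreed.

```python
QUALIFIER_COL = 3   # 0-based index of the GAF Qualifier column (column 4)
EVIDENCE_COL = 6    # 0-based index of the GAF Evidence Code column (column 7)
FLAG = "electronically_curated"  # hypothetical qualifier value, not an agreed one


def convert_line(line: str) -> str:
    """Rewrite one GAF line: IEA becomes ISS plus the electronic flag.

    Comment lines and non-IEA annotations are returned unchanged.
    """
    if line.startswith("!"):
        return line
    cols = line.rstrip("\n").split("\t")
    if len(cols) <= EVIDENCE_COL or cols[EVIDENCE_COL] != "IEA":
        return line
    cols[EVIDENCE_COL] = "ISS"
    # Qualifiers are pipe-separated; keep any existing ones (e.g. NOT).
    qualifiers = [q for q in cols[QUALIFIER_COL].split("|") if q]
    if FLAG not in qualifiers:
        qualifiers.append(FLAG)
    cols[QUALIFIER_COL] = "|".join(qualifiers)
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(cols) + newline
```

Applying `convert_line` to every line of a file would perform the conversion; annotations made with any other evidence code pass through untouched.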

Response from Chris:

Adding a new flag to the GAF is logically equivalent to creating cross-product terms in ECO.

i.e. given a list of core evidence codes E = {E1, E2, E3, ..., En}

And a binary evidence qualifier C= {Y,N} (curated/experimental)

It makes absolutely no logical difference whether we:

(a)

Have two places in the GAF for both E and C

or

(b)

Create terms in ECO for E x C = {E1Y, E1N, E2Y, E2N, ...}
and continue to work with a single place in the GAF for the evidence code.

Now there are pragmatic considerations which favour one approach over the other. For example, (b) involves the fewest changes to existing
 databases and software. However, (a) makes it easy to filter on curated vs experimental without an ECO lookup. (b) also has the
 advantage of allowing us to explicitly disqualify certain meaningless combinations.
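The equivalence of (a) and (b), and the trade-offs just mentioned, can be sketched in a few lines. Everything here is illustrative: the three core codes are an arbitrary subset, the combined names simply append the flag to the code, the `DISALLOWED` entry is an invented example of a "meaningless" combination, and the `LOOKUP` table stands in for a real ECO lookup.

```python
from itertools import product

CORE_CODES = ["ISS", "ISM", "IDA"]   # illustrative subset of E
FLAGS = ["Y", "N"]                   # the binary qualifier C

# Hypothetical example of a combination we might choose to forbid under (b).
DISALLOWED = {("IDA", "N")}


def to_combined(code: str, flag: str) -> str:
    """Scheme (b): one term per (code, flag) pair, e.g. ISM + Y -> ISMY."""
    return f"{code}{flag}"


# The E x C cross-product, minus the disallowed pairs.
ALLOWED_PAIRS = [p for p in product(CORE_CODES, FLAGS) if p not in DISALLOWED]

# Stand-in for an ECO lookup: combined term -> (core code, flag).
LOOKUP = {to_combined(e, c): (e, c) for e, c in ALLOWED_PAIRS}


def curated_two_place(annotations):
    """Scheme (a): rows are (gene, code, flag); filter on the flag directly."""
    return [a for a in annotations if a[2] == "Y"]


def curated_one_place(annotations):
    """Scheme (b): rows are (gene, term); the flag needs a lookup."""
    return [a for a in annotations if LOOKUP[a[1]][1] == "Y"]
```

Both filters select the same annotations; the only difference is that the one-place version has to consult the lookup table, while only the one-place version can refuse a pair such as `("IDA", "N")` by never minting a term for it.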

But these pragmatic considerations are secondary to the central issue here: adding a new GAF column, or overloading an existing one, solves nothing
 that cannot already be solved within ECO.


In practical terms, here is how this might look, capturing the spirit of Kara's proposal:

IEA would be the super-type for all non-curator-examined inferences. We can create subtypes, as required, by combining this with the core evidence type. We can assign a code based on the combination.

For example, IEA-ISM would be the code for an unvetted inference using a sequence model. It would be a sub-type of IEA, and would stand in some other relation (TBD) to ISM.

Annotating to IEA-ISM would be equivalent to annotating to ISM and to "automated" in Kara's 2-place scheme, above.

The ISM code would continue to retain its current meaning. Annotations to ISM are equivalent to annotations to "manual" and "ISM" in Kara's scheme.
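The correspondence between the combined codes and Kara's 2-place scheme could be expressed as a small translation function. This is only a sketch: the function name `to_two_place` is made up, the `IEA-` prefix convention is generalised from the single IEA-ISM example, and the treatment of plain IEA (no core evidence type recorded) is an assumption rather than anything decided here.

```python
from typing import Optional, Tuple


def to_two_place(code: str) -> Tuple[Optional[str], str]:
    """Translate a single evidence code into (core code, curation method)."""
    prefix = "IEA-"
    if code.startswith(prefix):
        # e.g. IEA-ISM: an unvetted inference of core type ISM.
        return code[len(prefix):], "automated"
    if code == "IEA":
        # Assumption: plain IEA says nothing about the core evidence type.
        return None, "automated"
    # Any other existing code keeps its current, curator-reviewed meaning.
    return code, "manual"
```

So `to_two_place("IEA-ISM")` gives `("ISM", "automated")` and `to_two_place("ISM")` gives `("ISM", "manual")`, matching the two equivalences stated above.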

The advantage here is that nothing needs to be retrofitted. No GAFs need to be changed. Continued use of "IEA" is absolutely fine. At some determined point, curators will be allowed to use the new subtypes of IEA. However, this is *optional*.

Software that currently provides IEA filtering will have to be fixed before the date at which the new IEA subtypes are allowed.
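To show the kind of fix this implies, here is a sketch of an IEA filter that follows the subtype hierarchy instead of comparing against the literal string "IEA". The `PARENT` table is a hypothetical stand-in for the is_a links that would live in ECO; IEA-ISM comes from the example above, while IEA-ISS is invented purely for illustration.

```python
# Hypothetical is_a links; in practice these would be read from ECO.
PARENT = {
    "IEA-ISM": "IEA",
    "IEA-ISS": "IEA",  # invented subtype, for illustration only
}


def is_subtype_of(code, ancestor, parent=PARENT):
    """Return True if code equals ancestor or reaches it via is_a links."""
    seen = set()
    while code is not None and code not in seen:
        if code == ancestor:
            return True
        seen.add(code)          # guards against accidental cycles
        code = parent.get(code)
    return False


def drop_electronic_naive(annotations):
    """Today's typical filter: misses any new IEA subtype."""
    return [(g, c) for g, c in annotations if c != "IEA"]


def drop_electronic(annotations):
    """Subtype-aware filter: removes IEA and everything beneath it."""
    return [(g, c) for g, c in annotations if not is_subtype_of(c, "IEA")]
```

With annotations to IEA, IEA-ISM and ISM, the naive filter would let the IEA-ISM annotation through, whereas the subtype-aware one keeps only the ISM annotation.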