QCQA discussion GOC meeting Cambridge 2017-10
Food for thought for QC
These are only thoughts and should not be considered an official document. It contains ideas that could be useful for improving QC in GO. Last but not least, it reflects only my own opinion (Sylvain).
Centralized documentation containing the annotation guidelines is important. Ideally, curation tools (Protein2GO and the other tools used by curators) should link to this documentation. A lot has already been done, but the documentation needs to be centralized. Pascale described this part nicely this morning, so we can consider that we are on the right track.
Training is an essential step, if not the essential one. We need to train curators in new annotation methods (GO-CAM) and new guidelines (for example, extensions for Cellular Component annotations). At the moment, these announcements are made during GO calls, but in my opinion this is no longer sufficient. Ideally, a few senior curators in different places (Pascale for SIB, Val for the UK, Kimberly and/or someone from MGI for the US) would be in charge of:
- Training curators in new tools and annotation rules
- Checking curators' annotations during the first weeks (or longer when required) to make sure curators have integrated the new annotation tools and guidelines
The SynGO example is very instructive in this respect: real training procedures, with curators dedicated to this task (centralized QC), might have helped. As a start, I think a small team of senior QC curators (4-5 people maximum) would be enough.
Checking annotations is the logical step following training. Every time new annotation guidelines are released and training is given, annotations should be checked to ensure the new annotation policy is well understood and applied. Random checking is also very useful. In my own case, I curate a lot and frequently encounter annotations that I consider problematic (such as downstream effects, over-annotation, or concepts that are mixed up or not well understood). From my Swiss-Prot experience, I would say every curator has strengths and weaknesses. Checking helps identify curators who need additional training and/or specific things to improve. Curators frequently make the same type of error (typos, in my case), and it is useful to make them aware of the parts of their curation that could be improved.
Mechanisms to identify annotation errors
There are different mechanisms to identify annotation errors:
- PAINT is a great tool to identify annotation inconsistencies.
- PomBase developed mechanisms to identify annotation errors/issues: https://www.slideshare.net/ValerieWood/pombase-conventions-for-improving-annotation-depth-breadth-consistency-and-accuracy
- Over-annotation is clearly an issue: we might develop mechanisms to identify papers with more than 50 annotations and check them.
- Random identification (see above)
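The over-annotation check above (papers with more than 50 annotations) is easy to automate. A minimal sketch, assuming annotations are available as GAF 2.x text, where column 6 (DB:Reference) may hold several pipe-separated references; the function names and the threshold constant are hypothetical, and the threshold of 50 is the figure from the note, not an agreed rule.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical threshold, taken from the note above (more than 50 annotations).
OVER_ANNOTATION_THRESHOLD = 50


def annotations_per_paper(gaf_lines: Iterable[str]) -> Counter:
    """Count annotation lines per PMID in GAF 2.x text.

    Column 6 (index 5) is DB:Reference; it may contain several references
    separated by '|', so each distinct PMID on a line is counted once.
    """
    counts: Counter = Counter()
    for line in gaf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("!"):  # skip header/comment lines
            continue
        columns = line.split("\t")
        if len(columns) < 15:  # too few columns to be an annotation line
            continue
        pmids = {ref for ref in columns[5].split("|") if ref.startswith("PMID:")}
        for pmid in pmids:
            counts[pmid] += 1
    return counts


def papers_to_review(gaf_lines: Iterable[str],
                     threshold: int = OVER_ANNOTATION_THRESHOLD) -> List[Tuple[str, int]]:
    """Return (PMID, count) pairs above the threshold, most annotated first."""
    counts = annotations_per_paper(gaf_lines)
    return [(pmid, n) for pmid, n in counts.most_common() if n > threshold]
```

An open file handle can be passed directly, e.g. `papers_to_review(open(path))`. A raw count is only a trigger for manual review: a large-scale study can legitimately support many annotations.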
Mechanisms to correct errors and inform curators
In my experience, this is quite a sensitive part: curators don’t like being checked and corrected. However, I think a step beyond ‘disputes’ is needed. Ideally, when a QC curator identifies annotation errors, they should inform the responsible curator via a simple but centralized mechanism:
- If the curator and the QC curator agree, everything is fine.
- If the senior curator and the curator disagree, the case could be discussed with other senior curators in a call or by e-mail.
- In case of no response, senior curators should have the ability to update/obsolete/delete any annotation. IMPLEMENTED IN PRINCIPLE
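The three cases above amount to a small decision procedure. A minimal sketch, assuming each dispute records when it was opened and whether the curator has replied; the class names and the 30-day response window are hypothetical, since the notes leave the actual deadline to be decided.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    RESOLVED = "resolved"                       # curator and QC curator agree
    ESCALATE = "discuss with senior curators"   # disagreement: call or e-mail
    QC_MAY_EDIT = "QC may update/obsolete"      # no response before the deadline
    WAITING = "waiting for reply"


# Hypothetical response window; not an agreed GOC rule.
RESPONSE_WINDOW = timedelta(days=30)


@dataclass
class Dispute:
    annotation_id: str
    opened: date
    curator_agrees: Optional[bool] = None  # None means no reply yet


def next_step(dispute: Dispute, today: date,
              window: timedelta = RESPONSE_WINDOW) -> Outcome:
    """Map the state of a dispute to the next step described in the notes."""
    if dispute.curator_agrees is True:
        return Outcome.RESOLVED
    if dispute.curator_agrees is False:
        return Outcome.ESCALATE
    if today - dispute.opened > window:
        return Outcome.QC_MAY_EDIT
    return Outcome.WAITING
```

Keeping the window as a parameter makes it easy to try different deadlines before the procedure is fixed.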
Mechanisms to ‘obsolete’ annotations
There are many examples of contradictory or erroneous results in the literature (e.g. SIRT5 and JMJD6 in human). Many of these papers have been annotated in GO, and it would be very useful to ‘obsolete’, without deleting, annotations that are thought to be incorrect. I would not be in favor of deleting these annotations, for two reasons:
- To show that this information has not been overlooked
- Knowledge evolves, things may go back and forth, and we cannot exclude that obsoleted annotations will be resurrected at some point.
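‘Obsolete without delete’ can be modelled as a status flag that is kept with the record and filtered out at export time. A minimal sketch under that assumption; the `Annotation` fields and the function names are hypothetical and do not describe the actual GO data model or any curation tool.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    go_term: str
    reference: str
    evidence: str
    obsolete: bool = False
    obsolete_reason: Optional[str] = None


def obsolete(annotation: Annotation, reason: str) -> Annotation:
    """Mark an annotation obsolete; the record and the reason are kept."""
    return replace(annotation, obsolete=True, obsolete_reason=reason)


def resurrect(annotation: Annotation) -> Annotation:
    """Undo obsoletion if knowledge swings back."""
    return replace(annotation, obsolete=False, obsolete_reason=None)


def exportable(annotations: Iterable[Annotation]) -> List[Annotation]:
    """Only non-obsolete annotations go into the public dataset."""
    return [a for a in annotations if not a.obsolete]
```

Because nothing is deleted, the stored reason also documents that the paper was read and deliberately set aside.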
- We need better documentation for new curators
- Better training when there are new curators, new curation guidelines, changes in the ontology or relations, or new tools (Noctua, etc.)
- Training can be given by presentation and/or by email
- Look at GO_rules to see what checks are already there and think about new rules
- Look at the number of annotations per paper
- Review curators' work (ask Tony to add ORCIDs to the EBI GAF)
- Use a combination of metrics: number of papers, number of annotations, ECO evidence codes, information content
- Strategies to identify problems
- Try to engage researchers more?
- Strategies to help curators
- Have an easy way to get to PAINT/AGR data for orthologs from the curation interface, so that people can see how the ortholog was annotated?
- Provide a list of curators' domains of expertise
- Mechanisms to efficiently identify annotation errors that have impact
- VAL to go through the Biocuration talk
- VAL list of common errors
- VAL list of PomBase ‘do not annotate’ terms
- ERIC Look at GO_rule outputs to see what rules are being broken
- Which email list to use to write to curators? Go-consortium? Go-discuss?
- GO PRINCIPLE: See Pascale’s presentation MOVED TO WIKI for now
- GO is not a legacy database: we would like to represent current knowledge
- Implement a QC committee to review disagreements with curators when we dispute annotations
- Look at GO_rules to see what checks are already there and think about new rules ONGOING
- What error rate are we aiming for?
The aim is 100%. The error rate varies a lot by term, etc. “Errors” where the general/high-level branch is correct are not considered errors: we aim to annotate in the correct branch. Kimberly proposes 90% accuracy.
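Checking a target such as 90% accuracy needs an estimate from a random sample of checked annotations, following the rule above that an annotation in the correct high-level branch counts as correct. A minimal sketch, assuming a simple random sample; it uses the standard Wilson score interval for a proportion, and the function names and the "whole interval above target" rule are illustrative choices, not an agreed GOC procedure.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, checked: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the fraction of correct annotations.

    z = 1.96 gives an approximate 95% interval.
    """
    if checked <= 0:
        raise ValueError("need at least one checked annotation")
    p = correct / checked
    denom = 1 + z * z / checked
    centre = (p + z * z / (2 * checked)) / denom
    half = z * math.sqrt(p * (1 - p) / checked + z * z / (4 * checked * checked)) / denom
    return centre - half, centre + half


def meets_target(correct: int, checked: int, target: float = 0.90) -> bool:
    """True only if the lower bound of the interval reaches the target."""
    low, _ = wilson_interval(correct, checked)
    return low >= target
```

For instance, 90 correct out of 100 checked gives an interval of roughly 0.83 to 0.94, so a sample of that size cannot yet show that a 90% target is met.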
- Tony to add ORCIDs to the EBI GPAD
- Improve the mechanism for disputes, and for moving disputes forward if people do not respond
- Try to engage researchers more? See how SGD does it
- Provide a list of curators' domains of expertise
- Mechanisms to correct errors and inform curators: Prioritization: GitHub
- Create a report of annotations to a term, with and without children, with and without regulates, grouped by group, by evidence code, etc. (to be spec’ed out). This can probably be done with AmiGO.
- GO QC team should have edit rights on all GO annotations in ANY team.
- Set up a procedure: e.g. they need to reply by a certain date, review committee, domain experts, etc. TO BE REVIEWED LATER
- Implement a mechanism by which the GOC database can post-process annotations to flag them (or some other mechanism) so that they are removed from our dataset
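Until the term report is spec'ed out (and probably done in AmiGO), its logic can be prototyped offline. A minimal sketch, assuming ontology edges as (child, relation, parent) triples and annotations as (GO term, assigned-by group, evidence code) tuples; the term names in any example are hypothetical, and the traversal is a simplification that does not reproduce GO's exact regulates-closure rules.

```python
from collections import Counter, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (child, relation, parent)

STRUCTURAL = {"is_a", "part_of"}
REGULATORY = {"regulates", "positively_regulates", "negatively_regulates"}


def term_closure(term: str, edges: Iterable[Edge], include_children: bool = True,
                 include_regulates: bool = False) -> Set[str]:
    """Return the term plus all terms reaching it via the allowed relations."""
    allowed: Set[str] = set()
    if include_children:
        allowed |= STRUCTURAL
    if include_regulates:
        allowed |= REGULATORY
    below: Dict[str, List[str]] = {}
    for child, relation, parent in edges:
        if relation in allowed:
            below.setdefault(parent, []).append(child)
    seen = {term}
    queue = deque([term])
    while queue:  # breadth-first walk down the selected edges
        current = queue.popleft()
        for child in below.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def term_report(term: str, edges: Iterable[Edge],
                annotations: Iterable[Tuple[str, str, str]],
                include_children: bool = True,
                include_regulates: bool = False) -> Counter:
    """Count annotations to the selected terms by (group, evidence code)."""
    terms = term_closure(term, edges, include_children, include_regulates)
    return Counter((group, evidence)
                   for go_term, group, evidence in annotations
                   if go_term in terms)
```

Running the report four times (children on/off, regulates on/off) gives the four views listed above for one term.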