13 March 2024 PAINT Conference Call
Present
Marc, Pascale, Dustin, Anushya, Paul, Huaiyu
Agenda
Taxon Constraint Curation
We are correcting the taxon constraint violations (TCVs) in the current PAINT data using this Google sheet file. The PAINT tool implemented a taxon constraint (TC) check a few years ago, so it is highly unlikely that TCVs are introduced during curation. There are two likely sources for the TCVs. First, these are really old annotations. Second, new TCs were added to the ontology after our last review.
The violations occur in the following three main types of situations.
- The IBD is annotated to an ancestral node older than the "only in" taxon; for example, "nucleus" is annotated to Archaea-Eukaryota. These annotations were usually generated before the initial load of PAINT data to the database on 2/28/2017.
- The propagation passes through an internal "never in" taxon constraint. There are quite a few terms that have "never in" relationships to Fungi or Pombe. Many of these "never in" TCs need to be reviewed.
- The propagation passes through a horizontal transfer event. This accounts for roughly a third of the violations.
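The first two situations above can be sketched as a simple check. This is a minimal illustration only, not the PAINT tool's actual implementation: the toy taxonomy, the function names, and the violation labels are all hypothetical, and the horizontal transfer case is not covered here.

```python
# Hypothetical toy taxonomy: child -> parent (None for the root).
PARENT = {
    "LUCA": None,
    "Archaea-Eukaryota": "LUCA",
    "Eukaryota": "Archaea-Eukaryota",
    "Fungi": "Eukaryota",
    "Metazoa": "Eukaryota",
}


def is_within(taxon, ancestor):
    """Return True if `taxon` equals `ancestor` or descends from it."""
    while taxon is not None:
        if taxon == ancestor:
            return True
        taxon = PARENT[taxon]
    return False


def check_ibd(ibd_taxon, descendant_taxa, only_in=None, never_in=None):
    """Return a list of violation labels for one IBD annotation.

    ibd_taxon       -- taxon of the ancestral node carrying the IBD
    descendant_taxa -- taxa of the nodes the annotation propagates to
    only_in         -- taxon of an "only in" constraint on the term, if any
    never_in        -- taxon of a "never in" constraint on the term, if any
    """
    violations = []
    # "only in": the annotated node must lie inside the allowed taxon.
    if only_in is not None and not is_within(ibd_taxon, only_in):
        violations.append("outside_only_in")
    # "never in": no node reached by propagation may lie inside the taxon.
    if never_in is not None and any(
        is_within(t, never_in) for t in descendant_taxa
    ):
        violations.append("passes_through_never_in")
    return violations
```

For example, `check_ibd("Archaea-Eukaryota", ["Eukaryota"], only_in="Eukaryota")` flags the "nucleus" case mentioned above, because the annotated node is older than the "only in" taxon.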
How should we more efficiently handle this in the future when new TCs are added to the ontology?
Discussion:
- Review "never in" taxon constraints.
- Some of them can be difficult.
- Ask Jim to gather the information for us to review.
- May need a jamboree, but we will first review it to see the scope of the work.
- The HGT issue was discussed.
- Is there any really bad genome that makes the HGT calls unreliable? We will review Dustin's file of all HGT events (in the 3/5/2024 email).
- Block propagation through the HGT nodes. The question is whether all annotations should be blocked or only those in certain aspects, e.g., cellular component.
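The two blocking options raised above (all aspects versus selected aspects) could look roughly like the following. This is a sketch under assumptions: the `Node` class, the `is_hgt` flag, and the `blocked_aspects` parameter are hypothetical and do not reflect how PAINT stores trees. Aspect codes follow the usual GO convention (P, F, C).

```python
class Node:
    """Hypothetical gene tree node; `is_hgt` marks a horizontal transfer event."""

    def __init__(self, name, is_hgt=False, children=None):
        self.name = name
        self.is_hgt = is_hgt
        self.children = children or []


def propagate(node, aspect, blocked_aspects=None):
    """Return the names of nodes that inherit an annotation placed on `node`.

    blocked_aspects=None blocks every aspect at HGT nodes; otherwise only
    the listed aspects (e.g. {"C"} for cellular component) are blocked.
    """
    block_here = blocked_aspects is None or aspect in blocked_aspects
    reached = [node.name]
    for child in node.children:
        if child.is_hgt and block_here:
            continue  # do not carry the annotation across the HGT event
        reached.extend(propagate(child, aspect, blocked_aspects))
    return reached


# Toy tree: root has an ordinary child "a" and an HGT child "h" with leaf "b".
tree = Node("root", children=[
    Node("a"),
    Node("h", is_hgt=True, children=[Node("b")]),
])
```

With this toy tree, `propagate(tree, "C")` stops at the HGT node, while `propagate(tree, "F", blocked_aspects={"C"})` still reaches every node, which is the difference between the two options under discussion.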
"Do not annotate" terms
In PAINT, there are annotations to terms marked as "do not annotate", e.g., "transport". We should:
- prevent curators from annotating IBDs using these terms;
- remove the existing IBDs with these terms.
Discussion:
- Load the following "do not annotate" list into the PAINT tool so that these terms are not displayed in the annotation matrix and cannot be used for IBD annotation: current.geneontology.org/ontology/subsets/gocheck_do_not_annotate.tsv
- It is trickier to remove them, because they need to be replaced by a different term.
- Can this be done with an automated process?
- The replacement term needs to have experimental evidence. If not, the annotation is lost.
- Probably gather all the annotations with these terms, organized by family ID and PTN ID, and see how much work is involved.
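The gathering step in the last item could be sketched as below. This is illustrative only: it assumes the GO ID is in the first column of the TSV (the actual column layout should be checked), and the annotation tuples `(family_id, ptn_id, go_id)` and the example family and PTN identifiers are hypothetical.

```python
import csv
import io
from collections import defaultdict


def load_do_not_annotate(tsv_text):
    """Collect GO IDs from TSV text; assumes the ID is in the first column."""
    ids = set()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if row and row[0].startswith("GO:"):
            ids.add(row[0])
    return ids


def group_flagged(annotations, blocked_ids):
    """Group annotations to blocked terms by family ID, then by PTN ID."""
    grouped = defaultdict(lambda: defaultdict(list))
    for family_id, ptn_id, go_id in annotations:
        if go_id in blocked_ids:
            grouped[family_id][ptn_id].append(go_id)
    # Convert to plain dicts so the result is easy to print or serialize.
    return {fam: dict(nodes) for fam, nodes in grouped.items()}


# Stand-in for the downloaded file contents (two rows only).
tsv_text = "GO:0006810\ttransport\nGO:0008150\tbiological_process\n"

# Hypothetical IBD annotations: (family ID, PTN ID, GO ID).
annotations = [
    ("PTHR10000", "PTN000000001", "GO:0006810"),
    ("PTHR10000", "PTN000000002", "GO:0005634"),
    ("PTHR20000", "PTN000000003", "GO:0006810"),
]

blocked = load_do_not_annotate(tsv_text)
flagged = group_flagged(annotations, blocked)
```

Counting the families and PTN nodes in `flagged` would give a first estimate of how much replacement work is involved.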
GO meeting presentation
- Are we going to present?
- If so, what topic?
Did not discuss. Can be done offline.