Manager Call 2020-05-27
- Agenda: Suzi
- Minutes: Seth
- Present: David, Kimberly, Huaiyu, Suzi, Laurent, Judy, Chris, Pascale, PaulT, Seth
Patrick says the entries are still not loaded. What is blocking?
Chris: GPI is still problematic (duplicate entries). Maybe just PRO? But then the data flow is hard. And on and on, still not resolved. NEO load. Just picking one should be fine.
Chris: let’s make a decision about the duplicates here today.
Pascale: This is PI-level
David: PRO is used by Reactome and MGI
Judy: Let’s push on using PRO IDs
Pascale: Just tell Patrick to annotate to PRO?
PaulT: Just ask them for the level of granularity--technically doesn’t have to be PRO. Can’t be mandated, but… Talk to Alex/Alan.
Chris: For the directors call then.
Load the 142 PANTHER species in NEO and AmiGO
- Discussion from the multiorg group.
- We had proposed loading all reviewed entries for bacteria and viruses, see https://github.com/geneontology/neo/issues/49, but the 35 bacterial species (+ their substrains) are probably sufficient for the needs of bacterial annotation.
- Other species could be added if needed, to be discussed for each species and also with PANTHER.
Pascale: Talked about this on multi-organism. Right now, PANTHER has one E. coli, but eight are being annotated. So we need the species plus their subspecies/strains.
Chris: “GO reference genomes”
Laurent: What about all species in GO with over 1000 annotations?
Pascale: Should at least check. And we need to add viruses.
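The check Laurent and Pascale mention could be done by counting annotations per taxon. A minimal sketch, assuming tab-separated GAF 2.x input where the taxon is column 13; the function name `species_over_threshold` is hypothetical, and the default cutoff of 1000 is the figure raised on the call.

```python
from collections import Counter

TAXON_COLUMN = 12  # zero-based index of GAF column 13 (taxon); an assumption about the input


def species_over_threshold(lines, threshold=1000):
    """Return {taxon: annotation_count} for taxa with more than `threshold` annotations."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= TAXON_COLUMN:
            continue  # malformed row, ignore
        # The taxon column may hold two pipe-separated taxa; count the first.
        taxon = cols[TAXON_COLUMN].split("|")[0]
        counts[taxon] += 1
    return {taxon: n for taxon, n in counts.items() if n > threshold}
```

Passing an open GAF file handle as `lines` would give the candidate species list to compare against the PANTHER set.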
Laurent: Do we want more feedback? Ask Pasteur?
Judy: Start with this set, then go ask.
PaulT: What genomes will have experimental studies?
Pascale: Can you send the list of bacterial species?
Huaiyu: It changes from build to build. Sometimes species are replaced.
Chris: TSV or YAML.
Pascale: I’ll take care of it.
Chris: Still need to schedule pipeline changes.
Pascale: Alex is ready to go.
Chris: Just SwissProt? Most in QfO set? For majority, in reference. What about others?
PaulT: Ask for reference proteome. Have them for all species.
Chris: What about our super-set?
PaulT: Should be available in reference proteome. The pan-genome is still an open question.
Chris: GO can follow PANTHER and QfO.
- To decide whether we need to hold a meeting with Michelle, and how soon that will be, we need to know:
- Are we using ECO anywhere? For example, it shows up in AmiGO.
- How do we WANT to use it? Do we want to allow the full ECO, a subset, or just the current three-letter codes?
Pascale: From convo with Kimberly and David. Align GO three-letter codes and ECO.
Chris: We /are/ using it--it’s in GPAD. The IEA issue is important and outstanding. This comes up in filtering. This is important.
Kimberly: Also, we’ve instructed curators to stick to the three-letter codes. It would be nice to allow selection of ECO terms for more granularity. The other question is: do we continue to tell them to stick to the three-letter codes, or let them select more and start cleaning up ECO? Two/three day thing?
Pascale: There may be more problems when we start looking.
Chris: Our users probably don’t care too much. Would we find it more useful?
Judy: Really, isn’t this just a hint for the reference? Direct vs. indirect. The three-letter codes are really just supposed to be a heads-up.
Pascale: Curators, in my experience, found more granularity easier after getting used to it. Could see that it could go the other way.
PaulT: SynGO has been working with ECO to get better terms.
Kimberly: Given that, at least the SynGO subset of ECO.
Pascale: Will follow up with Michelle.
David: GO slice of ECO?
Kimberly: Short-term, IEA issue. Defer decision on GO subset until later?
David: We can use full ECO. Varies by curator.
Chris: Should be part of GO style guide--most useful if uniform.
Judy: So, what do we do?
Chris: This is already in AmiGO
Kimberly: Get IEAs fixed, will ask Karen about ones that don’t map.
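The alignment of three-letter codes with ECO discussed above amounts to a lookup table. A minimal sketch, assuming one default ECO ID per code; the IDs shown are the commonly cited defaults and should be verified against the current GO/ECO mapping file. The names `DEFAULT_ECO`, `to_eco` and `unmapped_codes` are hypothetical, and the table is deliberately partial.

```python
# Assumed default ECO ID per GO evidence code (partial; verify before use).
DEFAULT_ECO = {
    "EXP": "ECO:0000269",
    "IDA": "ECO:0000314",
    "IPI": "ECO:0000353",
    "IMP": "ECO:0000315",
    "IGI": "ECO:0000316",
    "IEP": "ECO:0000270",
    "ISS": "ECO:0000250",
    "IBA": "ECO:0000318",
    "TAS": "ECO:0000304",
    "NAS": "ECO:0000303",
    "IC": "ECO:0000305",
    "ND": "ECO:0000307",
    "IEA": "ECO:0000501",
}


def to_eco(code):
    """Return the assumed default ECO ID for a three-letter code, or None if unmapped."""
    return DEFAULT_ECO.get(code.strip().upper())


def unmapped_codes(codes):
    """List the codes (in input order, without repeats) that have no entry in the table."""
    seen = []
    for code in codes:
        if to_eco(code) is None and code not in seen:
            seen.append(code)
    return seen
```

A report from `unmapped_codes` over the codes found in the annotation files would give a concrete list of the ones that don't map.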