Manager Call 2015-05-18
IDs in columns 2 and 16
Resolution still needed on which ID types we can use in the AE for the ‘regulation of transcription/translation/expression’ relations. After UCL/GOA discussion: the motivation and impact need to be detailed (“Ruth can’t write on behalf of LEGO”) before the proposal is sent out to the wider consortium. Need a deadline (Ruth sent the proposal a week ago).
Attendees: Paola, Melanie, Kimberly, David, Judy, Huaiyu, Suzi, Pascale (indicated below by initials; P = Pascale, not Paola)
Agenda: Kimberly for Moni; Minutes: Paola
[For the sake of argument, column numbers here refer to GAF columns: col 2 is the gene, col 16 is the AE.]
Background: Paul T articulated an argument for mapping everything up to the gene where possible, but he isn’t on the call today.
M (for Tony too): in the past we discussed having consistency between cols 2 and 16; not convinced by the proposal; need to make the motivation and impact on users clear. Ruth thinks the main motivation for the change is LEGO requirements but would like that to be confirmed.
D: was confused too. Didn’t think the motivation was LEGO, but rather consistency of IDs across the 3 columns. But we would need Paul to discuss it.
M: why does consistency matter?
D: I *think* the questions were (but I’m unsure): what ID is legal in the AE, and what does it represent? Are we going to distinguish between gene and gene product?
J: column 2 is being used for generic concept of gene, correct?
D: That’s where the discussion comes up. We need Paul T and DOS to clarify and discuss.
S: for PANTHER families, there are some IDs we can’t even link to, which causes parsing problems downstream.
J: Can see why UniProt IDs would work in some cases; makes for a confusing situation.
D: PIs need to decide and decree.
M: because it’s going to be a change, we need to circulate the proposal, and we need to make it clear why we want to make the update; some people don’t attend annotation calls.
J: I thought column 2 was the object being curated
M: that’s what happens in GPAD - column 2 represents the gene
S: is that categorical? Are we not discussing how that is unclear?
J: I like the ‘type’ column. Can’t see why Paul wanted everything in column 2 to be a gene.
D: Some curators annotating isoforms put gene in col 2 and protein isoform in col 16
S: that is weird
D: there has to be some way to connect a gene in col 2 to MOD-specific info
D: If you go to AmiGO and enter e.g. human beta-catenin, it comes up under its gene symbol and also returns isoforms.
M: still unclear
S: me too. What problem are we trying to solve?
M: for me, being able to use gene ID or protein ID works well because sometimes I annotate to the gene and sometimes to the protein
P: I recall there was ambiguity for transcription factors. It has to do with the object you’re trying to describe.
S: Don’t see a problem with what we have now
P: The issue is with guidelines or lack thereof
M: UniProt curators annotate to UniProt IDs and may use conversion tools to represent e.g. MGI identifiers. Whichever way we go, it’s a big decision; we need to understand why we want/need to make it, explain it, and document it.
J: recently there’s been a change in the amount and type of data we’re curating. We need entry triggers (what type of ID can/can’t go in each column).
K: we will put the *appropriate* identifier based on what is being assayed in the paper. We used to do this at WB way back when. So we had mixed IDs in col 2. Users didn’t like it.
P: going to be difficult for users and for enrichment analysis
J: we could take that approach: col 2 is the gene ID, and the object being curated is in col 17.
S: having heterogeneity is good for representing the biology; we lose information if we don’t use the isoform; there is a dependency between cols 2 and 16 based on the GO term used; we’re not going to solve that problem; the real question is: what issue are we trying to solve?
J: we ask for a place where the object being curated can be identified. If col 17 is always filled in, then col 2 can be used for enrichment analysis. Users would have to have a more complex understanding of GO. At the moment, we’re using the IDs interchangeably.
P: This is a different conversation: How can we be more stringent about the use of col 17?
K: col 16 is relevant to this, because I think Paul was suggesting not to necessarily use the whole spectrum of identifiers
S: How would that work?
K: We’d use MOD gene IDs and UniProt IDs, which would be meant to represent genes or proteins, and we wouldn’t put more semantics than that into them.
D: If that’s the case, then PRO IDs are not allowed in col 16.
M: if we allow any type of ID in col 16, then we’re good
P: Why don’t we allow IDs based on the level we want to represent?
M: we need a strong proposal with pros and cons; need to be clear on what we want to do and why, and on what the impact on users would be; Paul needs to comment on it; then we need to circulate it.
AI: Continue this discussion on the next call and ask Paul to attend. Maybe open the call to non-managers, e.g. Tony. Send out a note? But we need proposals first. Kimberly will start a Google Doc; the managers team will review it and then open the doc to others. Also run the proposal by some users we know, e.g. users of the enrichment tool at MGI; Judy may provide names of a few people to contact.
J: Please email the GO directors if any proposed date for the GOC meeting in LA won’t work for you; the PIs will probably decide today.
D: Kimberly and I are working on Noctua documentation. Please take a look; it’s linked from the Noctua home page. It’s still in progress.