GO Managers, Weds. September 24th, 2008 8 a.m. PDT, 9 a.m. MDT, 10 a.m. CDT, 11 a.m. EDT, 4 p.m. BST
Participants: Michael Ashburner, David Hill, Chris Mungall, Jane Lomax, Midori Harris, Suzanna Lewis, Mike Cherry, Judy Blake, Jennifer Deegan.
Action items from September 10
- All managers: Each manager should put on the GOC agenda highlights/progress report, and what things need to be discussed with the GOC.
- All managers: Each manager should put on the SAB agenda highlights/progress report, and what things need to be discussed with the SAB.
Action: Chris and Jane do these for their groups.
- Judy, Suzi: Determine who will take minutes at the GOC and SAB meetings.
Action: Carried over.
- Judy: contact Michelle and Karen (also Harold?) to see if they can maintain ECO.
Done: Michelle is thinking about it.
- Jennifer: make sure the minutes from the Salt Lake City GO Consortium Meeting get online.
Done: Minutes checked and handed over to Amelia.
- Jennifer: pass on annotation quality control trigger file to Albert Vilella in Ensembl for testing. Done.
- Jennifer: Send supplementary data from the Paul Dobson drug transporter review to Harold. Done. (Also sent to Victoria Petri and Emily Dimmer.)
New gene association file columns
Mike: Update managers on plans to support (or not) 16-column file after 17-column files become available.
Discussion of several points:
Will all groups be able to switch at once?
- It would be easier for annotation groups if they were not all forced to switch at once.
- If we do not force everybody to switch then people may not switch for a very long time. Perhaps it would be better to have a strict deadline.
Can Mike just autopopulate the column that is not mandatory?
One of the new columns is not mandatory and so if people send in annotation files with that column missing, then Mike can just add an empty column, and that will be fine.
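The padding step described above could look roughly like this. It is only a sketch under stated assumptions: the file is tab-separated, comment lines start with "!", and the optional column is the last one. The name `pad_row` and the width constant are illustrative and do not describe Mike's actual script.

```python
EXPECTED_COLUMNS = 17  # assumed target width, taken from the "17-column" wording above


def pad_row(line: str, width: int = EXPECTED_COLUMNS) -> str:
    """Return the line (without its newline) padded with empty trailing fields."""
    text = line.rstrip("\n")
    # Assumption: blank lines and "!" comment lines pass through unchanged.
    if not text or text.startswith("!"):
        return text
    fields = text.split("\t")
    if len(fields) < width:
        # Add one empty field per missing column; rows that are already
        # wide enough are returned as they came in.
        fields.extend([""] * (width - len(fields)))
    return "\t".join(fields)
```

A 16-column row comes back with one extra empty field at the end, while a row that already has 17 columns is left alone.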
Can Mike autopopulate the column that is to contain the gene product identifier?
Background: Currently we have a column that contains the gene identifier, but this new column is to contain the gene product identifier. In cases where groups are annotating a specific protein isoform, the id in the new column will not be the same as in the more general gene id column.
- Could Mike just copy the gene id to the gene product id column? This would not be strictly correct but it would mean there would be something in the column and people could just improve on that. Mike would be happy to do this if it is thought best.
- On the other hand it might be better just to leave the column null and have databases fill it in as they are able to find the correct id. Since this is not a mandatory column there was a slight bias of opinion towards this approach.
- What kinds of ids are we talking about? It will mostly be UniProt ids of the form Pxxxxx-n where Pxxxxx refers to the gene and the -n indicates which protein isoform we are talking about.
- Maybe Mike could have a checking script that shows when the -n suffix is missing. The absence of this suffix indicates that we are not sure which isoform is being discussed.
- Reference genome groups are already making the gene2protein file so maybe this will help with filling in the new column.
- One issue with annotating gene products is getting from protein variants to the gene sequence. How do we find the UniProt protein sequence from a nucleic acid sequence in a paper? This can be very difficult. The suggestion is that if you don't know which isoform is meant, you just use the gene id (and leave the gene product column null?)
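The checking script mentioned above might be sketched as follows. This is an illustration only: it assumes an isoform-specific id looks like an accession followed by "-" and a number (the "Pxxxxx-n" shape), optionally preceded by a database prefix and a colon. The function name is hypothetical and real accession formats are not validated.

```python
import re

# Assumption: "<optional DB prefix:>accession-n", where n is the isoform number.
ISOFORM_ID = re.compile(r"^(?:[^:]+:)?[A-Za-z0-9]+-\d+$")


def missing_isoform_suffix(gene_product_ids):
    """Return the non-empty ids that lack a "-n" isoform suffix."""
    # Empty values are skipped because the column is allowed to be null.
    return [
        gp_id
        for gp_id in gene_product_ids
        if gp_id and not ISOFORM_ID.match(gp_id)
    ]
```

Given the ids "Pxxxxx-2", "Pxxxxx" and an empty value, only "Pxxxxx" would be reported, which marks it as an entry where the isoform is not known.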
Action: A mail on these ids was sent today by Henning to the Reactome list. Suzi to forward to managers’ list.
IntAct has the same problem. The rough strategy is:
- We have only one "protein" class.
- If there are isoforms, each gets its own entry. The "Pxxxxx-1" entry will be used as the "ReferenceGeneProduct".
- Each isoform entry has an attribute "isoform-parent" with the appropriate UniProt AC. So IntAct points from the isoform to the (one) parent, rather than from the parent to the (multiple) isoforms.
- We knowingly, and with a slightly bad conscience, accept that we assume the longest isoform parent rather than a "ReferenceGeneProduct", as discussed here, if no isoform identifier is given. However, this is more or less what UniProt is doing, so we are in good company.
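The pointer direction in the IntAct strategy (isoform to parent, not parent to isoforms) can be illustrated with a small sketch. The class and function names are hypothetical and are not IntAct's actual data model; only the "isoform-parent" attribute comes from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProteinEntry:
    accession: str                        # e.g. "Pxxxxx-2" for an isoform entry
    isoform_parent: Optional[str] = None  # parent accession; None on the parent itself


def parent_of(entry: ProteinEntry) -> str:
    """Follow the isoform -> parent pointer; a parent entry resolves to itself."""
    return entry.isoform_parent or entry.accession


def isoforms_by_parent(entries):
    """Recover the parent -> isoforms view by inverting the stored pointers."""
    grouped = {}
    for entry in entries:
        if entry.isoform_parent is not None:
            grouped.setdefault(entry.isoform_parent, []).append(entry.accession)
    return grouped
```

Storing only the single parent pointer keeps each isoform entry self-describing. The list of isoforms per parent can still be derived when needed, as `isoforms_by_parent` shows.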
Action: This topic needs to be discussed with the annotation groups.
The electron transport group have requested specific feedback on who wants the mf-bp links and why. We feel we could tailor our plans better if we had a clear idea of who the user community would be, and what the use case is.
Why do we want function-process links?
- JGI want it.
- General feeling that it would be better.
- Useful for co-annotation. For example, if someone annotates to one of the functions in photosynthesis, then the process term is automatically added. Dopamine was also given as an example. This is one of the main reasons, and this kind of gap filling is especially useful for bacterial annotation.
- Good for clustering gene products under specific processes. [Ed. Can David expand on this?]
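The co-annotation use case above amounts to filling in process terms from function annotations. A minimal sketch, assuming the function-process links are available as a simple mapping from function term to process term; the function name and the placeholder term ids in the usage note are hypothetical, not real GO ids.

```python
def add_implied_processes(annotations, function_to_process):
    """Return annotations plus any process terms implied by function-process links.

    annotations: dict mapping gene product id -> set of term ids
    function_to_process: dict mapping function term id -> process term id
    """
    filled = {}
    for gene_product, terms in annotations.items():
        # Collect the process term for every annotated function that has a link.
        implied = {function_to_process[t] for t in terms if t in function_to_process}
        filled[gene_product] = set(terms) | implied
    return filled
```

With a single link from "FUNCTION_A" to "PROCESS_X", a gene product annotated only to "FUNCTION_A" would also receive "PROCESS_X", while gene products annotated to unlinked functions are unchanged.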
Action item: Suzi to produce a clearer list of use cases for this to be presented at the consortium meeting.
Photosynthesis test case
Photosynthesis is proving difficult because the functions mostly do not have E.C. numbers and because there are very few annotations. The reason for the shortage of annotations is that the research was done in spinach and cyanobacteria, and we do not have manual annotations for these. We had hoped that we might find the functions and processes that connect by looking at the annotations. Also, in textbooks the process is usually shown as a collection of complexes rather than as a collection of functions, so it is not so easy to figure out the functions. We are now going on to look at other resources like MetaCyc.
E-mail sent later: Chris has done some mining of Reactome that we can look at too.
This page has a table showing all Reactome-sourced process->function links: http://wiki.geneontology.org/index.php/Mining_Process_Function_Links_from_Reactome
There is also a link to an obo file that can be loaded in conjunction with the main GO file. This looks like a good source of low-hanging fruit. We should go through this and select the links that fulfil the definition of has_part. This isn't trivial, as the Reactome pathway is often more specific than the GO process.
All of the function-process links are collected under one wiki category: http://wiki.geneontology.org/index.php/Category:Function-Process
General point on implementation:
Chris: We are getting bogged down in detailed examples. We want to try to get broad coverage and not worry about difficult processes.
Jennifer: We really thought that photosynthesis would be one of the easy processes. In this case it may be that processes that look like low hanging fruit aren’t as easy on closer examination.
David: I often see process-function links that could be made when browsing the ontologies. Maybe we should just start that way.
Jennifer: Perhaps it will be easier to mine things where the function process co-annotation has already been done.
Moving beyond is_a and part_of is going to be a big leap for people.
We should make these discussions more accessible. The photosynthesis discussion should be moved to the f-p list and the electron transport people should all get signed up.
Action: Jennifer to change the discussion over to this list and inform the GO list.
Next Consortium meeting.
Jane sent an email about a possible GO meeting in Berlin. We will look into the costing and bring it to this call in a couple of weeks. The biocurator meeting is April 16-19. Maybe hold the GO meeting in the couple of days before that.
Summary of action items
Chris and Jane to add agenda items for software and advocacy.
Judy and Suzi to think about who will take minutes at the next meeting.
Suzi: A mail on these UniProt ids was sent today by Henning to the Reactome list. Suzi to forward to managers’ list. (Note: she wrote to say that this had been done, but the email did not arrive. Please send again.)
Pascale: This new annotation column topic needs to be discussed with the annotation groups.
Suzi to produce a clearer list of use cases for function-process links, to be presented at the consortium meeting.
Jennifer: to change the photosynthesis discussion over to the f-p list and inform the GO list.
Jane: send rough numbers on the cost of the Berlin GO Consortium meeting.