Meeting with Satnam Singh, 18th March 2009

Date: 18th March 2009
Participants: Jennifer Deegan (EMBL-EBI) and Satnam Singh (Microsoft Research, Cambridge, UK).
Location: Microsoft Research, Cambridge.

Purpose of visit

I visited Satnam because he is interested in finding real-world biology problems that his computer science research could help to solve. He works on concurrency and FPGAs.

Discussion

I showed him a presentation on GO: \go\teaching_resources\presentations\2009-03_GOIntroForCS_jdeegan.ppt

and explained what we do. Then I showed him the gene_ontology_write.obo file in OBO-Edit 2 b56 and ran the Rule Based Reasoner, which takes about 30 seconds on my computer. Satnam does research on how slow processes can be sped up, so he was interested to see this. He also asked whether we had considered involving CS researchers who work on machine learning, to make OBO-Edit better at predicting which term a user is looking for and at helping them find it. Apparently there is also machine learning research at his institute, so they may be interested.

He rightly pointed out that FPGAs are not of much use to us, as we would need special hardware to make use of the results of his work. However, he said that he may be able to help speed up the reasoner by looking at multi-threading, concurrency and parallelism. The reasoner is currently single-threaded. I had previously discussed this possibility with Chris Mungall and we had agreed that this would be an excellent thing, so Satnam and I worked from here on the assumption that this would be the best way to go.

Logistics of collaboration

We then thought about how the two groups could best work together. I explained the geographical distribution of the GO group: Chris and Amina are based in Berkeley, and I am in Cambridge. We figured that the communication issues should not be too bad even as far as Berkeley, and since he and I work and live close together we can easily have very good communication for software, logic and biology discussions.

We discussed the problems of intellectual property (IP) and how an open source academic group might work with Microsoft. He agrees that any work in which there is doubt about who owns the resulting intellectual property can be difficult, and says we are right to be cautious. However, he was keen to find a solution that would work for both groups and that would enable us to carry out research together. He says that avoiding IP complications is also very beneficial for him. From experience, he has found that if he takes on a collaboration where there is doubt about who will own the resulting IP, the preparation stage can involve a lot of discussion with lawyers, which takes many months and often leads to the project being dropped. He says it is much better if we work together in a way that avoids any possible confusion about who owns the resulting IP, and so avoids the involvement of lawyers. To do this, there are some general rules that we can follow:

He should never look at the OBO-Edit code, as that is our intellectual property. To have him look at it, or add to it, lawyers would have to discuss who the intellectual property would belong to in the long run, and that would be complicated. He says it is by far the best thing if he just doesn't look at the OBO-Edit code.

If he writes any code or creates any algorithms, then by the rules of his institute he is allowed to post them on a public website and have the work be open source, and be available for free use by groups like ours. If he did this then there would be no IP issues with our benefitting from his work.

As he is looking at working on good algorithms for the reasoner, he says the best thing would be for us to work together on very abstract algorithms expressed as logical rules, and implemented perhaps in Haskell or similar. He can post these on the website and have them be open source. Then we in the GO project can take the algorithms away and implement them in Java in OBO-Edit ourselves. That way he will have no IP claim over our work, and there will be no complications in our working together.

Having considered all this, we both thought that the collaboration was well worth pursuing, so we agreed that I would introduce him to Chris to discuss things further. I left him with a copy of the reasoner paper.

Summary of the proposed objectives:

For GO

The hope for us is that, with the use of concurrency and parallelism, the reasoner will run much faster and will be more useful for the ontology editors.

For Satnam

Satnam gets a real-world problem on which he can use his experience in concurrency and parallelism and develop new ideas. He is also very interested in getting a joint publication in a journal like BMC Bioinformatics if at all possible. I thought that sounded very feasible, as we have recently had an ontology development paper published in BMC Medical Genomics.

I am also going to arrange for Satnam to come to the EBI as an invited speaker so that we can see if there are any other slow computational processes at the EBI or Sanger that he could work on.