Initial report 13th April 2006
Compiled by Jennifer Clark, GO Editorial Office, EBI.
With contributions by Evelyn Camon, GOA, EBI.
This report outlines the plans for the GO Outreach Working Group in response to the request by the GO PIs, outlined in a presentation slide by Suzanna Lewis during a talk at the GO Consortium Meeting at St Croix in March 2006.
1) What is the group's purpose?
To achieve annotation of the highest feasible quality across all species.
2) What makes this group necessary and unique?
The group is necessary because GO aims to bring together information about all research species. We have a number of excellent databases annotating model species, but we need more manpower to help complete the annotation of those species. There are also many species that are not yet being annotated at all, so we need to find annotation groups for them.
One argument against reaching out to all species groups is that we could simply annotate the reference genomes fully and rely on electronic annotation for other species. However, many of the less well-funded research species are used to study biological phenomena and processes different from those studied in the reference organisms. Manual annotation of these processes is essential to capture the full range of gene function. It will also allow us to extrapolate information back to the uncharacterised gene products of the reference genomes.
3) What is the lifespan?
There will continue to be opportunities in this area until all gene products with functional information are fully manually annotated. At present we are planning only for the next funding period, which is anticipated to last five years.
B) Group Leader
Who tracks progress and makes sure that milestones are met?
Jennifer Clark, GO Editorial Office, Cambridge.
1) What are the key deliverables of this group?
I think the group needs to form properly and discuss general plans before we can set long-term deliverables. In the first instance, however, I can give deliverables on how the group will form and how we will start our discussions. I have also suggested some possible ongoing deliverables below; these may change over time as a result of discussion and experience.
a) Within 1-2 weeks of report date:
Establishment of the group and communication mechanisms: invitations to join the group (by e-mail); an initial conference call/webinar to discuss what the group will do; set-up of a mailing list to continue the discussion; contact with the other working groups to work out where our roles overlap.
b) Within 2-3 weeks of report date:
Meeting with Daniel Barrell, Evelyn Camon, Emily Dimmer and Jennifer Clark to discuss information management and how the current text file and database system could be improved. (I think I'm meant to work on this with Chris Mungall, but that is difficult because he is far away and our working hours do not overlap. Evelyn and I thought it might be better to ask Dan for help in the first instance.)
c) Within 1 month of report date:
Work out metrics for regularly measuring progress; a progress report using these metrics will then be sent monthly. Two kinds of report will be needed: a regular confidential monthly report to the PIs, and a short public newsletter to the GO and/or GO-Friends lists to keep people posted on what we're doing and how they can help. The public report can be included in the consortium newsletter.
Some possible general long-term deliverables
Within six months:
a) Either get the information handling system up and running or submit a proposal to get a student to come and set it up next summer.
b) Work with the AmiGO group to develop a system whereby, whenever a gene product search returns no results, a page is displayed directing the user to information on what communities need to do to get their gene products annotated. This page should include the e-mail address of the outreach group.
c) Find out which group is responsible for improving the teaching resources archive, discuss with them whether we could have a clearly delineated, planned set of regularly maintained Flash tutorials with voice-over, and start planning and producing them.
Ongoing:
a) Find out how far the annotation problem could be solved simply by making better use of information we already have, and talk to the relevant groups about:
i) IEA accessibility
ii) Proteome set and species set downloads
iii) Including all annotation object names and abbreviations for searching
Find out what shortfalls remain once the information we already have is being properly exploited, and prioritise accordingly:
i) Ask the tools developers which gene products people most often search for and fail to find.
ii) Ask user communities what they need (e.g. the farm animal community needed a cow IEA set download, which was quick to set up).
b) Develop SOPs for getting new groups up and running with manual or electronic annotation.
The SOPs will enable groups with different amounts of time and funding to annotate to the best level they can achieve.
We could have SOPs for teaching:
- Manual annotation by a database.
- Manual annotation by a community.
- Manual annotation by a lab group.
- Manual annotation by an individual.
- Electronic annotation of species closely related to a very well manually annotated reference genome (e.g. organisms closely related to yeast).
- Electronic annotation of species not closely related to a well annotated reference genome.
- Electronic annotation of ESTs.
- Bulk annotation from large scale experiments.
A major responsibility for the outreach group will be mentoring the groups that wish to start annotation. The training in manual annotation should include an SOP on how to judge when a new annotator is experienced enough to start training others. This is important because our current mentors are already close to capacity, and new mentors will soon be needed.
c) Identify biology conferences where curators could give workshops similar to those at MGED and PAG. For example, the top neurobiology and immunology meetings should include GO workshops. Curators with specific biology backgrounds could attend the best conferences in their own domains and reach out to their own communities.
d) Collect and distribute information. This group should collect information about all the groups that are not currently annotating, and why they are not. The information should be made available to people within the consortium who are in a position to solve the problems. This is the part of the work I have been most heavily involved in so far. It is mostly about knowing the politics of the different communities, and understanding what can and cannot be achieved within the bounds of their various political situations and funding levels.
The group will also maintain information on which groups are annotating, which are beginning annotation, and what stage each has reached.
e) The outreach working group will take a keen interest in the development of a general manual annotation submission tool. We will liaise with the software working group on this.
2) What criteria are used to set priorities?
Groups that are interested in doing manual annotation are currently given high priority.
Up to now we have given particularly high priority to groups that are well funded (or potentially well funded), as they have the best chance of producing a large amount of annotation. I expect this to continue.
I have asked the GOLD online database whether they could collect information on which groups have funding for manual annotation, since this would help us target our outreach. They are considering this but have not yet made a firm decision. (They currently display information on groups' funding sources, but not on what the groups are funded to do.)
If we are trying to support all groups and help them reach the best level of annotation they can manage, we will have to devote some time to working out good electronic annotation plans. During my outreach work so far, new groups I contacted have often told me that they wish to contribute IEA annotation. So, although my job was to seek out manual annotation groups, I have been very much involved in the discussion of IEA annotation, and I think the outreach group will necessarily be heavily involved in this discussion too.
The grant proposes that we address the IEA question by setting up a vehicle for discussion and debate of IEA annotation mechanisms. I would like to know whether co-ordinating this is part of the outreach group's role. The results of the discussion should be included in the documentation and training materials relating to IEA annotation, and I anticipate that the outreach group will be involved in producing these.
There are some specific priorities stated in the GO grant proposal relating to different taxa, and to a number of existing database resources. The outreach working group will follow these priorities.
What are the criteria for membership, and what role does each member play?
I think everybody is involved to some extent, but there are some key people:
The PIs: They are aware of new funding initiatives and of the subtler political influences that curators are less involved in, and they can influence groups in ways that we cannot. Some tasks are better handled by PIs and others by curators, so close and active communication between the PIs and the outreach group is important. Known specializations:
- Michael Ashburner: particularly targeting prokaryote groups.
- Judy Blake: dealing with difficult situations in the farm animal community.
- Suzanna Lewis: NIAID groups?
There may be other such specializations that I don't know about.
Jennifer Clark: Identifies new avenues of investigation. Keeps track of progress. Passes tasks, information and questions on to the appropriate people. Watches deliverables and milestones. Co-ordinates the group.
Karen Christie: Runs annotation courses in California. Outreach contact with the rice community in Japan.
David Hill: Mentor to new annotation groups (AgBase, PseudoCAP, NIAID?).
Evelyn Camon: Mentor to new annotation groups (AgBase, HGNC, LifeDB, Roslin, Grape group, UCD).
Rex Chisholm: Head of the reference genome working group.
Once the group starts, I anticipate that the GO-funded people in the annotation databases will be very involved.
Fiona McCarthy: Outreach to the farm animal community.
Dave Burt: Outreach to the chicken community and, to a lesser extent, the wider farm animal community.
The working group's interactions with these two individuals will be important in developing SOPs for starting manual annotation, as their groups are good test cases.
E) Meeting calendar
1) How often do they meet?
I think we may all need to meet once every couple of months by phone.
2) Through what communication channels?
- E-mail, as required, via the new outreach mailing list.
- With the GO PIs: as often as needed via the GO-Top e-mail list, plus a monthly report by e-mail.
- Between Jennifer Clark and GOA: in person in the office.
- The farm animal mailing list is proving very useful for getting the chicken annotation grant proposal started, and other groups (e.g. the cow group in Ireland) are now following the chicken group's lead. There may be scope for more such lists, but they will be requested only as the need arises.
We will use conference calls as needed, and may move to Skype if we can get the technology working. I don't think we'll need to meet in person more often than at consortium meetings. There will be other opportunities to meet in person during other meetings.
3) What are the milestones?
I don't have any of these yet.
F) Metrics of success
a) What are the group's objectives?
To recruit many new sources of annotation.
b) How is the group success evaluated?
By the breadth of species annotated and the quality of annotation. Once we have worked out how to measure annotation quality, we will present figures monthly.
The deliverables above will also help us measure progress.
How does the group interact and share information with other key groups?
Jennifer Clark will liaise with other groups initially and group members will take over particular responsibilities for liaising as appropriate.
The main connections will be:
i) With the reference genomes group, which is going to meet by phone.
ii) With the community advocacy group, which I think may be involved in producing teaching materials. Via Eurie by e-mail.
iii) With the 'ontology' and 'driving ontology by annotation' groups, to let them know where new annotating species may soon need many new terms.
iv) Possibly with the computational architecture group, to get help with information management systems, though it may be easier to ask people here, at least initially. We will also liaise with this group about the generic annotation tool. Communication with Chris Mungall via e-mail.
What are the key decisions? How are decisions made? What is the flow of choices?
Some of this will be in the annotation SOPs once we work them out.
A major decision for us will be how much time to spend on outreach and how much to spend on our other tasks. I understand that some guidance on this will come from the PIs in response to the monthly reports that we send on progress.
What tools might be used to make this more efficient?
We very much need to get the information handling system working so that we can keep track of all the different groups. Archiving the information and providing an interface for the various group members will be tricky, because CVS is public while this information is private.
We will need a mailing list for the outreach group, and it would be good if it could be archived but not public.
In the long term, a generic annotation submission system will be very helpful.
It would be good to decide on a system for producing good tutorials. Evelyn suggested Flash with voice-over.
Once John Day-Richter sets up the technology to be used for the OBO-Edit webinars, that will be very helpful for us too.