2010 Stanford Meeting Minutes



Meeting Participants


Tuesday, March 30 – Morning [Start - Break]

Communication and Process for the GOC [Suzi]

  1. GO vision
  2. Changes in Managers structure

Tuesday, March 30 – Morning [Break – Lunch]

Ref Genome Project and PAINT [Pascale and Kara]

Update on Annotation

    • Curation: About eleven protein families have been curated with the beta version of PAINT, primarily by Mike Livstone in Kara's group at Princeton.
    • References for PAINT annotations: One reference is available for each family's annotation.

Reference Genomes presentation (Pascale: need slides)

    • Protein sets created by Dan B. are being used for the curation of the PAINT trees.
    • There has been a switch to more biologically-focused target selection.
    • Currently, most groups have annotated the genes in the lung branching morphogenesis set; PAINT annotation of these genes now starts.
    • Peter: How many of the 39 families are unique to the lung branching process? Pascale: most are not unique; there are overlaps with the heart list. However, we will ensure that previously annotated targets are not resent to curators.
    • Next RefGen project: Heart development
    • Pascale: need to ensure that curators annotate comprehensively – for instance, was there any focus on solely lung-relevant papers by curators? This needs looking into.
    • Mary: as there now is a focus on sets of gene families, it would be good to communicate with nomenclature efforts to establish whether organisms have resolved identity issues – would advance coordination with other communities.
    • Judy: identity issue is primary for GO annotation, although would be good to intersect with nomenclature issues
    • Pascale: any outstanding annotation issues need to be converted into documentation.
    • Tanya: wrt checking and correction of inconsistent annotations. Would only those annotations created after the change in guidelines need to follow these rules?
    • Emily: we need to have a consistent annotation set – it should not matter when annotations are made. Users of the annotation set will not be aware of the cut-off date – such dates hinder annotation consistency. However the amount of effort required to retrofit some annotation sets might sometimes need to play a role in the annotation policy discussion
    • Judy: need to continue to make annotation set better, need to bring along older annotation sets
    • Harold: need to generate user annotation documentation – to highlight to users how they should best apply the annotation sets, e.g. how to filter by IEAs.
    • Suzi: need to be balanced in this approach, closure needs to be reached. Quality of annotations should not be harmed by lack of follow through. All members of GOC should understand and apply annotation decisions
      • ACTION: user documentation needs to be made, indicating how GO annotations can be filtered by users (e.g. alerting users to different annotation subsets such as IEAs/RCAs). Working group to be established, including Harold and Amelia.
    • Jim: this is an outcome of RefGen project. Standards have to be applied quite aggressively. Retrofitting should be part of the RefGen job.
    • Pascale: it needs to be assumed that all participants in RefGen read guidelines and follow them – rules need to be applied consistently by all.
    • Emily: automated rule checking of annotation sets would support curators
    • Eurie: Panther families chosen for annotation targets– how does this fit with distribution over species?
    • Pascale: this hasn’t been looked into. However it is more towards the lower number for the current RefGen set.
    • Julie: concerns about running projects in parallel – can put a large burden on mouse/rat/human annotation groups – we should be annotating in parallel to improve consistency. So there is a concern regarding groups falling behind.
    • Pascale: however, currently the only genes that are really annotated in parallel are those from jamborees. While it’s true that the more data we have in annotations the better the propagations we can make, even if we’re missing some experimental data the group can still move through targets by looking at the PAINT tree to see which closely related organisms have been annotated – then we could go back to MODs to request focused annotation to find certain annotations.
    • Julie: one approach is to accept we’ll always be behind. However perhaps we could improve the RefGen group’s activities to reduce this fall back.
    • Suzi – agree. We need to make the annotation process more efficient - this is key to moving forward.
    • Tanya: PAINT is useful when taking spread of annotations to transfer to MODs and nonMODs. Would suggest that it might be best to concentrate on annotating those targets that appear in all 12 organisms. Could get most for money where most groups can contribute towards the annotation set. There are about 900 of these genes, so we have enough to keep us busy!
    • Pascale: problem is these highly conserved genes tend to be metabolic.
    • David: metabolic genes aren’t boring. Often there is a lot of data available that has not yet been curated.
    • Pascale: this might be why projects in parallel might be good.
    • Kara: tool at Princeton could help look at this spread of genes.
    • Judy: need broad coverage of annotations to support the predictive curators, also need RefGen curators to generate this data in the most efficient way. Need to work on how to prioritize this effort.
    • Julie: one approach could be that we share the burden – could other groups help curate? For example, the MGI curators, if overburdened, could be aided by other groups.
    • Jim: perhaps some of the lists could be more focused to include prokaryotes
    • Emily: other curators need to be involved in selecting gene lists. Mammalian groups are currently involved in the focused annotation of certain processes, so they have a particular interest in suggesting targets in this area. There should be wider participation in the generation of these lists.
    • Kara: agreed – need contributions from a wider set of member groups
    • Suzi – groups should offer proposals for targeted annotation by the RefGen group, these should be presented to Pascale and Kara so that they could select good projects.
    • Judy: if thinking of parallel projects: could infrastructure handle this mode of curation?
    • Pascale: this needs to be explored.
      • Action: Reference Genome group needs to explore ways of improving the target list selection method to ensure a greater participation by all curation groups: Pascale and Kara.
    • Pascale: ontology development efforts need to be in sync.
    • Eurie: has there been significant ontology development alongside RefGen curation?
    • Suzi: yes, however this type of focused effort has only really started. But initial indicators are good.
    • Ruth: is there any under-utilized group? Could they identify any genes that are under-annotated that have been added to the RefGen target list? This would help to catch up with the backlog.
    • Midori: wrt ontology devel. Will probably see more collaborations between ontology devel and annotation as targets based on common processes continue –currently ontology requests do seem to vary depending on the targets picked –we do see quite a few SF items for certain targets
    • Rama: perhaps at end of year, we could take a break to focus on annotation and evidence code issues – and spend time on improving consistency of annotations. Perhaps spend 1 /2 months focusing on this area?
    • Pascale: this consistency effort will also be supported by GO camp.
    • Pascale: PAINT will help generate more complete annotation sets and show where annotation might be missing. Far better than using the Google spreadsheets! Sven has been creating a report on the annotation status of RefGen targets in different Panther families (see slides). This will help decide PAINT curation priorities. This AmiGO tool is mostly for internal purposes.
    • Judy/Michael: we like this very much.
    • Kara: will add number of gene members for a Panther family to this view as well.
    • Judy: how are PAINT annotations getting into MODs?
    • Pascale: most groups can integrate external GAF files. These files will be made available from a central GOC directory. In addition, curators will want to check these annotations – so we will be making tools available to support curators to visualize the PAINT annotations (Mary Dolan, AmiGO efforts underway)
    • Suzi – there are now 7 protein families with a PAINT GAF – these files are being reviewed by Pascale and Kara (curated by Mike Livstone), and will be made available for MODs. Anyone interested in having a copy of PAINT should talk with Ed.
    • Judy: should be a high priority for groups for these annotations to be propagated.
      • Action: All annotation groups to review PAINT annotation sets with a view to integrating them into their association files. To do: All groups

PAINT update (need slides from Paul)

    • Where annotations not consistent, can be because links missing from ontology. Can be used to ensure term has most specific parent possible.
    • Propagating NOT annotations - experimental evidence required for all NOT annotations. Propagate on a case-by-case basis? Need to determine at which point in evolutionary history the function has been lost. Also an issue with isoforms where the second isoform does NOT have the same function/process as the first.
    • Bottleneck - all MODs getting through manual annotation for each gene.
    • Any way of publicizing results from PAINT? Publications, stuff on website, captured in AmiGO?
    • Project-based approach increases efficiency. Allows synergy with other groups, tighter integration and co-ordination.
    • Can some of this be automated? e.g. looking at active site residue for enzymes to determine loss of function. Can also use the taxonomic restrictions for the ontologies. Also use gene-structure/model calculations - already in use.
  1. PAINT demo (link?)
    • Uses a grid display to show where annotations missing, and need to change ontology/go back to MOD. Displays multiple information simultaneously, like OBO-Edit. Grid filled in as annotations completed.
    • Should be straightforward for PAINT to pull in taxon rules
    • In tabular display, more GO terms grouped, more specific to less specific within groups.
    • In display, NOT annotations propagated up tree as long as no contradictory positive annotations (visual clue only)
    • Are there cases where you can't automatically triangulate annotation up tree? Yes, there are examples of this - but this can be prevented using taxonomic restrictions
      • Action: We should figure out a mechanism by which taxonomic restrictions derived from PAINT could be fed back into the ontology taxon restrictions file. to do: Paul and Jane
    • Multiple sequence alignment view - for double-checking with tree
  2. Future Developments for PAINT
    • Semantic zooming - a cartoon view to visualize how domains have been shuffled/duplicated. Talking to InterPro about developing cartoons.
    • Use of GO_REFs for PAINT. All references would need to be imported into MODs
    • Eurie: issue of generating internal GO_REFs for MODs
    • Chris/Judy: this is an internal issue that MODs need to cope with.
    • Pascale: can show list of ids.

Tuesday, March 30 Afternoon [After lunch - Break]

AmiGO and GO database updates/report (Seth and Chris)

  1. New search features [1]
  2. PAINT-style displays in AmiGO and family-based annotation reports [2]
  3. Improved tree view [3]
  4. loading the database with full gene product sets for reference genome groups
  5. plans for displaying col 16
  6. using reasoner as part of loading

OBO-Edit, AmiGO and GO database updates (Chris - need slides)

  • Emily: Q for O files – cover 52 species, AmiGO only displays IEAs for the MOD species – any intent to expand the display of IEAs in AmiGO?
  • Chris: would be worth investigating loading of these additional IEAs
    • Action item: Ben and Chris to investigate loading of additional IEAs (from the additional 52 species) into AmiGO
  • Seth and Chris: AmiGO demo.
  • Concern was highlighted regarding the new hierarchy viewer – where the relationships for a term up the graph are being displayed.
  • Chris: we want something to complement graph view
  • Seth: there have been diverse opinions on this display
  • David: this is mainly due to the confusion with the similar tree view display
  • Karen C/David : QuickGO table view provides an additional display of the relationships between terms; this would clarify the view.
  • Doug: something more dynamic in the display might help – use a mouse-over which could draw lines between relationships
  • Paul: could the orientation of the display be changed with respect to the central term being displayed. Perhaps the term should be located at the top of the graph?
    • Other AmiGO discussions:
  • Seth: display of Panther families hierarchy
  • Seth: New Search features of AmiGO – quicker queries of GO terms/gene products. Wildcard queries are supported. Includes Boolean queries and dynamic filtering.
  • Tanya: could you query for all kinases expressed in chloroplast?
  • Seth: not at the moment. This requires an association search.
  • Chris: Cross-product term request. Could automatically generate an ID. A curator would need to sign off on the term, however would mean the curator would be able to use the term immediately.
  • Jane: could we do a trial of this.
  • Chris: could carry out a trial for regulation terms. Wouldn’t be hard to make a beta version live
  • Midori: could add check boxes for positive/negative regulation.
  • Chris: the term’s definition could be automatically generated.
    • Action: Limited trial of automatic term generation for AmiGO (derived from cross-products) in response to curator requests. Seth
  • Jane: would also be helpful when the chemical ontology becomes ready for GO - would be easy to generate chemically-focused GO terms, e.g. biosynthesis terms using ChEBI ids.
  • Chris: the tool would check whether term already exists – esp if the cross-products are defined within the term.
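The cross-product term request described above (generate an ID and a definition automatically, check whether the term already exists, and leave sign-off to a curator) can be sketched roughly as below. This is only an illustration under stated assumptions: the in-memory `ontology` dict, the `GO_TMP:` ID prefix, the function name `request_regulation_term` and the definition template are all hypothetical, and none of them reflects the actual AmiGO or GO database code.

```python
from itertools import count

# Hypothetical definition template and verbs; assumptions for illustration only.
DEFINITION_TEMPLATE = "Any process that {verb} the frequency, rate or extent of {target}."
VERBS = {
    "regulation": "modulates",
    "positive regulation": "activates or increases",
    "negative regulation": "stops, prevents or reduces",
}

# Counter used to hand out temporary IDs (the GO_TMP prefix is made up).
_temp_ids = count(1)


def request_regulation_term(ontology, target, kind="regulation"):
    """Return the existing term for this cross-product, or create a provisional one.

    `ontology` is a plain dict keyed by term name (a stand-in for the real store).
    """
    name = f"{kind} of {target}"
    # The tool first checks whether the term already exists.
    if name in ontology:
        return ontology[name]
    term = {
        "id": f"GO_TMP:{next(_temp_ids):07d}",
        "name": name,
        "definition": DEFINITION_TEMPLATE.format(verb=VERBS[kind], target=target),
        "cross_product": (kind, target),
        # Usable immediately, but a curator still has to sign off on it.
        "signed_off": False,
    }
    ontology[name] = term
    return term
```

A curator requesting "positive regulation" of an existing term twice would get the same provisional term back both times, which mirrors the duplicate check mentioned in the discussion.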

Evidence Codes

  1. to what purpose evidence codes?
    • needs of computational biologists to distinguish predicted vs. experimental annotations
    • why do curators want to track NOT, ISS and RCA variants?
    • why would we compute on evidence codes? why would others?
    • What is role of ECO vs OBI?
    • new evidence codes, particularly to support NOT annotations in tree
      • "ISR" -> "Inferred from specific residues": presence/absence of domains, active site etc
      • "ISD" -> "Inferred from sequence divergence": presence/absence of sequences that have rapidly diverged (e.g., paralogs)
      • "IDS" -> "Inferred from descendant sequence(s)" (i.e. annotation of ancestors based on extant sequences): presence/absence of function is annotated due to existing descendant proteins that have been previously annotated
      • "IAS" -> "Inferred from ancestral sequence" (i.e. annotation of descendants based on ancestral annotation): presence/absence of function is annotated due to ancestral proteins that have been previously annotated
    • See Schnoes et al --JimHu 22:27, 30 March 2010 (UTC)
    • Disentangling evidence codes (Kara/Chris). Annotation_method (Kara) Extending_evidence_codes (Chris)
    • Position of ISM
    • RCA usage and documentation
    • Notice of ISMB workshop on Assays, ECO and OBI [Judy]
      • Judy: ISMB proposal has been accepted for a workshop on assays, evidence codes – will occur in July. Michele at U of Maryland is overviewing ECO and Marcus (30% FTE) will be working on improving ECO.
    • Judy: should focus on what we want to accomplish with evidence codes. From talking with computational biologists, we need to make sure we serve users of GO who want to know evidence codes/experimental evidence. Need to decide what is internally required for data tracking and what is necessary for external users.
      • Richard S – has an interest in intersecting with the OBI effort. Therefore at a high level do we need to separate experimental vs predicted – and possibly use OBI ids as well/instead of certain more detailed GO terms??
  2. Chris – presented slides on evidence codes
    • Evidence code issues haunt GO curation discussions. We are held back by the history of evidence codes. The evidence codes mix several axes, which is confusing and limiting
    • Chris: would be important to carefully define the word ‘reviewed’ – would this mean that a curator has individually reviewed annotations; or for HTP would this be reviewing the experimental method applied?
    • Judy: nice to have an initial separation between experimental and computational. Although would query the use of the term ‘computational’?
    • Judy: NCBI provides evidence codes lists as well– would need to work with such other resources as well.
    • Chris: we can make ECO highly fine-grained, but have concerns from curators regarding over-expansion of terms – but curators equally could decide to select a subset of codes higher up the ECO ontology. It should become more simple to request a new evidence code needed for annotation.
    • Michele: this new higher level of organisation of ECO would fit nicely with different requests from users for ECO terms.
    • Chris: curators should become familiar with ECO and ensure it makes sense to them
      • Action: ECO and GO evidence code documentation to be inline. Michele and Evidence Code working group to investigate revamp of evidence codes
    • Judy/Chris: GO should use ECO and cross-reference out to OBI.
    • Judy: Is a distinction between reviewed/non-reviewed something we would like to have?
    • Harold: when would you use a non-reviewed IDA? Answer: from high-throughput data. It is still important to differentiate the assay used
    • Karen: HTP GFP-tagged subcellular location studies that are published and peer-reviewed, but where a curator has not individually reviewed the data, would be an IDA-NR
    • Donghui: need to focus on the data consumer. They need to understand how to use the data. We need to separate this need from what is internally required – for quality control
    • Michele: need to highlight that some trusted computational data as being high-quality
    • Pascale: how to define HTP? And need to define quality/confidence in the data. Do you really want to have any data you know is highly questionable in your dataset? This is the case for individual experiments – curators shouldn’t add annotation when they’re unhappy with the quality of the presented results.
    • Mike: if reviewed – you wouldn’t add in. If HTP and not reviewed – then users could understand why added.
    • Stan: useful to break down IDA into certain experimental methods (e.g. Western blot, immunoprecipitation etc) and have an expansion of IDA.
    • Michael: this is what ECO was designed to do. However if we have 55 evidence codes, then the job of the curator becomes more difficult. Some curators do use more descriptive codes (e.g. TAIR). We have had this discussion many times
    • Karen: there are clear problems with current codes – but moving to ECO would resolve issues we have. There will be the option of using top-level evidence codes system. Reviewed computational analysis: was intended to mean that the method was reviewed – NOT the individual annotations.
    • David: Given the number of annotators and what our users are interested in. Is it worth taking up annotators time to work through all the codes. What work is more beneficial to users?
    • Judy: evidence codes are useful. Keeping the evidence codes applied in GO annotations shallow – and perhaps sub-division of experimental vs computation is the most important distinction.
    • Chris: should start collecting the use cases for evidence codes.
    • Pascale: would be helpful to draw line at experimental vs prediction.
    • Eva: we are using evidence codes as a proxy for reliability, e.g. expt annotations are highly reliable – but ISS annotations can also be very reliable. Need to have the reviewed statement, to make this quality statement
    • Karen: to be able to distinguish differences in IEAs would be very useful. IEAs include a large variety of methods
    • Judy: but references currently provide the details of the different methods of IEAs
    • Michele: agree with Karen. Need to distinguish between high-and low-quality IEA methods.
    • Chris: summarize – We need to have better ways of slicing up annotation data. The issue of reviewed/unreviewed keeps being brought up. Shouldn’t be too worried as to what users currently do; should enable further uses for more educated users in the future.
    • Kimberley: has anyone done a comparison of included/removed IEAs for an analysis?
    • Kara and Val: aware of groups who have done this study.
    • Val: There are two different use cases for predicted/reviewed. For a lab-based user, predicted data is useful as it improves coverage – but they found they need to remove RCAs as these often have high levels of false positives. The number of annotations with different evidence codes depends on the organism in question. The problem for enrichment studies is larger when more prediction sets are included. IEAs are not causing as much of a problem as RCAs. In addition, the RCA set contains a range of different annotations; could we include them only when an experimentally evidenced annotation is not also present? Should we look at filtering of RCA annotations?
    • Karen: removal of IEAs is only appropriate for certain types of analysis.
    • Val: while IEAs are often recommended to be removed, there is no similar caution regarding RCAs. IEA information is of value.
    • Chris: need to focus efforts on the ontology, and many of these issues will fall into place
      • Action item: Go forward with using ECO as the source for evidence codes, cross-reference to OBI, clarify the distinction between experimental and predicted, and include user guidance regarding reviewed vs. non-reviewed datasets. Michele and Evidence Code working group
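As a rough illustration of the user-side filtering discussed in this session (separating experimental from predicted annotations, and dropping IEAs or RCAs for a given analysis), the sketch below works on GAF lines, where the evidence code is the seventh tab-separated column. The grouping of codes into an `EXPERIMENTAL` set, the function names and the choice of default exclusions are assumptions made for the example, not an agreed GOC policy.

```python
# Evidence codes treated as experimental here; this grouping is an assumption
# for illustration and is not a decision recorded in these minutes.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def evidence_code(gaf_line):
    """Return the evidence code (column 7) of a tab-separated GAF annotation line."""
    return gaf_line.rstrip("\n").split("\t")[6]


def filter_annotations(gaf_lines, exclude=frozenset({"IEA", "RCA"})):
    """Keep annotation lines whose evidence code is not in `exclude`.

    Header/comment lines (starting with '!') are skipped.
    """
    return [
        line
        for line in gaf_lines
        if not line.startswith("!") and evidence_code(line) not in exclude
    ]


def count_by_category(gaf_lines):
    """Count annotations as experimental vs. everything else."""
    counts = {"experimental": 0, "predicted_or_other": 0}
    for line in gaf_lines:
        if line.startswith("!"):
            continue
        if evidence_code(line) in EXPERIMENTAL:
            counts["experimental"] += 1
        else:
            counts["predicted_or_other"] += 1
    return counts
```

Passing a different `exclude` set (for example only `{"RCA"}`) covers the case Val raises, where IEAs are kept but RCAs are removed.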

Tuesday, March 30 Afternoon [Break – Dinner]

Impact of recent ontology changes from an annotator/ontology developer perspective [David]

  1. MF-BP links, what they are and how they impact annotation (slides)
    • The inability to use the links between function and process is not just limited to the MODs - many tools use the assumption that the 3 ontologies are disjoint e.g. GOtermfinder
    • As an annotator, how do we know whether we should make the redundant annotation or not (i.e. how do I know whether the link is already in GO?) Check GO_ext using OBO-Edit? AmiGO needs to display these links.
    • How does this affect e.g. enrichment tools - should there be a standard way of doing this across the consortium?
    • Annotating to process-specific functions: how does an annotator know when to use a process-specific function? How much can you *know* a function is involved in a specific process? The experiments usually prove functions and processes separately but the annotator will often know that the function is occurring during the process.
    • How is this handled in the meantime? Annotate conservatively, to generic function term. This loses information. Need separate ways to capture experimental evidence v/s knowledge-derived inference.
  2. Scope of regulation relationship (slides)
    • Take-home message: if you know the functions involved in a process, and you know your gp regulates one of those functions, annotate to regulation. If the function is one of the constituent functions of that process, annotate to the process itself.
    • If you don't know whether the gp is involved in the regulation of a process or the process itself, annotate to the narrower term which is the process.
      • But the regulates relationship isn't parent-child, so is the process really narrower?
      • No, but traditionally it has been, as used to be part_of. Maybe discuss at annotation camp.

Action: Discuss use of regulation terms in annotation in the context of the GO camp planned for June. David, Jane and Regulation working group

    • With an allosteric enzyme, would you annotate to the pathway AND regulation? Yes
  3. Scope of has_part relationship
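To make the MF-BP link discussion in item 1 more concrete, here is a minimal sketch of how a process annotation could be inferred from a function annotation through a function-to-process link, so that a curator need not make the redundant annotation by hand. It assumes inference only from experimentally supported function annotations and records the result with the IC evidence code and the source function term in the "with" field. The term IDs (`MF:1`, `BP:1`), the `mf_to_bp` mapping and the dict layout are hypothetical placeholders, not real GO content or the actual inference pipeline.

```python
# Evidence codes treated as experimental here (an assumption for illustration).
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def infer_process_annotations(annotations, mf_to_bp):
    """Infer process annotations from experimental function annotations.

    `annotations` is a list of dicts with keys 'gp', 'go_id' and 'evidence'.
    `mf_to_bp` maps a function term ID to the process term IDs it is linked to.
    Pairs that are already annotated are not inferred again.
    """
    existing = {(a["gp"], a["go_id"]) for a in annotations}
    inferred = []
    for a in annotations:
        # Only experimental annotations are used as the basis for inference.
        if a["evidence"] not in EXPERIMENTAL:
            continue
        for process in mf_to_bp.get(a["go_id"], ()):
            key = (a["gp"], process)
            if key in existing:
                continue
            existing.add(key)
            inferred.append(
                {"gp": a["gp"], "go_id": process, "evidence": "IC", "with": a["go_id"]}
            )
    return inferred
```

Keeping the inferred annotations in a separate list, rather than merging them into the curated set, leaves it to each group to decide whether to load them.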

Annotation 1: Binding (Rama, Emily)

  1. Guidelines for 'binding' terms
    • proposed guidelines
    • Proposed rules:
      • Don't annotate to 'protein binding' without something in the WITH column - AGREED
      • Always make reciprocal binding annotations i.e. if protein A binds protein B, protein B binds protein A - AGREED
      • Never use the NOT qualifier with 'protein binding' - AGREED
      • Annotations to 'protein binding' should not be used with ISS evidence code - AGREED (but need to move ahead with column 16 to capture targets)
      • Curators should aim to avoid redundant 'binding' annotations for substrates/products, e.g. do not annotate ATP binding with ATPase - AGREED
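The first four agreed rules lend themselves to an automated check. The sketch below is one possible shape for such a check, assuming annotations are held as simple dicts with a list of partner IDs in the WITH field; the function name, the dict layout and the message strings are hypothetical, and only the 'protein binding' term ID (GO:0005515) is taken from GO itself.

```python
PROTEIN_BINDING = "GO:0005515"  # GO term 'protein binding'


def check_protein_binding(annotations):
    """Return (gene_product, problem) tuples for 'protein binding' rule violations.

    Each annotation is a dict with keys 'gp', 'go_id', 'qualifier' (string),
    'evidence' and 'with' (list of partner gene product IDs).
    """
    problems = []
    pairs = set()
    for a in annotations:
        if a["go_id"] != PROTEIN_BINDING:
            continue
        if "NOT" in a["qualifier"]:
            problems.append((a["gp"], "NOT qualifier used"))
            continue  # a NOT annotation is not a binding pair
        if not a["with"]:
            problems.append((a["gp"], "missing WITH"))
        if a["evidence"] == "ISS":
            problems.append((a["gp"], "ISS evidence used"))
        for partner in a["with"]:
            pairs.add((a["gp"], partner))
    # Reciprocity: if A binds B, B should also be annotated as binding A.
    for gp, partner in sorted(pairs):
        if (partner, gp) not in pairs:
            problems.append((partner, f"missing reciprocal annotation with {gp}"))
    return problems
```

Running this centrally over each group's file would produce a report for curators rather than silently dropping annotations.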

Action item: Make a document with the above agreed rules. Binding working group

    • Action item: Remove the comment about remembering to annotate ATP binding when using ATPase from term comments. Amelia
    • Action item: Look into using column 16 to capture targets in a more consistent manner as well as cases where you want to say a gp NOT binds a specific protein. Column 16 working group
    • Action item: to investigate managing the automatic implementation of the protein binding rules centrally. Chris, Ben, Emily, Rama

Wednesday, March 31 Morning – Break

Action Items Review (Judy)

Action items from previous meeting

from last GOC meeting

  • Primary stakeholders in GO and PRO need to agree on how GO components and PRO will relate to each other
    • Harold will talk about this today and GO and PRO are working together
  • Annotation groups need to get ISS annotations from PAINT into their databases this year.
    • On target
  • Participating groups need to provide a file of their comprehensively annotated genes
    • In progress
  • MODs: make an effort to highlight Ref Genome annotation
    • in progress
  • Val will make available the form used to elicit responses from authors
    • Done!
  • Build a survey for grant update (Val, Jane, Jen), determine where to send it. (Mike has subscription to SurveyMonkey)
    • Not done - proposal supplied but not yet acted upon
  • provide two survey urls and compare responses from a targeted list like the submitters to GO help list with responses from a random list of biologists
    • Do before renewal -- NOT DONE!
  • Locate code for making font size changes in PAINT(Paul)
    • Done.
  • OBO format 1.3 tags (creation date, etc.) will be added to gene ontology ext
    • Done.
  • OBO version number added to all OBO files
    • Done.
  • GAF files from PAINT will need to be GAF 2.0 format
    • Done.
  • Switch to GAF 2.0 in publishing pipeline with substantially long (3 Months) public notice to this change.
    • In transition
  • add notices like this change to GAF 2.0 to GO News Feeds
    • Done
  • post a clear spec of the desired format for gp2protein files for all MODs-include 1 row for every gene? 1 protein ID per row? No protein ID is OK if no protein is available?
    • Superseded by below
  • keep GAF file as is. Provide a new file (gpfile?) to describe gene products. Provide a detailed spec of the contents of this new file. This file may subsume gp2protein
    • To be discussed.
  • MODS please look at the F->P IC annotations proposed for your species in http://geneontology.org/scratch/gaf-inference. If you have issues get back to Chris in the next 2 weeks.

Action: All curation groups to look at the F->P IC annotations proposed for your species in http://geneontology.org/scratch/gaf-inference. Investigate loading inferred annotation into association file. Any issues to be sent to Chris by 3rd May

  • MODS will load these F->P IC annotations to their database
    • As above.
  • Taxon constraint checks, report should be sent back to group but not filtered out..still load.
    • Happening.
  • Chris to check that inter-ontology inferred annotations are limited to inference from experimental annotations only.
    • Er. Yup.
  • use a more relaxed schedule for building GO Full (quarterly perhaps?)
    • Nah, we're fine.
  • reduce load of go lite from 3 to 1 time per week.
    • Done!
  • implement a version like shown for curators to try before next GO meeting
    • Done!
  • Group agreed this makes some sense. Work up a more concrete specification of how this would work.
    • Er...
  • Pascale, Anne, Harold, Chris, Peter D’Eustachio, and Mary Dolan will work together to provide cross-products between molecular function and ChEBI and report back to the group at the Spring meeting
  • more work needed to iron out HOW to transfer binding term annotations properly to retain all information
    • Will come today.
  • actin-dependent ATPase activity and tubulin-dependent GTPase activity terms are incorrect
    • Done. Rex and Pascale are still concerned about tubulin (Rex is more concerned).
  • Increase effort to get GO and ChEBI aligned
    • Today.
  • Have another annotation camp, perhaps just for current GO curators
    • In progress.
  • Current working group will edit the binding term annotation guidelines in line with the comments from discussion and distribute to PI’s for approval and then include in guidelines.
    • In progress.
  • Paul and Emily are interested in joining new working group discussion on ISS/IC topic
    • Not an action item

Annotation 2 (Rama, Emily)

Binding Discussion Cont.

Action item: ontology group to work with binding WG to make binding classes e.g. binding as co-factor etc.

  • Binding as part of e.g. catalysis is trivial and should be inferred automatically through the ontology. But we also want to capture specific binding events where they are physiologically important.
  • [Michael] Three classes of protein: first, those whose only function is to bind small molecules, e.g. to sequester them (only binds). A second class binds its substrate and catalyses its conversion to some other substance, e.g. alcohol dehydrogenases (only catalyses). A final class of proteins is exemplified by GTPases, where the binding to the small molecule is primary and the catalysis is secondary (multifunctional). In the first and third classes a binding annotation should be made.
  1. Discussion of guideline documentation, progress and unresolved issues (Emily and Ruth; 12 slides).
    • Should we include transporters in these guidelines?
    • Should we capture drug information?
      • Drug binding currently only has a small number of child terms - this was a deliberate decision because drugs are only drugs in a certain species.
      • Should GO decide what is or isn't a drug? External users will know which compounds are drugs.
      • Perhaps capture the fact that the compound is acting as a drug in the annotation (column 16 - GO term)
      • [Michelle]: wants to keep drug binding, toxin binding as 'stub' terms. To be replaced by an annotation in column 16 (use CHEBI role 'drug')
      • [Chris]: use CHEBI to decide what compounds are drugs. Anything that CHEBI classifies as a drug we would classify as a drug.
      • [Michelle & David]: but drug role is organism-specific - a drug is a drug only to a subset of species. Calling an antibiotic a drug for a bacterium is incorrect.
      • The same rules will apply to all terms that include 'drug'.
      • [Peter]: we don't need to actually instantiate the drug role in GO - we could use CHEBI to decide what is or isn't a drug.
      • [Michelle]: CHEBI doesn't provide any information as to the species range in which the compound acts as a drug - we need to do this at the level of the annotation
      • [Ben]: shouldn't this be captured as a process? Drug is a poorly-specified term
      • [Michelle]: Useful to be able to collect together processes by which organisms respond to external compounds
      • Action item: Continue discussing the issue of 'drug' between now and the GO camp [Binding working group].
  2. IMEx Consortium: carries out high-quality annotation of protein-protein interactions. Should we duplicate their work? They will accept requests for specific proteins. We could request that they annotate a specific protein and then import their annotation, maybe filtered by experimental method (e.g. no yeast two-hybrid).
    • General agreement that this is a good idea. Should we import the data? [Chris] we should divide the labour but then import the annotation into GO.
      • Action item: Emily, Chris - work with IMEx to figure out how their annotations might be imported into GO.
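
A minimal sketch of the "import, filtered by experimental method" idea from the IMEx discussion above. The record layout and the method labels are hypothetical placeholders, not IMEx's actual export format or vocabulary; how the import would really work was left as an action item.

```python
# Assumption: each imported interaction is a dict with a free-text "method" field.
# The labels below are illustrative, not terms from a real controlled vocabulary.
EXCLUDED_METHODS = {"yeast two-hybrid"}


def filter_interactions(records, excluded=EXCLUDED_METHODS):
    """Keep only interaction records whose method is not in the excluded set."""
    return [r for r in records if r["method"].strip().lower() not in excluded]


# Made-up example records (placeholder identifiers and references).
records = [
    {"protein_a": "GP_A", "protein_b": "GP_B", "method": "Yeast two-hybrid", "ref": "PMID:0000001"},
    {"protein_a": "GP_A", "protein_b": "GP_C", "method": "co-immunoprecipitation", "ref": "PMID:0000002"},
]
kept = filter_interactions(records)
```

Keeping the excluded methods in one set means the filter can be tightened or relaxed without touching the import logic.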

Discussion of taxon IDs and subspecies [Jim Hu]

  1. What do we mean when we assign a taxon id to an annotation?
    • For E.coli, the complement of genes in any strain is a small fraction of the pan-genome: there are 18-20,000 genes in the pan-genome, whereas the core E.coli genome is 2,000 genes.
    • Therefore, when we annotate, we could use the E.coli taxon id or one of its subnodes, which can go down to strain level.
    • Currently there are multiple taxon ids in the file, and it is uncertain how they are being applied. This could be confusing for users.
    • Judy: in mouse there are many strains, but MGI uses only one taxon id, although we are ... What happens in AmiGO?
    • Mary: perhaps should concentrate on annotating a central protein set, unless there is a gene which has a particular function?
    • Judy: should use taxon level; to use the lab level, should work with NCBI for representation.
    • Alex: it is useful to collect specific strains, as this conveys the information that different strains will have different effects on the host.
    • Jim: often authors do not provide good strain information.
    • Paul: what is the taxon id meant to signify? Are we signifying what we're annotating, or where the information is coming from?
    • Judy: a gene product id will provide information on strain or specific source of the protein.
    • Jim: there are multiple source strains that produce the same product.
    • UniProt is using K12 taxon id
    • Judy: important issue – general theme of using high level taxon id, therefore might need to work between GO, E.coli and UniProt to settle on an agreed UniProtKB identifier.
    • Suzi: how feasible is it to move from a lower-level to a more general taxon id?
    • Serenella: UniProtKB is unlikely to move to using a more general taxon id
    • Michael: at the curatorial level you need to capture data at the most granular level. The way the data is presented through AmiGO could then merge up to taxon 562, the E.coli species (the meta-organism).
  2. Summary – the higher level taxon id could be used to summarize annotation level – but curation should capture the specifics.
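
The summary above (curate at the most specific taxon, summarise at a higher one) can be sketched as follows. The parent links are a hand-written illustration of three NCBI taxonomy nodes; a real implementation would load them from the NCBI taxonomy dump. The function names, gene product labels and the tuple layout are assumptions made for this example.

```python
from collections import defaultdict

# Illustrative child -> parent links (hand-written for this sketch).
PARENT = {
    511145: 83333,  # E. coli str. K-12 substr. MG1655 -> E. coli K-12
    83333: 562,     # E. coli K-12 -> Escherichia coli (species)
}


def roll_up(taxon_id, target=562, parent=PARENT):
    """Return target if taxon_id is target or one of its descendants, else taxon_id."""
    current, seen = taxon_id, set()
    while current is not None and current not in seen:
        if current == target:
            return target
        seen.add(current)
        current = parent.get(current)
    return taxon_id


def summarise(annotations, target=562):
    """Group (taxon_id, gene_product, go_id) tuples under the rolled-up taxon.

    The original, strain-level taxon id is kept inside each tuple, so no
    curated detail is lost when the display is summarised.
    """
    summary = defaultdict(list)
    for taxon_id, gene_product, go_id in annotations:
        summary[roll_up(taxon_id, target)].append((taxon_id, gene_product, go_id))
    return dict(summary)


# Made-up annotations: two E. coli strains and one mouse gene product.
annotations = [
    (511145, "gpA", "GO:0005515"),
    (83333, "gpB", "GO:0005515"),
    (10090, "gpC", "GO:0005515"),
]
by_species = summarise(annotations)
```

The roll-up happens only at display time; the stored annotations still carry the strain-level ids.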

Requirements and QC for contributed annotations [Rama]

    • M. tuberculosis example (see slides)
      • Rama: in contact with an Indian group who have carried out a jamboree for M.tuberculosis – the group has carried out extensive training, but had not got in contact with the GOC before starting the curation work
      • What requirements are enforced on external groups that are willing to provide us with GAF files?
      • Therefore some suggestions as to what GOC could do to support and advise such groups.
      • Pascale: could additionally encourage communities to use UniProtKB accessions
      • Rama: important that a curator is put in touch with external groups. And before they submit annotations, need to check how they are intending to maintain files
      • Jim: E.coli do not submit a file from GONuts currently – we should be doing this in future
      • Emily/Jim: Is there any QC SOP for these? Jim: Currently the low level of annotations coming from the community means that the E.coli group can individually check these annotations – in future this might not be feasible if community contributions greatly increase.
      • Tanya: TAIR has had a collaboration with a journal to take in community annotations – we are trusting that the authors are carrying out the appropriate annotation.
      • Pascale: concern regarding staleness of annotations – not such an issue with manual annotations
      • Emily: these annotations will not *automatically* go into GOA
      • Jane: if the annotations go into TAIR – then you are responsible for them. Needs a contact to fix the dataset, and links out from AmiGO
      • Eurie: could archive files which are not being maintained.
      • Pascale: if the annotation was correctly done, and a link to PubMed.
      • Suzi: this current group could be used to feed back on annotation guidelines and to gather ideas to improve the QC project
      • Action item: improve the annotation guidelines based on feedback and analysis of the TB project [Rama]

Plans for Geneva GO Annotation Workshop [Pascale]

If anyone has suggestions for this meeting, please add to the agenda for the monthly meetings

Wednesday, March 31 Morning [Break – Lunch]

Ontology Updates

  1. Term additions from the ASCB and SDB meeting (Tanya, slides)
    • David and Tanya attended two major biology meetings. Added approx 400 terms as a result
    • Average cost per term $13
    • Pros and cons to this strategy
  2. Additions from the kidney and heart development meeting (David, slides)
    • Cost per term for this strategy around $20?
    • How do you visualise the graph with experts? Look at flat file directly
    • Annotation done in tandem with content development for heart
  3. PAMGO/multi-organism process update (Jane, slides)
    • new terms to be added for non-symbiotic biological processes (predation/feeding/envenomation)
    • Richard S - 'injection of a substance into an organism' - might need to be differentiated clearly from the OBI term
    • Action item: clarify def of 'injection of a substance into an organism' for envenomation term, to ensure that it does not conflict with the OBI term [Jane].
    • additional expansion of terms under symbiosis. Difficulties in lengths of terms and problems with x-products.
    • viral terms being developed as well.
    • Viral contributions from a range of curation groups, including Michele, Fiona, UniProt curators
    • Richard S - should reach out to pathogen groups in the US. Interest in expanding host-pathogen terms. Jane has been speaking to some groups at the OBOFoundry meeting
    • Michele: cross-talk with VIB and PAMGO
  4. Generic GO slim (Jane, slides)
    • almost finished slim - 65 BP terms. Has gone slowly, as a large number of ontology fixes were made, including many high-level term rearrangements. New terms generated.
    • terms would cover all organisms - users could then expand/reduce areas depending on interest.
    • waiting for AmiGO to take slim set and run dataset to see which gps are not being annotated by the slim
    • terms are visible from the wiki
    • method for picking terms - aim for complete coverage and biological importance, and number of annotations. Therefore some categories have been included for completeness that are not highly covered
    • decision to be taken based on regulates relationship.
    • not much value in generating a MF slim
    • a cellular component slim may be helpful.
    • Chris: two versions of map2slim. AmiGO does not consider regulates relationship.
    • Jane: users should choose whether the regulates relationship is included or not.
    • Jane: paper will be written.
  5. Cross products (slides, David)
    • Do all possible combinations of GO terms get created by this process?
      • No, we only instantiate where biologically meaningful. Plus the process of writing cross-product definitions is independent of making new terms.
  6. Aligning chemicals (slides, David)
    • GOCHE can also be used to fill out the full term set for all GO terms which includes chemicals (i.e. transport, transporter, metabolism, catabolism and biosynthesis)

Cell Ontology (Terry)

  1. Content update (Terry, Cell Ontology [4])
  2. Improved support for OBO/OWL/Protégé intersections (Chris)
  3. OBO_EDIT updates/report (Amina)
  4. Streamlining addition of compositional terms [Chris]

Ontology Content Projects

  1. Bringing the NIF ontologies into GO. Allowing other groups write access to parts of GO (Jane, slides)
    • NIF (Neuroscience Information Framework)
    • Tanya: how to ensure that the parentage is the same in NIF and GO ?
    • Jane: would need to be some verification layer to look for inconsistencies in the hierarchy to be manually checked before integration into GO. This will need to be investigated during the process
  2. GO-Pathway database integration (Chris)
    • GO should be a one-stop shop for functional/locational information for gps, so we need to import data from other sources
  3. PRO; MouseCyc and GO; Complexes in PRO and GO (Harold; 35 slides [5]).
    • Importing data from Reactome improved p-values for enrichment analysis
    • Manually importing data from other systems is lossy - can we do it automatically?
    • Yes, although there are still lots of issues that need reconciling between systems e.g. signalling
    • Can GAF + ontology capture everything we need, looking to the future?
    • Not entirely, but the simplicity is one of the things that makes GO attractive to users, so we should carry on pursuing this model.
  4. PRO; MouseCyc and GO; Complexes in PRO and GO (Harold; 35 slides [6]).
    • looking at ceramide biosynthetic process in GO and MouseCyc resulted in improved representation within GO. No intent to go through all MouseCyc pathways and keep in line with GO Complexes
    • PRO generating descriptions of protein complexes.
    • Emily: UniProt also working on generating identifiers for specific complexes. We'd like to annotate them directly to biological process/molecular function terms.
    • Harold: also interested in annotating PRO to BP/MF GO terms.
    • David: complexes are defined by membership and also by function.
    • Discussion: Cross-over between PRO and GO and complexes. Do protein complexes fit in well with GO?

Wednesday, March 31 Afternoon [Lunch - Break] [OBO-Edit Working Group Lunch Meeting]

GOC website redesign (Rama)

  • This does need doing, but we need to decide where it will fit into our priorities
  • Amelia is making incremental changes to the website, and will make specific changes if people ask
  • We should think of examples of what we want to do with the website for the grant renewal

GAF file

  1. GAF2 update
  2. Proposal for new gene product data file format for submitting gene/protein data independently of annotations
  3. Proposal for new annotation file format containing only annotation data (complementary to the GP data file format above)
  4. Proposal to add a new column to the GAF to indicate if an annotation was reviewed by a curator or not.
    • Tony: xrefs should go in a separate file solely for mapping
    • Pascale: some reference genome stats could also go in this file
    • Judy: will take too long for all groups to catch up with this change - let's just go ahead with this change
    • Emily: GOA will start producing this file redundantly over the next couple of months
    • Tanya: is this a GAF 3.0 file? We've only just gone to 2.0!
    • Emily: no, it's the same format, the file is just split and gp information only provided once
    • It's a non-redundant GAF file (NAF) - easy to reconstitute a GAF 2.0 from the NAF + GAF
    • Action item: Amelia to write up documentation for split GAFs
    • Action item: We will roll out new split GAF for all groups
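
A rough sketch of the split described above: gene product information is stored once, annotation rows keep only the gene product key, and a full GAF 2.0 row can be rebuilt from the two. The column positions follow the 17-column GAF 2.0 layout, but which columns count as "gene product" columns here is an assumption for illustration, not the split the group agreed on. The sample rows are made up.

```python
N_COLS = 17
# 0-based GAF 2.0 columns treated as gene product data in this sketch:
# DB, DB Object ID, Symbol, Name, Synonym, Type, Taxon.
GP_COLS = [0, 1, 2, 9, 10, 11, 12]
KEY_COLS = [0, 1]  # DB + DB Object ID identify the gene product
# Annotation rows keep the key plus every non-gene-product column.
ANN_COLS = [i for i in range(N_COLS) if i in KEY_COLS or i not in GP_COLS]


def split_gaf(rows):
    """Split full GAF rows into (gene product table, annotation-only rows)."""
    gene_products, annotations = {}, []
    for row in rows:
        # Gene product details are stored once, on first sight of the key.
        gene_products.setdefault((row[0], row[1]), [row[i] for i in GP_COLS])
        annotations.append([row[i] for i in ANN_COLS])
    return gene_products, annotations


def reconstitute(gene_products, annotations):
    """Rebuild full 17-column rows from the two split tables."""
    rows = []
    for ann in annotations:
        row = [""] * N_COLS
        for i, value in zip(ANN_COLS, ann):
            row[i] = value
        for i, value in zip(GP_COLS, gene_products[(ann[0], ann[1])]):
            row[i] = value
        rows.append(row)
    return rows


# Two made-up annotations for the same placeholder gene product.
SAMPLE_ROWS = [
    ["EXDB", "GP0001", "gpA", "", "GO:0005515", "PMID:0000001", "IPI", "EXDB:GP0002",
     "F", "example protein A", "", "protein", "taxon:562", "20100331", "EXDB", "", ""],
    ["EXDB", "GP0001", "gpA", "", "GO:0005524", "PMID:0000002", "IDA", "",
     "F", "example protein A", "", "protein", "taxon:562", "20100331", "EXDB", "", ""],
]
gene_products, annotations = split_gaf(SAMPLE_ROWS)
rebuilt = reconstitute(gene_products, annotations)
```

The round trip is lossless only if every row for a given gene product carries the same gene product columns, which is the redundancy the split is meant to remove.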

Wednesday, March 31 Afternoon [Break - end of day]

Renewal, SAB and AOB

  1. Parking lot item discussions or catch-up reports
  2. Strawman proposal for upcoming competitive renewal (Suzi, Chris, & Paul)
    • What is the GO's core role? Should we stick to our core annotation principle or innovate?
    • Ruth: likes the idea of systems biology annotation with managers. The same genes act in many different systems
    • Judy: How do we measure what we've done, what we're asking for in renewal. Learn from what we've done so far.
    • Richard: need to couch our approach so it appears novel
    • PaulS: GO is central to our nematode projects. Need to be able to leverage annotation across species - currently lacking orthology information. GO should provide this role - one-stop shop. Improve automation.
    • Ruth: users want one tool rather than information on many.
    • Visualisation - important for all tools. How to display, explore and manipulate large datasets.
    • Pascale: annotation is capture of novel data
    • Richard: 1000 genomes studies intra-species variation - could annotate variants as a novel project.
    • Paul: regular GO annotation of gene products important for study of variants - GO killer app
    • Emily: improving electronic annotation techniques should be a priority
    • Jane: electronic annotation also critical for proks/viruses/parasite species which won't get manual annotation
    • Cytoscape: GO visualization most popular app
    • Mike: we are an infrastructure project, we don't need novelty in our renewal
    • Suzi: GO proposal getting big - can we carve out smaller sub-projects on separate grants. Don't want GO proposal to encompass everything
    • Rex: Revisit who our core audiences are - bench biologists, computational biologists?
    • PaulS: GO buried in other software apps
    • Mike: GOTermFinder - incredibly popular. We don't need to write all applications ourselves, GO will be sucked into other apps
    • Judy: we are an expensive project - need to keep meeting our community's needs. We have to stay relevant.
    • PaulS: need to be able to estimate what proportion 'complete' we are, what needs to happen before this is just maintenance.
    • Tanya/David: we should aim to be able to facilitate hypothesis generation, for labs or just for ourselves. Need to be careful adding that to the grant if we're asking to be funded as a resource.
    • Jim: community annotation via wiki - annotation camps for grad students where they do annotation
  3. Preparation for SAB next day

Summary of Action Items

  1. Action: user documentation needs to be made, indicating how GO annotations can be filtered by users (e.g. alerting users to different annotation subsets IEAs/RCAs) [Working group to be established; including Harold and Amelia]. Manager: Rama and Emily
  2. Action: Reference Genome group needs to explore ways of improving the target list selection method to ensure a greater participation by all curation groups [Pascale and Kara].
  3. Action: All annotation groups to review PAINT annotation sets with a view to integrating them into their association files [all groups]. Manager: Pascale
  4. Action: Figure out a mechanism by which taxonomic restrictions derived from PAINT could be fed back into the ontology taxon restrictions file [Paul and Jane]. Manager: Jane.
  5. Action: Ben and Chris to investigate loading of additional IEAs (from the additional 52 species) into AmiGO. Manager: Chris
  6. Action: Limited trial of automatic term generation for AmiGO (derived from cross-products) in response to curator requests [Seth]. Manager: Chris, Jane, David.
  7. Action: ECO and GO evidence code documentation to be brought in line. Michelle and Evidence Code working group to investigate revamp of evidence codes. Manager: Emily and Rama
  8. Action: Go forward with using ECO as the source for evidence codes, cross-reference to OBI, clarify the distinction between experimental and predicted, and include user guidance regarding reviewed vs. non-reviewed datasets. [Michelle and Evidence Code working group] Manager: Emily and Rama
  9. Action: Discuss use of regulation terms in annotation in the context of the GO camp planned for June [David, Jane and Regulation working group]. Manager: David and Jane.
  10. Action: Make a document with the agreed rules on limiting usage of GO term 'protein binding; GO:0005515'. [Binding working group] Manager: Emily
  11. Action: Remove the comment about remembering to annotate ATP binding when using ATPase from term comments. [Amelia] Manager: Jane
  12. Action: Look into using column 16 to capture targets in a more consistent manner as well as cases where you want to say a gp NOT binds a specific protein. [Column 16 working group]. Manager: Rama and Emily
  13. Action item: to investigate managing the automatic implementation of the protein binding rules centrally [Chris, Ben, Emily, Rama].
  14. Action: All curation groups to look at the F->P annotations proposed for your species in http://geneontology.org/scratch/gaf-inference. Investigate loading inferred annotation into association file. Any issues to be sent to Chris by 3rd May. Managers: Chris, David.
  15. Action: Continue discussing the issue of 'drug' between now and the GO camp [Binding working group]. Manager: Emily
  16. Action: Work with IMEx to figure out how their annotations might be imported into GO [Emily, Chris].
  17. Action: improve the annotation guidelines based on feedback and analysis of the TB project [Rama].
  18. Action: clarify def of 'injection of a substance into an organism' for envenomation term, to ensure that it does not conflict with the OBI term [Jane].