SLC GO Consortium Meeting Minutes April 2008

From GO Wiki
Revision as of 21:26, 22 April 2008 by Fmccarthy (talk | contribs) (2. GO Reference Genome Annotation Team Report (Pascal))


ACTION ITEM: GO Top needs to sign off on the agendas for the GOC meeting and the Ref Genome meeting before those meetings take place. If there is no action item stated, this is not the forum for the item.

1. Introduction & Review of Agenda

Judy – Intro

New people introduced – Siddhartha, Naomi, 2 from JCVI

Welcome back Jane

2. GO Reference Genome Annotation Team Report (Pascal)

Objective of the project: comprehensive annotation of 12 genomes


394 genes in target list – from disease genes, hot genes, metabolic pathways, unannotated gene lists.

Ref Genome Process:

1. generate focal sets

This has been a difficult process since some of our genomes are not included in the common ortholog resources.

a. P-POD (Kara Dolinski, Princeton)

b. Tree-based method (Paul Thomas, SRI)

2. select common curation targets

New curation priority is to annotate the families generated by P-POD where there is at least 1 member from each Ref Genome species; there are 153 of these sets.

Rex: focus on the clusters that have 1 in each species (rather than multiples) as these will be the least confusing.
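A minimal sketch of the two selection rules above (at least one member per Ref Genome species, and Rex's stricter one-per-species preference). The species codes, family names and gene IDs are hypothetical placeholders, and the input layout is an assumption, not the P-POD file format.

```python
from collections import Counter

# Hypothetical species codes standing in for the 12 reference genomes.
REF_SPECIES = {"HUMAN", "MOUSE", "YEAST"}

def species_counts(family):
    """Count members per species for a family given as (species, gene_id) pairs."""
    return Counter(species for species, _gene in family)

def covers_all_species(family, ref_species=REF_SPECIES):
    """True if the family has at least one member from every reference species."""
    counts = species_counts(family)
    return all(counts[s] >= 1 for s in ref_species)

def is_one_to_one(family, ref_species=REF_SPECIES):
    """True if the family has exactly one member from every reference species."""
    counts = species_counts(family)
    return all(counts[s] == 1 for s in ref_species)

# Placeholder families, not real ortholog sets.
families = {
    "fam1": [("HUMAN", "h1"), ("MOUSE", "m1"), ("YEAST", "y1")],
    "fam2": [("HUMAN", "h2"), ("HUMAN", "h3"), ("MOUSE", "m2"), ("YEAST", "y2")],
    "fam3": [("HUMAN", "h4"), ("MOUSE", "m3")],
}

priority = sorted(f for f, members in families.items() if covers_all_species(members))
simplest = sorted(f for f, members in families.items() if is_one_to_one(members))
```

Here `priority` corresponds to the new curation priority and `simplest` to the subset Rex suggests starting with.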

3. do experimental annotations – notification once completed

4. do inferential (ISS) annotations

=> at each step generate error reports

QC issues – need QC at each step in the process. For GO annotation, look at outliers, check co-occurrences of annotations. For orthologs, manual verification of ortho sets.

b. Methods

i. Number of genes with ‘no data’ that have data in other organisms

ii. Graphical view displays ‘outliers’

iii. Looking for co-occurrences of annotations at a high level (confusion matrices)

iv. Verification of ortho sets

Rex: will monitor to see if we are making improvements each time we do manual QC.
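As an illustration of the first QC method (genes with 'no data' where other organisms do have data), a small sketch follows. The data layout, species codes and term labels are assumptions made for the example; an empty set stands in for a 'no data' annotation.

```python
def no_data_outliers(annotations):
    """Return (family, species) pairs that have no data while another species does.

    annotations: {family: {species: set of annotated terms}} (assumed layout).
    """
    flagged = []
    for family, by_species in annotations.items():
        any_data = any(terms for terms in by_species.values())
        if not any_data:
            continue  # nobody has data, so there is nothing to compare against
        for species, terms in by_species.items():
            if not terms:
                flagged.append((family, species))
    return sorted(flagged)

# Placeholder annotations, not real curation data.
annotations = {
    "fam1": {"HUMAN": {"term_a"}, "MOUSE": set(), "YEAST": {"term_b"}},
    "fam2": {"HUMAN": set(), "MOUSE": set()},
}

report = no_data_outliers(annotations)
```

Counting the entries in `report` after each QC round would give one number to track over time.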

Proposed Changes to the GA File:

a. Column 2: longest version of the gene/protein (gene preferable, protein acceptable; other objects such as complexes are also accepted)

b. New column (17): ID of the object being annotated. Will be different from column 2 in the case of isoforms

c. Column 12 would refer to column 17

d. Header of the GAF will have a standard sentence that says how many gene products are annotated and the expected total from that organism

=> Chris: more information about this is now on the wiki.

Rex: key point is that this allows us to annotate isoforms and connect those isoforms to a gene.

Judy: this change will feed into generating the gene index file
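The proposed layout can be sketched as follows. This is only an illustration of the proposal as minuted (column 2 for the gene, column 17 for the specific object, column 12 describing column 17), not the final file specification; all identifiers in the example rows are placeholders.

```python
def parse_ga_line(line):
    """Split one tab-delimited annotation line into the fields discussed above."""
    cols = line.rstrip("\n").split("\t")
    # Pad so that files without columns 16 and 17 still parse.
    cols += [""] * (17 - len(cols))
    gene_id = cols[1]       # column 2: the gene (or longest protein form)
    object_type = cols[11]  # column 12: type of the annotated object
    form_id = cols[16]      # proposed column 17: the specific object, e.g. an isoform
    return {
        "gene_id": gene_id,
        # Fall back to column 2 when no specific form is given.
        "annotated_id": form_id or gene_id,
        "object_type": object_type,
    }

# Placeholder 15-column row; none of these values are real records.
fields = ["DB", "GENE:0001", "abc1", "", "GO:0000000", "REF:1", "IDA", "",
          "P", "", "", "protein", "taxon:0", "20080422", "DB"]
gene_row = "\t".join(fields)
# Same row with an empty column 16 and an isoform ID in column 17.
isoform_row = "\t".join(fields + ["", "PROT:0001-2"])

gene_record = parse_ga_line(gene_row)
isoform_record = parse_ga_line(isoform_row)
```

Both records keep the same `gene_id`, which is what lets isoform annotations be connected back to a gene.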

Proposed changes to the gp2protein file:

1. should only contain one version of each gene sequence

2. ---? Pascal?

=> more information about this on the ref genome wiki.

Software development discussed during Ref Genome meeting:

1. Siddhartha, Chris, Seth, Mary – database and tool where target genes and orthosets and their curation status will be maintained

2. Will replace Google spreadsheet

3. Graphical displays- several improvements

4. Integration of ref genomes genes into AmiGO

3. Ontology Development

3.1. Report on accomplishments (Midori)

1. Sensu – not used any more in GO terms

2. Disjoint violations corrected

3. Cardio & muscle content meeting work – this work has gone live

4. Electron transport terms reworked

5. SourceForge statistics: 500 opened; 478 closed!

Have about 200 items constantly open. At the end of this meeting will close all open SourceForge items that are more than a year old.

Sue: request that emails be sent out warning you that the items are going to be closed – prompt submitters to reread their items.
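A sketch of how the items due for closure could be listed so that submitters can be warned first. The item records and field names are hypothetical; this does not use the SourceForge tracker API.

```python
from datetime import date, timedelta

def stale_items(items, today, max_age_days=365):
    """Return open items that were opened more than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    return [item for item in items
            if item["status"] == "open" and item["opened"] < cutoff]

# Placeholder tracker items with made-up IDs and addresses.
items = [
    {"id": 1, "status": "open", "opened": date(2007, 1, 15), "submitter": "a@example.org"},
    {"id": 2, "status": "open", "opened": date(2008, 2, 1), "submitter": "b@example.org"},
    {"id": 3, "status": "closed", "opened": date(2006, 5, 1), "submitter": "c@example.org"},
]

to_warn = stale_items(items, today=date(2008, 4, 22))
warn_addresses = [item["submitter"] for item in to_warn]
```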

6. Regulates relationships – gone live

3.2. Implementing the new regulates relationships (Tanya)

If regulation of X was part_of X, then all that needed to be changed was to convert the relationship to regulates. BP terms were reviewed and some were added as part_of relations (many new definitions and synonyms were also added). A reasoner was used to identify inconsistencies between regulate terms and BP terms, and to find missing terms. QC reports were generated.

QC reports:

- missing links between terms

- internal consistency within the DAG

- relationship problems (regulate terms that are is_a instead of part_of)

- multiple part_of parentage (if you regulate a process, do you regulate its parent? Doesn’t work in terms with multiple part_of relationships.)

March 25: regulates relationships go live!


1. BP ontology improved – new terms, definitions

2. New relationships portray the biology correctly

3. New logical definitions allow automated creation of error reports as ontology is changed.


1. intersection tags: cross products between the GO ontologies

2. intra- and inter-ontology links for regulates relationships: occurrents can regulate occurrents

MF can regulate MF, BP can regulate BP, BUT ALSO MF can regulate BP (and vice versa). This will change MF from being a strict is_a ontology.

These are ready to go now but will put out notifications first.

An MF that always occurs in a single type of BP is part_of that BP. Reasoning: a BP is a collection of MFs, therefore the relationship should be part_of.

=> would appreciate feedback about what people think about this

Comment: when to annotate to regulate term?

- need to decide based on author’s argument

- annotation to regulation of BP will also still be annotation to BP parent term

3. BP ontology improvements:

Metabolism – intermediate regulation parents filled in manually in one section; moving towards computational analysis to find these terms for larger areas of the ontology followed by biocurator review before adding new terms.

Signal transduction – can say that signal transduction is_a regulation of cell communication. Will be starting a push to revamp signal transduction: need definitions with a start point and an end point. Beginning of signal transduction is ligand-receptor binding then the rest is a cellular process.

Finding general BPs that signal transduction can fit into as a regulation. E.g.: the biological objective of BMP signaling pathways is to regulate transcription. Where it is now, it isn’t connected to transcription.

4. QC: continue reviewing QC reports that will be run regularly

3.3. Function-Process Links (Harold)

1. Considerations on Glycolysis and TCA Cycle

First thing is to define start and end points for glycolysis. Made the stop point at pyruvate so that aerobic/anaerobic processes did not have to be considered. The same process of defining start and end points applies to the Krebs cycle.

Examine terms & definitions within the defined start & end of the process - find discrepancies, possibly missing terms. Looking at definitions and refining/adding. Looking at the comparable Reactome data.

Peter: a reaction can be slightly different depending on the outcome. Should we look at the pathway as the common element for all processes, or should we look at these as separate processes that have separate purposes (with overlap)? The latter will become enormously complex as we start considering diverse species.

David & Midori: Historically, BPs are defined by their objective, therefore it makes sense to take the latter view. Will require parent terms to collect these processes. If different gene products are involved, it should be defined as a different process.

3.4. Electron Transport Cross-products (Jen)

Looking at electron transport region and representing BP & MF cross-products.

Developed lists of MFs, their BP and a taxonomy group that it applies to

Used this to make has_part relationships: BP can’t exist without MF in this taxon

Using the has_part relationship puts the BP term under the MF term (at the moment there are difficulties visualizing this).

Taxon issue – how do we represent that these relationships should be qualified by taxon? It is possible to have a general parent term with several child terms to represent the different taxa.

To create links between BP and MF we are going to need the new relationship, has_part. At the moment, need to come up with a way of adding this relationship to the graphs (it violates expectations: it makes syntactical sense but is harder for biologists to understand).

Judy: Should more of these projects be undertaken at this point?

Jen, David, Harold: These projects should continue, improve ontology.

Need feedback from users – Chris can make a separate version of AmiGO that includes these relationships.

Michael: need careful documentation for users.

David: needs to be presented at conferences to get it out there for comment, get users used to it.

ACTION ITEM: Jen and Harold to continue with this process, making improvements to ontology as they do.

ACTION ITEM: Everyone else to look at these relationships & give Chris feedback. Files available at OBO Edit scratch page.

3.5. Cross Products between GO BP & CC (Chris)

OBO 2 has ways of looking at cross products – intersection editor; files for CC-BP are available in the scratch file.

To capture what is happening with the BP, need a number of new relationships.

Found inconsistencies between ontologies - Idea is to use the intersection editor to keep track of consistency issues between BP & CC

Cross Product timeline:

1. Move cross-product files into OBO edit.

2. Then move cross-products into main ontology file – dependent upon new release of OBO edit.

3. Integration of Cell Line (CL) cross-products into edit cycle – no plan at this time to integrate into the main ontology file.

Working on how to display cross-products in AmiGO. Example: AmiGO with GO & Mouse anatomy; heart development.

ACTION ITEM: Need feedback from users on AmiGO display.

4. Outstanding Issues, session 1 (Midori)

Proteases – are there any objections to a reorganization of MF terms relating to protease activity? Proposal on wiki for review.

NOTE: If we do it, many protease terms will become obsolete on the basis that they are gene products rather than functions.

Peter: Proteases can be distinguished generically (1) by the structure of their active sites (4 different types) and (2) by where in the target peptide they cut (3 types). This would give us 12 MF terms. Would this cover all proteases?

Need cases from the group to see if lumping the annotations into one of these 12 terms would lose information.
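The 4 x 3 arithmetic, and the check for lost information, can be sketched as below. The minutes do not name the active-site or cleavage-position types, so the labels are numbered placeholders, and the existing-term names and their mapping are invented for illustration.

```python
from collections import defaultdict
from itertools import product

# Placeholder labels; the actual types are not listed in the minutes.
active_site_types = [f"active_site_type_{i}" for i in range(1, 5)]
cleavage_types = [f"cleavage_position_type_{i}" for i in range(1, 4)]

# One candidate MF class per (active site, cleavage position) combination.
candidate_classes = list(product(active_site_types, cleavage_types))

# Hypothetical mapping of existing protease terms onto the candidate classes.
existing_terms = {
    "existing_term_A": ("active_site_type_1", "cleavage_position_type_1"),
    "existing_term_B": ("active_site_type_1", "cleavage_position_type_1"),
    "existing_term_C": ("active_site_type_2", "cleavage_position_type_3"),
}

# Classes that absorb more than one existing term are where lumping
# could lose information and where example cases are needed.
merged = defaultdict(list)
for term, cls in existing_terms.items():
    merged[cls].append(term)
lumped = {cls: sorted(terms) for cls, terms in merged.items() if len(terms) > 1}
```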

Judy: when do MF terms stop representing classes and start representing individual genes? Specific gene products should not be represented.

Midori: there are some MF terms that are clearly differentiated from other MF terms. These terms should be used.

ACTION ITEM: continue to investigate the protease terms to remove concerns about terms that reflect activity of single gene products.

5. Advocacy/Outreach/Collaborations (Jen/Jane)

5.1. Help Desk (stats report) and newsletter

Most queries to GO help desk answered within 24h.

Newsletter released quarterly.

Web presence working group (formerly AmiGO WG) – the advocacy group will determine features, then the AmiGO WG will implement them.

AmiGO future features:

Ref genome display

Wiki style user annotation – GONUTS

AmiGO web services

How to prioritize development and implementation of AmiGO features under discussion.

5.2. Annotation Outreach

Not cold calling dbs any more – all have been contacted.

TAIR has made an agreement with a journal to accept annotations from submitting authors; this was discussed at PAG 2008

Reactome have developed a ga file?

Muscle annotation wiki developed

Sol Genomics ga file submitted

6. AmiGO, GO Database

6.1. Software group progress report (Seth)

GO Term enrichment

AmiGO 1.5:

– new advanced search page & improved search functions

– search is extended if no result is found; may need to add a limit to this search function

- results sorted by relevance

- pie charts have been replaced with bar graphs

- regulates relationship icons have been added

- now have GOOSE links back into AmiGO

Next release of AmiGO (1.6):

- ref genome support is main focus

- display Mary’s graphs with pan & zoom function, more interactive (highlight direct vs indirect, species, ISS only, etc.)

- display of homolsets on AmiGO

- display of intersections between GO terms (bi-axial viewer)

- links to and from GONUTS to make it easier for people to leave comments & for biocurators to see these comments

6.2. Reference Genes - DB management extensions

Populating the database with NCBI taxonomy trees – use to filter queries

Added support for curation of ref genome curation sets

ACTION ITEM: Ben & Mike – get isoforms into GO database

Annotation Cross Products:

There is now a wiki page for cross-product annotation; case studies will be added for the next GO meeting. Using column 16 to capture cross-product information?

Question: will we need another qualifier to capture this refining information or do we use a separate column (column 16)?

Column 16 is used for refining the information that is captured, e.g. a particular CL for a BP. The relationship is between whatever the GO term is and the refining object.

7. SO (Karen E)

Since the Princeton meeting, have been working on SO tracker items – 16 items closed.

Working to sort out sequence attributes of SO: looking at annotator consistency of SO attributes. Go through list of sequence attributes and divide into BFO classes: quality, disposition, function, role (from Barry’s definitions). This is done individually, with comments on why the decision was made. Then results pooled and analyzed statistically to determine biocurator agreement. Iterative process.

Doing this to make SO attribute classes that reflect BFO.
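The minutes do not say which statistic was used to measure biocurator agreement. As one simple possibility, the sketch below computes the fraction of curator pairs that chose the same BFO class for each attribute; the attribute names and assignments are invented for illustration.

```python
from itertools import combinations

def pairwise_agreement(labels):
    """Fraction of curator pairs that chose the same class for one attribute."""
    pairs = list(combinations(labels, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

# Placeholder data: one list of independent curator choices per attribute.
assignments = {
    "attribute_1": ["quality", "quality", "quality"],
    "attribute_2": ["role", "function", "role"],
    "attribute_3": ["disposition", "function", "role"],
}

agreement = {attr: pairwise_agreement(labels) for attr, labels in assignments.items()}
# Attributes without full agreement go back for another round of discussion.
needs_discussion = sorted(attr for attr, score in agreement.items() if score < 1.0)
```

A chance-corrected measure such as a kappa statistic would be a natural refinement, since raw agreement does not account for classes that are picked more often overall.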

8. Progress with OBO-Edit (Nomi)

OBO-Edit 2 improvements:

Documentation and installer

I/O & command-line options


GUI look and feel

Priorities for release of OE2:

memory/speed issues

restore all functionality that was present in OE1

bug fixes in new components eg. graph editor


new features requests

Michael Schroeder's plug-in – link GOPubmed to OBO Editor?

9. Overall Project Management

9.1 Structure of Management Groups

What is working (and What is not)?

Wiki is too Kafka-esque: many pages are not linked properly and are hard to find. Could be using wiki categories more.

Too much work time taken up in meetings, not enough time to biocurate. Has the project reached a level of complexity where it is not possible to know everything going on? Should work be summarized so that people can be updated more efficiently? The best way to stay updated is to review the reports posted by other groups; progress reports are now by WG rather than by MOD.

GO Top needs to review draft GOC and Ref Genome agendas as a group.

Judging by progress, things are working pretty well but need to balance the overhead for biocurators.

GO Manager calls: not enough time to resolve issues brought up. On some calls, one issue has been discussed for most of the call and we haven’t moved on to other issues. Is it necessary to get the update from each group every 2 weeks? Calls should de-emphasize progress reports and focus on the things stopping progress.

9.2 Identify communication bottlenecks and time-sinks, knowing who is responsible and time estimates

Does everyone know who to contact to make progress on a road block? Yes, know who to go to but not sure if there will be a response.

Could we collect Skype addresses & have them available (but hidden from the public)? Would be good to solve problems in real time.

Wiki, Webex & Skype are already helping solve problems in real time.

One roadblock is posting a SourceForge item and then not hearing back from anyone. Biggest problem with closing items is being unable to make progress because of lack of information. Suggestion is to remove the SF item if there is no response after a designated time.

10. Outstanding issues (continued)

10.1 Discuss Establishment and Maintenance Terms

ACTION ITEM: change to maintenance of localization

10.2 Discuss IMP and the with column.

Guidelines for what goes there - phenotype ID, genotype ID

ACTION ITEM: Chris will be talking to individual groups about how they use the with column for IMP. Each MOD group needs to respond to Chris on this.

10.3 Discuss how we are going to handle 'response to drug'

SF 1242405 and 1494526 and 'response to toxin' SF1658374. Are they normal biological processes? Response to X (drug, toxin) Response to Chemical (see also Use of Response To Terms in Annotation, for a related issue)(David and Tanya)

ACTION ITEM: Implement Michelle’s proposal to handle response to drug terms

10.4 Chemical derivatives and metabolism terms:

How should we handle derivative compounds in the metabolism parts of GO--should we continue to include metabolism of X derivatives as is_a descendants of X metabolism? It can be confusing. For example, SFs 1885151 and 1847808 note that 'gamma-aminobutyric acid metabolic process' is a descendant of 'fatty acid metabolic process' because GABA is a derivative of butyric acid. (Midori and Val)

ACTION ITEM: Midori and David will deal with looking at these terms.
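The GABA example in 10.4 can be reproduced on a toy graph. The two end terms are those named above; the intermediate 'butyric acid metabolic process' link is an assumption made for illustration and may not match the actual path in the ontology.

```python
def ancestors(term, is_a):
    """Collect every term reachable from `term` by following is_a links upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen

# Toy is_a graph; the middle term is an assumed intermediate.
is_a = {
    "gamma-aminobutyric acid metabolic process": ["butyric acid metabolic process"],
    "butyric acid metabolic process": ["fatty acid metabolic process"],
    "fatty acid metabolic process": [],
}

gaba_ancestors = ancestors("gamma-aminobutyric acid metabolic process", is_a)
# An annotation to the GABA term is therefore also retrieved under the
# fatty acid term, which is the confusing result raised in the SF items.
confusing = "fatty acid metabolic process" in gaba_ancestors
```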

11. Collaboration with other projects

Reports on new/ongoing collaborations with these groups:

PRO: Protein Ontology (Harold) - PRO is being built to relate proteins to splice forms, variants, etc. Should be able to attach GO information to these forms.

Reactome - GAF done and ready to be submitted

GOA will be picking this file up. Reminder to check with Mike and make sure that this data is being picked up.

Panther: Will be running Ref Genome targets through Panther.




12. Discussion Points

1. Discuss: At one of our meetings Chris suggested we might keep an archive of gene sets used in publications that were analyzed using GO. Should we do this? (David)

This seems to be a research project.

Should the AmiGO WG provide a way for people to provide datasets?

13. Review of Action Items from Princeton Meeting

Day 1

1. Tutorial on wiki discipline (assigned to Jim Hu ?).


2. (ALL) Look at and comment on the Outstanding Action Items from 17th GOC Meeting, Cambridge UK


3. Check whether there should be a relationship between pigment metabolic process and pigmentation


4. Jen: A reference to these pages should go in next newsletter.


5. Jen Add a link from outreach to something (SOP?)


6. Investigate why term requests aren’t coming in: are there things we need to do to make it easier? SF tracker list/annotation list: who is on these lists, and do other people need to be on them?


7. NML Michael sent ISSN URL to Eurie – Action Eurie!


8. e-mail Ben if you are not getting a gp2protein check for your database.


9. Somebody mentioned RSS feed, is this a potential action?

Web presence group to discuss

10. Reactome annotations should be available from GO by the next GO Consortium meeting. Chris, Alex, Jen and Ruth to be responsible. Add new evidence code EXP for 1:1 Reactome to literature, add all other Reactome with TAS to Reactome source.


11. Convert Reactome complex terms to GO terms

Some progress: review at next meeting.

12. Jen to do a pilot project with a minimal set of terms, as an experiment and bring back results for next GO meeting


12. (David Hill) Make difficult sensu terms organism specific (biologist intuitive) (e.g. plant vacuole, fungal vacuole). However GO definitions will still be designed to be formal, not depending on species to define the term.

This needs to be moved to a SourceForge item.

Day 2 Action Items

1. (David Midori Seth) Deploy the part that created SF items based on a friendly webform, and would like to see an OBO format in the SF item

In progress

2. Seth, ORB: Make link to how to make a perfect GO term from the term request tool

In Progress

3. Amelia link GOOSE from front page


4. DH: Cross products: need to have a webex meeting so everyone understands what to do.


5. OBO file renaming. JB: add a link to Wiki: On the best practises page:


6. Web presence WG work on specification needed for new AmiGO features.


7. Gene Association files: to work on a more advanced interface to download custom files (Chris)


8. Gene Association files: to filter files as they come in. (Mike)

No longer relevant?

9. Judy: Predictive Activities. Collaborations with external groups. Reports into next GOC meeting as to these kinds of activities.

10. Jim: Suggested making a repository for predictions. POSSIBLE ACTION ITEM?


11. Finalizing proposed evidence code documentation – abbreviated version on web pages and more detailed on GOC Wiki (Rama)

Need to make sure that the evidence code ontology reflects the documentation

Also, need to make sure that there is a tracker for this.

12. Eurie: querying communities on awareness of evidence codes – do you know what it is, what do you use it for? Also proposal of expanding, then get a feel for what would benefit them? So that we have a large audience.


13. Sue, Michelle, Rama put evidence code proposal in the context of what we discussed today


14. Evidence code committee. Documentation for users and curators.

Curator documentation - DONE

User documentation - not completed

15. Evidence code: revise the evidence code documentation so that a mutation in only one gene can only be IMP (protein localization IGI example)


16. (Curators) Check whether you have used IGI in this way and update annotations

All groups to check this. David is DONE.

17. (Curators) 'with' column optional for NAS - document


18. Update evidence code decision tree in response to today's discussion on evidence code usage (Jen and EV Code WG)

Pending: check with Karen?

19. (Curators) only ND allowed to root nodes - clarify this in the documentation (Rama)


20. Karen E and Chris M will work on GO-SO cross products

In progress


1. GO Top needs to sign off on agenda before the GOC meeting & Ref Genome meetings prior to these meetings. If there is no action item stated, this is not the forum for the item.

2. Jen and Harold to continue with this process, making improvements to ontology as they do. Everyone else to look at these relationships & give Chris feedback. Files available at OBO Edit scratch page.

3. Need feedback from users on AmiGO display.

4. Continue to investigate the protease terms to remove concerns about terms that reflect activity of single gene products.

5. Ben & Mike – get isoforms into GO database

6. For Establishment and Maintenance Terms, change to maintenance of localization

7. Chris will be talking to individual groups about how they use the with column for IMP. Each MOD group needs to respond to Chris on this.

8. Implement Michelle’s proposal to handle response to drug terms (Michelle???)

9. Chemical derivatives and metabolism terms: Midori and David will deal with looking at these terms.

10. Web presence group to discuss RSS feed

11. David to move to SourceForge the proposal to make difficult sensu terms organism specific (biologist intuitive) (e.g. plant vacuole, fungal vacuole). However GO definitions will still be designed to be formal, not depending on species to define the term.

12. Need to make sure that the evidence code ontology reflects the documentation. Also, need to make sure that there is a tracker for this.

13. All groups to check on how they use IGI and update annotations as per Princeton discussion.