Ontology content development

Overview (Midori)

Most of the report is on the wiki. A lot has been accomplished. Highlights include the following:

closed more SF items than were opened since the last meeting (~200)
peptidase reorg is finished: After the SLC meeting, MEROPS database curators were contacted and they made recommendations. Those recs have been acted upon and the reorg is finished.

There are still ~200 open items. All those that are more than 6 months old have been assigned, but maybe they should be reviewed to see if the priority should be changed. Also, David mentioned that many of the items are being taken care of in large chunks with ontology changes, such as "biogenesis and organization" terms. Some are stuck because no consensus can be reached.

The majority of the ontology content section will talk about future work and the links that will be made between function and process ontologies.

ACTION ITEMS

SF items that cannot be closed due to lack of consensus should be put onto a wiki page so they can be resolved at an upcoming GOC meeting. Midori said to email the editors and they'll take care of it.

Theory and examples of function and process links (Harold, Jen)

(get his slides)

Many groups have been working on systems for links, trying to see how they work, since the links do represent biology. Harold showed examples of cross-products using biochemical pathways, defining a start and end, selecting paths, and using common resources. Done manually, the links looked OK but were labor intensive. Could it be done automatically?

Problems with doing it automatically:

missing DBXREFs
too many DBXREFs
creates links in the ontology that are "correct", but not always helpful to a given question for a human - like BP "carbohydrate metabolism" being linked to all glycolysis MF annotations.

Moving forward, using the dbxrefs seems to be the way to go but we will have to go in manually to make them more complete.
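
As an illustration of the dbxref approach and the problems listed above, here is a minimal Python sketch. It assumes that MF terms carry reaction dbxrefs, that BP terms carry pathway dbxrefs, and that a pathway resource says which pathways contain each reaction. All term labels and identifiers are hypothetical placeholders, not real GO or pathway database IDs.

```python
# Hypothetical data: labels and xref IDs are placeholders, not real identifiers.
MF_XREFS = {
    "MF:hexokinase activity": {"RXN:1"},
    "MF:pyruvate kinase activity": {"RXN:2"},
    "MF:orphan activity": set(),  # a term with a missing DBXREF
}
BP_XREFS = {
    "BP:glycolysis": {"PWY:glycolysis"},
    "BP:carbohydrate metabolism": {"PWY:glycolysis", "PWY:gluconeogenesis"},
}
# Assumed lookup from a pathway resource: reaction -> pathways containing it.
REACTION_PATHWAYS = {
    "RXN:1": {"PWY:glycolysis"},
    "RXN:2": {"PWY:glycolysis"},
}


def candidate_links(mf_xrefs, bp_xrefs, reaction_pathways):
    """Propose MF -> BP links through shared dbxrefs.

    Returns (links, unlinked): links maps each MF term to the BP terms it
    reaches; unlinked lists MF terms that reach nothing.
    """
    # Invert the BP dbxrefs so each pathway points at its BP terms.
    pathway_to_bp = {}
    for bp, pathways in bp_xrefs.items():
        for pathway in pathways:
            pathway_to_bp.setdefault(pathway, set()).add(bp)

    links, unlinked = {}, []
    for mf, reactions in mf_xrefs.items():
        found = set()
        for reaction in reactions:
            for pathway in reaction_pathways.get(reaction, ()):
                found |= pathway_to_bp.get(pathway, set())
        if found:
            links[mf] = found
        else:
            unlinked.append(mf)
    return links, unlinked


links, unlinked = candidate_links(MF_XREFS, BP_XREFS, REACTION_PATHWAYS)
```

In this toy data the term without a dbxref ends up in `unlinked`, and both kinase terms are linked to the broad "carbohydrate metabolism" term as well as to "glycolysis", which is the kind of correct but unhelpful link noted above.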

(from Chris' and Jen's talk)

So why should we even bother?

It will improve the GO because we need to be specific
It will help fill in annotation gaps - such as a link from MF "kinase activity" to BP "phosphorylation" - as well as provide ways to suggest new annotations.
It will allow better integration of pathway databases with GO.

Chris has been trying to use Reactome to make mappings between function and process and has come across the following issues:

DBXREFs not necessarily equivalent
There are some reactions that always occur in a given process for a particular species and others that do not, and this is more difficult to mine from Reactome.

There are also gotchas from biology because there could be multiple variations for lysine biosynthesis that include mix-and-match reactions and variations of those reactions. A combinatorial explosion.

The proposal to deal with this:

When a function and a process are closely related, like kinase and phosphorylation, a "part_of" annotation can be made.
A new relationship, "sometimes part_of", can be used when automated mappings are brought in, which will avoid true path violations.

Function and process link discussion

David asked whether every function should be "part_of" a process. In theory, there should be a link between each Molecular Function term and a Biological Process term. There was general acceptance of this theory.

Eurie asked whether MF enzyme terms are made consistently enough to make the links easy and consistent. Amelia pointed out another issue: enzyme terms usually cover both the forward and backward reactions, but we need separate terms. Harold also pointed out that we copied from EC, which may mean that two GO terms exist solely on the basis of cofactors. All of these may contribute to issues in creating an automated mapping.

Suzi and Peter discussed that the definitions of pathways in GO and Reactome are different, using apoptosis as an example. There are good examples where the start and end may differ from organism to organism. Ingrid mentioned that John Ingram (an experienced physiologist) said a metabolic pathway should begin and end with a central metabolite. Then pathways can feed into a common point that can then go to a central metabolite. But there is probably less consensus for models that are still being developed. All agree that a discussion needs to occur to work on coming to a common agreement.

Paul pointed out that we were discussing two extremes: uncurated automated links and curated links. The curated links are the ultimate goal but there could be a compromise in the middle. If the relationship linking MF and BP goes through KEGG, that evidence trail is documented. This is something better than "sometimes part_of". The concern with this is that changes other groups make need to be propagated.

There was a discussion about the impact on curation. With these inter-ontology links, you have to take the links into account as a true path rule. In addition, how much evidence do you need to make those annotations? That is why there is the "sometimes_part_of" but what if that pathway doesn't exist in your organism?

ACTION ITEMS

Add obvious part_of links, like MF "kinase" and BP "phosphorylation"; will be rolled out after regulates is released in Feb 2009
Try mining pathways for sometimes_part_of relationships using glycolysis, nucleotide metabolism, apoptosis first
Agree on beginnings, middles, and ends of pathways/processes between Reactome and GO
Examine impact on annotation priorities and implementations
Can we source our relationships as well as our term definitions?
- (David: this is about pushing the work onto the ontology developers and not the annotators)
Assign a process to every molecular function.
deferred: co-annotation 'has function as part of this process'

New relationship type (David)

New relationships will be released to the public in Feb 2009. These are the first cross-ontology links between BP and MF. They will occur between the BP "regulation of catalytic activity" and MF "catalytic activity". Those functions that regulate function terms will get the regulates relationship.

One major consequence is that all groups have to take into account relationships. The BP "negative regulation of kinase activity" is part_of "kinase activity", but the slimming will make them "kinase activity". Need to be careful about this.

Michael was concerned about whether the meaning of "part_of" was being overloaded. David replied that we probably are but practically, it may not matter because the child term really cannot be part of both parents at the same time.

We will have to make sure that GO tools support these links. In addition, we need to make sure that users who develop tools are aware of these changes. Jane emphasized that we couldn't do testing for all tools but the users need to test.

ACTION ITEM

Send out function process email again.
Release examples of relationship usage for software development.

Quality Control (Tanya)

Much of the information is on the wiki. For regulation terms, the reasoner looks at regulation terms and then at the corresponding process terms, and checks whether the structures match or whether relationships are missing. These were all reviewed.

Ontology developers will continue to review Chris' reports - it's becoming part of the process of ontology development since it is part of OBO-Edit.

OBO-Edit (Amina)

(has slides)

Priority should be testing and bug fixes. This version doesn't need more features but all the new features need to be tested, tested, tested.

Reports (Jane)

(has slides)

PAMGO

This is an ongoing process. Lots of "regulates" terms.

Organization and biogenesis of cellular components

(has slides)

All "organization & biogenesis" terms will be changed to "organization" with the proposed high level structure:

biogenesis
	organization
		biosynthesis/formation
		assembly
		modification/processing
	disassembly
		catabolism
maintenance

ACTION ITEM

Continue work on "organization and biogenesis" terms. Maybe biogenesis & organization should be switched at the higher level but this is up for discussion.

Signaling (Jen)

(has slides)

Is responding to the signal the same as the reception of the signal? Is it currently defined as within the realm of reception of the signal?

Future content meeting discussion

The discussion of signaling touches on every species. David pointed out that some of these are huge issues - signaling alone can be roughly categorized into:

g-protein coupled receptor signaling
calcium signaling
tyrosine kinase signaling
MAP kinase cascade

ACTION ITEMS

Pursue one or two ontology development meetings
- Viral processes (Brenley, Kimberley, Candice, Michelle, Jane)
- GPCR (Pascale, David, ??)
Couple a GO meeting with major meetings on these topics
Investigate funding sources

Annotation checking by trigger file (Jen)

(has slides)

problem IEAs
- viral/bacterial ones should probably be to the host instead

Suzi: Do we want all the groups submitting annotations to run the triggers?
Judy: we can do a monthly run with the trigger file
Peter: Once it has run a few times, we can check for global issues from GA files.
Michael A: What will you do about the GOA annotations where there is a conflict?
Emily: Can use to feed back to InterPro (for the InterPro to GO mappings) to update mappings because old mappings are causing problems.

ACTION ITEMS

remove sensu synonyms
Make GOA quickgo checking available to the public
Write up for a near-future newsletter

General Annotation Issues

Michelle: Evidence code ontology (ECO)

Michelle is taking it over.

Goals:

correct inconsistencies in the ECO with GO
ECO exists on its own and includes things other than GO
GO pulls from the ECO - uses a subset

Is ECO the responsibility of this community? MA says yes because we started it. We have not used it because we wanted to start out pretty easy. TAIR then wanted a much richer set of evidence codes. MA and Sue did a mapping. But when TAIR reports to GO, they collapse the evidence codes down. Those arguments are still valid. If curators were faced with more evidence codes, it would take longer. IDA could be expanded to a zillion codes.

Michael thought we should integrate GO evidence codes into the ECO because other people might use them and that GO should use a subset/slim of ECO (i.e. the ones we use now). Judy agreed and said if an individual MOD wants to make use of the granular codes, they can, but they must be mapped up to the higher-level codes used by the GO.

Pascale asked if we needed to use the more granular terms adopted by GO (ISM, ISO, ISA) or if we could keep to the higher level terms. This was deemed fine--you should annotate to the degree of knowledge available and this might end up being to the more general EV code. Also this allows for not having to retrofit older annotations as brought up by Harold.

Suzi brought up there could be various slim sets for various projects - AmiGO, Ref Genome, etc. for display purposes in the interfaces.

The GOC would only set standards for the GOC accepted codes, not all codes. Each database could use more if they wanted, but would have to convert them into the standard set for the GA file.

Maybe EXP would be better when there is no consensus on which evidence code should be used. This may prevent spinning it. For use of ref genome, maybe have to have additional standards available. There were concerns about overloading the EXP term and also concerns about accuracy.

Any evidence code that is in the ECO could be submitted to the GOC for adoption.

DECISION: we will use the ECO.

Peter: You might end up overloading the EXP term. Reactome is not protein-centric and does not annotate in a way that the literature can be parsed out later. It is hard to figure out what the evidence is in some cases.
Rex: worried about accuracy. There is not universal agreement for which code to use for a given experiment. There is a lot of time spent debating which lower code to use and I would rather have people agree to EXP and spend more time annotating.
Suzi: People can use any ev code in the ECO if it were useful, then we could explore writing software to slim things up.

ACTION ITEMS

create an EV code ontology tracker
people can use the ECO in its entirety but they have to map up to the GO set of EV codes.
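
The "map up" requirement can be sketched as a walk up a parent table until an accepted code is reached. This is only an illustration: the `LOCAL:` codes are hypothetical database-specific codes, and the accepted set shown here is an assumed subset, not the official list.

```python
# Child -> parent evidence codes. The LOCAL:* entries are hypothetical
# database-specific codes; the rest are GO evidence codes.
PARENT = {
    "LOCAL:enzyme_assay": "IDA",
    "LOCAL:immunolocalization": "IDA",
    "IDA": "EXP",
    "ISO": "ISS",
    "ISA": "ISS",
    "ISM": "ISS",
}

# Assumed subset of codes accepted in the GA file (illustrative only).
GO_ACCEPTED = {"EXP", "IDA", "ISS", "ISO", "ISA", "ISM"}


def map_up(code, accepted=GO_ACCEPTED, parent=PARENT):
    """Return the nearest accepted code at or above `code`."""
    current = code
    while current not in accepted:
        if current not in parent:
            raise ValueError(f"{code!r} has no accepted ancestor")
        current = parent[current]
    return current
```

With this table, `map_up("LOCAL:enzyme_assay")` gives `IDA`, and a group that only wants the higher-level codes can pass a smaller accepted set, for example `map_up("ISO", accepted={"ISS", "EXP"})`.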

Separating annotation method from experimental method

We would have a way to say that an annotation is electronic, that we can use any evidence code. We have a way to indicate that an annotation has not been manually reviewed.
Chris' proposal: use ECO and make a cross-product between evidence code and method. The evidence code has an ID, the methodology has an ID, and the cross-product would have an instantiated ID. Then we don't need to make another column on the GAF.
- Implication is that IEA would go away.
Judy: You could use anything in combination
Suzi: There is no need to change the GAF.
Kara: There's agreeing on the concept and secondly the implementation.
Judy: IEAs are inferred.
Mike: If you use InterPro to infer function is that ISS?
- It also solves a lot of problems for high-throughput.
Harold: So this is mainly for HTP situation?
Many people: NO
Mike: This is to differentiate experimental or a prediction.
Donghui
Eurie: It splits off the evidence from the experiment happening and what we put in the annotation file. Did a curator review everything or did it just appear there.
- need to put in the annotation file--if we don't tag that certain experiments came from htp experiments, it becomes a circular argument from people doing RCA experiments.
Debby: The data indicates two localizations but the database didn't capture the conflict, is that the issue? Is there a marker on the paper that everything was dumped in without someone reviewing it?
Judy: That an experiment can either have high volume or low volume; doesn't like the term htp, it's not about the data but how we're treating the data.
Suzi: Problem is that we're conflating several different things into one.

what method is used?
was there human judgement involved?
you may want to know something about the volume

All these together say something about the quality of the evidence. If we build it into the ontology, it allows people to filter based on their needs.
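
To make the three questions concrete, here is a small Python sketch that records them as separate attributes of the evidence and filters on them. The class names, field names and example annotations are all hypothetical; this is not a proposed GAF or ECO structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    method: str        # what method was used?
    curated: bool      # was there human judgement involved?
    high_volume: bool  # did it come from a large-scale experiment?


@dataclass(frozen=True)
class Annotation:
    gene_product: str
    term: str
    evidence: Evidence


# Hypothetical example annotations.
ANNOTATIONS = [
    Annotation("geneA", "kinase activity", Evidence("sequence similarity", True, False)),
    Annotation("geneB", "kinase activity", Evidence("sequence similarity", False, True)),
    Annotation("geneC", "nucleus", Evidence("direct assay", True, True)),
]


def select(annotations, method=None, curated=None, high_volume=None):
    """Keep annotations matching every attribute that is not None."""
    wanted = {"method": method, "curated": curated, "high_volume": high_volume}
    return [
        a
        for a in annotations
        if all(v is None or getattr(a.evidence, k) == v for k, v in wanted.items())
    ]
```

In this sketch the curated and uncurated sequence-similarity annotations share a method and differ only in the `curated` flag, so a user can keep or drop either with `select(ANNOTATIONS, method="sequence similarity", curated=False)`.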

Michael: There is a fundamental difference between htp and individual experiments on a particular protein.
- What would happen to the existing annotations?
  - They would become oxymoronic
Suzi: IEA, if the method was sequence similarity, it becomes automated sequence similarity. ISS and IEA have the same method and the difference is the judgement used. In the long term we should build it into the ECO.
Emily: I don't see how htp annotations that are experimental would fit under an electronic tag.
Judy: two common IEA approaches

First pass through InterPro, based on domains
Look at the structure of the gene product

Mike: the majority of the annotations would stay the way they are (the cross-product would be manual) but there would be an additional ISS electronic.
IEA and ISS electronic are synonymous
No one seems against
Not electronic/manual, but curated/uncurated
??: How is the end user using these ev codes and annotations?
Mike: we have all sorts of users, some strip out the evidence codes. some
Debby: I don't think that all IEAs are based on sequence similarity. What about IEAs based on keywords?
Mike: Those become something else, not ISS. We throw away IEAs after 12 mos.
Michael: What happens to everything 'uncurated'? Do we throw those out as well?
Petra: Why can't we leave IEA and have a new flag?
Kara: We have all these evidence codes that describe how the experiment was done, but we have IEA that describes how the annotation was done. And the proposal is to formally separate that concept out.
Emily: A nuclear fractionation is not going to be done every year. If you have a protein where there is no other method and this looks good to you, then you put them in. Talking to users, they want to know what is small scale versus what was done in a large-scale experiment.
Rex: The year limitation was for things like sequence.
Judy: Experiments are different than things that can be recomputed. We do have methods that look at hundreds of thousands of points that have restrictions on how they were done and we need a way to handle that differently than sequence sets that we were discarding.
Mike Tyers: It really comes down to a matter of confidence. Is it possible to put a confidence score?
Michael: If there were a question, then hopefully the curator would not annotate.
Suzi: The current proposal is extensible. It allows you to ask what are the attributes of the evidence? It gives you flexibility by separating the attribute from the method.
Rex: Worried about the amount of time we spend. The practicality of how much time the curator spends should be considered.
Pascale: We should assay the community to see if this would be useful.
Judy: We spend an inordinate amount of time dealing with IEA and ISS. The crucial work is the high level experimental data.

ACTION ITEM

Example cases of annotations and implementation into the ECO.
- Suzi, Michelle, Judy, Pascale, Emily, Eurie

PAMGO (Michelle/Candace)

Candace gives an overview of the new terms. The project is coming to an end because its funding is coming to an end. New gene association files have been submitted.

Successes

PAMGO terms used outside of PAMGO: viruses, C. albicans, P. falciparum, T. cruzi, T. brucei.

Issues

incorrect uses also.
there are a few terms where it is ambiguous whether or not the process is for the host or the virus side.

Future directions

fix virus terms
add comments
adopt more descriptive form for annotations.

Fixes

missing taxon ids for dual taxons
need a way to capture "acted_upon" annotations

Dual taxon IDs

still not displayed in AmiGO due to technical issues

ACTION ITEM

Check your annotations to terms under the 'symbiosis' and 'interaction with host' branches to make sure that there aren't any problems.

Cross products: Column 16 (Tanya)

Initially proposed in Jan 2007.

Reminder: this is extra information to combine multiple terms in a single annotation. These are GO IDs for which we don't want to encode links in the ontology. If there is more than 1 localization, they can be piped and several different ontologies can be piped in the same row.

You are not restricted to one ontology in column 16
Column 16 is optional
Column 16 can also be used to identify a target, e.g. for regulation of transcription, or to give a ChEBI ID for a chemical when annotating "response to drug". (Note that the current documentation states that column 16 is only for external ontologies.)
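
A minimal sketch of reading a piped column 16 value, assuming only what is described above: the column is optional, holds pipe-separated IDs, and may mix ontologies. The `PREFIX:local` ID shape and the example IDs are placeholders, and the syntax of the two proposed solutions is not covered here.

```python
def parse_column16(value):
    """Split a pipe-separated column 16 value into (prefix, full_id) pairs.

    An empty value yields an empty list, since the column is optional.
    """
    entries = []
    for raw in value.split("|"):
        item = raw.strip()
        if not item:
            continue
        prefix, sep, local = item.partition(":")
        if not sep or not local:
            raise ValueError(f"not a prefixed ID: {item!r}")
        entries.append((prefix, item))
    return entries


def group_by_ontology(entries):
    """Group parsed IDs by their ontology prefix."""
    grouped = {}
    for prefix, full_id in entries:
        grouped.setdefault(prefix, []).append(full_id)
    return grouped
```

For example, `parse_column16("GO:0000001|CHEBI:0000002")` returns one pair per ID, and `group_by_ontology` then separates the GO IDs from the ChEBI IDs in the same row.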

There were two proposed solutions on the table:

simple solution
expressive solution

No one had concerns about adding this column and a few people spoke up in favor of the expressive solution because it would allow for more information and there would be no need to retrofit.

ACTION ITEM

go ahead with the implementation of the expressive model in column 16
- get more examples
- get the documentation together

Transitive Relationships in GO

Relationships in terms - why it's wrong to just slim terms.

The composition of is_a and part_of needs to be taken into consideration for true path violations.

If you regulate a process, you regulate part of that process, not that whole process.

As you add more relationships, need to create these transitive closures.

And as you take these into consideration, the slimming can become more sophisticated.

Judy: we need a tool
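
As a rough idea of what such a tool might do, here is a Python sketch of relationship composition and relation-aware slimming. The composition table is an assumption made for illustration (the regulates rows in particular), and the term names are placeholders rather than real GO terms.

```python
# Assumed composition table: (first relation, second relation) -> result.
# Pairs that are absent are treated as giving no inference.
COMPOSE = {
    ("is_a", "is_a"): "is_a",
    ("is_a", "part_of"): "part_of",
    ("part_of", "is_a"): "part_of",
    ("part_of", "part_of"): "part_of",
    ("regulates", "is_a"): "regulates",
    ("regulates", "part_of"): "regulates",
    ("is_a", "regulates"): "regulates",
}

# Toy graph with placeholder terms: term -> [(relation, parent), ...].
EDGES = {
    "regulation of X": [("regulates", "X")],
    "step of X": [("part_of", "X")],
    "X": [("is_a", "Y")],
    "Y": [],
}


def closure(term, edges=EDGES):
    """Return every (relation, ancestor) pair inferable from `term`."""
    seen = set()
    stack = list(edges.get(term, []))
    while stack:
        rel, target = stack.pop()
        if (rel, target) in seen:
            continue
        seen.add((rel, target))
        for next_rel, next_target in edges.get(target, []):
            composed = COMPOSE.get((rel, next_rel))
            if composed is not None:
                stack.append((composed, next_target))
    return seen


def slim(term, slim_terms, allowed=("is_a", "part_of")):
    """Map `term` to slim terms, following only the allowed relations."""
    return {t for rel, t in closure(term) if t in slim_terms and rel in allowed}
```

Here `slim("step of X", {"Y"})` returns `{"Y"}`, while `slim("regulation of X", {"Y"})` returns an empty set because the only path to Y composes to regulates. A slimmer that ignores relation types would have mapped both terms to Y.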

Look at action item from last meeting

push forward not dones
carry forward last pascale

20th GO Consortium Meeting Minutes
