20th GO Consortium Meeting Minutes

From GO Wiki

Revision as of 16:53, 21 October 2008

Ontology content development

Overview (Midori)

Most of the report is on the wiki. A lot has been accomplished. Highlights include the following:

  • closed more SF items than opened since last meeting (~200)
  • peptidase reorg is finished: After SLC meeting, MEROPS database curators were contacted and they made recommendations. Those recs have been acted upon and reorg is finished.

There are still ~200 open items. All those that are more than 6 months old have been assigned, but maybe those should be reviewed to see if the priority should be changed. Also, David mentioned that many of the items are being taken care of in large chunks with ontology changes, such as "biogenesis and organization" terms. Some are stuck because no consensus can be reached.

The majority of the ontology content section will talk about future work and the links that will be made between function and process ontologies.

ACTION ITEMS

  • SF items that cannot be closed due to lack of consensus should be put onto a wiki page so they can be resolved at an upcoming GOC meeting. Midori said to email the editors and they'll take care of it.

Biochemical pathway function and process links (Harold)

(get his slides)

Many groups have been working on systems for links, trying to see how it works since it does represent biology. Harold showed examples of cross-products using biochemical pathways, defining a start and end, selecting paths, and using common resources. Done manually, the links looked OK but were labor intensive. Could it be done automatically?

Problems with doing it automatically:

  • missing DBXREFs
  • too many DBXREFs
  • creates links in the ontology that are "correct", but not always helpful to a given question for a human - like BP "carbohydrate metabolism" being linked to all glycolysis MF annotations.

Moving forward, using the dbxrefs seems to be the way to go but we will have to go in manually to make them more complete.

Theory and examples of function and process (Jen)

(from Chris' and Jen's talk)

So why should we even bother?

  • It will improve the GO because we need to be specific
  • It will help fill in annotation gaps - for example, a link should be made from the MF "kinase activity" to the BP "phosphorylation" - as well as provide ways to suggest new annotations.
  • It will allow better integration of pathway databases with GO.
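The gap-filling idea in the list above can be sketched roughly as follows. This is only an illustration, not the consortium's tooling: the link table, the function name `suggest_process_annotations`, and the gene names are hypothetical; only the kinase/phosphorylation pairing comes from the minutes.

```python
# Hypothetical function-to-process links; only the kinase example is from the minutes.
MF_PART_OF_BP = {
    "kinase activity": {"phosphorylation"},
}


def suggest_process_annotations(annotations, links=MF_PART_OF_BP):
    """Return (gene, process) pairs implied by function annotations but not yet present."""
    existing = set(annotations)
    suggestions = set()
    for gene, term in annotations:
        # Each linked process that the gene is not already annotated to is a suggestion.
        for process in links.get(term, ()):
            if (gene, process) not in existing:
                suggestions.add((gene, process))
    return sorted(suggestions)


# Made-up annotations: geneB already has the process annotation, geneA does not.
annotations = [
    ("geneA", "kinase activity"),
    ("geneB", "kinase activity"),
    ("geneB", "phosphorylation"),
]
print(suggest_process_annotations(annotations))
```

The output is a list of suggestions for a curator to review, not annotations to be added automatically.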

Chris has been trying to use Reactome to make mappings between function and process and has come across the following issues:

  • DBXREFs not necessarily equivalent
  • There are some reactions that always occur in a given process for a particular species and others that do not, and this is more difficult to mine from Reactome.

There are also gotchas from biology because there could be multiple variations for lysine biosynthesis that include mix-and-match reactions and variations of those reactions. A combinatorial explosion.

The proposal to deal with this:

  • When functions and process are closely related, like kinase and phosphorylation, can make a "part_of" annotation.
  • new relationship "sometimes part_of" when automated mappings are brought in which will avoid true path violations.
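One possible reading of the proposal, as a minimal sketch: if a function appears in every known variant of a pathway it could be linked with "part_of", otherwise with "sometimes_part_of". The rule itself, the function name `classify_links`, and the placeholder function names are assumptions made for illustration; the minutes do not specify how automated mappings would be classified.

```python
def classify_links(variants):
    """Classify each function by whether it occurs in all pathway variants.

    variants: list of sets of function names, one set per known variant
    of the same process (hypothetical input format).
    """
    if not variants:
        return {}
    always = set.intersection(*variants)  # functions present in every variant
    ever = set.union(*variants)           # functions present in at least one variant
    return {
        fn: ("part_of" if fn in always else "sometimes_part_of")
        for fn in sorted(ever)
    }


# Placeholder variants of one process; the names are not real GO terms.
variants = [
    {"function A", "function B"},
    {"function A", "function C"},
]
print(classify_links(variants))
```

Under this reading, only "function A" would get a "part_of" link, so annotating to "function B" would not force an inference that breaks the true path rule.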

Function and process link discussion

David asked whether every function is "part_of" a process. In theory, there should be a link from each Molecular Function term to a Biological Process term, and there was general acceptance of this theory.

  • peter: counter-example ??
  • suzi: never really done annotations to conjunctive annotations

Eurie asked whether MF enzyme terms are made consistently enough to make the links easy and consistent. Amelia pointed out another issue: enzyme terms usually cover both the forward and backward reactions, but we need separate terms. Harold also pointed out that we copied from EC, which may mean that two GO terms exist solely on the basis of cofactors. All of these may contribute to issues in creating an automated mapping.

Suzi and Peter discussed that the definitions of pathways in GO and Reactome are different, using apoptosis as an example. There are good examples where the start and end may differ from organism to organism.

Ingrid mentioned John Ingram, an experienced physiologist. His idea is that a metabolic pathway should begin and end with a central metabolite. There are pathways that feed into a common point that can then go to a central metabolite.

But all agreed that a discussion needs to occur.


  • Peter: manual curation will be necessary; also, legacy clean-up problems; may be hard to get mutually ok; For metabolites, there is more consensus than for something like apoptosis. We are also going to rediscover the sensu problem.
  • let's explore how well common starts and ends can be created in the GO
  • judy: we need a process to work towards a shared start and end, but respect the differences; we should just get the ones where we can get the overlaps first
  • paul: is there a compromise agreement for the interim? saw two extremes (some has part and has part with subclasses); external layer between function and process, start with a sampling that are more specific;
  • rex: when they make changes, how do they get propagated so they don't break our system?
  • eurie: Annotations with links between function and process--sometimes you just don't have the evidence to make the annotation without breaking true path rules. It becomes an annotation issue when true path rules have to be considered.
  • Jen: That's why we are asking for sometimes_part_of
  • Kimberly: Would we have to use sometimes_part in all of these cases and couldn't we do better in cases where we have the information.
  • judy: what decisions do we need to make?

ACTION ITEMS

  • Add obvious part_of links, like MF "kinase" and BP "phosphorylation"; will be rolled out after regulates is released in Feb 2009
  • Try mining pathways for sometimes_part_of relationships using glycolysis, nucleotide metabolism, apoptosis first
  • Agree on beginnings, middles, and ends of pathways/processes between Reactome and GO
  • Examine impact on annotation priorities and implementations
  • Can we source our relationships as well as our term definitions?
    • (david: this is about pushing the work onto the ontology developers and not the annotators)
  • assign process to every molecular function.
  • deferred: co-annotation 'has function as part of this process'

New relationship type (David)

  • there will be problems with slimming if they don't think about relationships
  • ACTION: software, release examples of relationship usage
  • michael: are we overloading part_of
    • david: yes we are, but it probably doesn't matter.

Terms in MF that describe fns that regulate other fns - e.g. inhibitor activity

TS regulator activity - describes fns that regulate processes

Feb 2009 - regulates relationships going into the db full tilt

  • big impact on SLIMMING activities
  • simple slimming is not a good idea
  • will have to enforce community awareness of relationships
  • test case for whether inter-ontology links will break software or not
  • will provide backups for those not up to date with relationships
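Why simple slimming becomes risky once "regulates" links are in the ontology can be illustrated with a small sketch. The toy graph, the term names, and the function `map_to_slim` are hypothetical and do not reflect the actual GO structure or any existing slimming tool; the point is only that a slimmer has to decide which relationship types it follows.

```python
# Toy ontology: child term -> list of (relationship, parent term). Not real GO content.
EDGES = {
    "regulation of process X": [
        ("regulates", "process X"),
        ("is_a", "regulation of biological process"),
    ],
    "process X": [("is_a", "parent process")],
}


def map_to_slim(term, slim_terms, follow=("is_a", "part_of")):
    """Return the nearest slim terms reachable from term via the allowed relationships."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing once a slim term is reached
            continue
        for relationship, parent in EDGES.get(current, []):
            if relationship in follow:
                stack.append(parent)
    return found


slim = {"parent process", "regulation of biological process"}
# Relationship-aware slimming: the regulator is not counted as the process itself.
print(map_to_slim("regulation of process X", slim))
# Naive slimming that follows every edge also maps the regulator to "parent process".
print(map_to_slim("regulation of process X", slim, follow=("is_a", "part_of", "regulates")))
```

A tool that follows every edge blindly would count gene products that regulate a process as taking part in it, which is the kind of error the discussion warns about.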


- Michael - are we overloading part-of?

  • David: We've looked at everything in the BP that has more than one part_of parent. Gut feeling is 'yes', but practical feeling is 'it doesn't matter'. i.e. development of an anatomical structure.


Quality Control (Tanya)

(info on wiki)

Regulation terms: reasoner looks at regulation terms and then at corresponding process terms, checks if the structures match or if relationships missing

  • Emily: GO tools need to adapt to the proliferation of the ontologies; it's in OBO-Edit. Also, we shouldn't endorse tools that do not appropriately slim.
  • Emily/Jane: We should continuously send out notices but it's the responsibility of the tool creator to take the initiative to test their tools.
  • continue to review chris' reports--becoming part of the process

ACTION ITEM

  • send out function process email again
  • we now have systematic ways of determining right, not just ad hoc

OBO-Edit (Amina)

(has slides)

Priority should be testing and bug fixes. This version doesn't need more features but all the new features need to be tested, tested, tested.

Reports (Jane)

(has slides)

PAMGO

  • This is an ongoing process.

Organization and biogenesis of cellular components

(has slides)

ACTION: continue work on org and bio terms

Signaling (Jen)

(has slides)

Future content meeting discussion

  • brenley: volunteer for virus terms
  • judy: maybe infectious disease group?
  • midori: touches on every species
  • david: there should be specific venues; some of these are huge issues;
    • focus: g-protein coupled receptors, calcium signaling, tyrosine kinase signaling, MAP kinase cascade

ACTION ITEMS

  • pursue one or two ontology development meetings
    • viral processes (Brenley, Kimberley, Candice, Michelle, Jane)
    • GPCR (Pascale, David, ??)
  • Go to meetings on these topics and ask for experts to join meeting
  • Investigate funding sources

Annotation checking by trigger file (Jen)

(has slides)

  • problem IEAs
    • viral/bact ones should probably be to host instead
  • Suzie: do we want all the groups submitting annotations to run the triggers?
  • Judy: we can do a monthly run with the trigger file
  • Peter: Once it has run a few times, we can check for global issues from GA files.
  • Michael A: What will you do about the GOA annotations where there is a conflict?
  • Emily: Can use to feed back to InterPro (for the InterPro to GO mappings) to update mappings because old mappings are causing problems.

ACTION ITEMS

  • remove sensu synonyms
  • Make GOA quickgo checking available to the public
  • write up for a newsletter in the near future

General Annotation Issues

Michelle: Evidence code ontology (ECO)

  • includes things other than GO
    • curation of museum collections
    • morpho
    • etc
  • we want to correct inconsistencies
  • want to make sure that GO is a subset
  • there will be a tracker


  • Mike: is ECO a responsibility of this community?
  • Michael: We started it.
  • Mike: If we started it, then why don't we use it?
  • Michael: kept the list short
  1. so EV codes can be easily distinguished
  2. reasonably easy for the annotators to use it
  • TAIR wanted a much richer set of EV codes. Sue and Michael did a mapping to GO evidence codes.
  • If annotators were faced with 1500 evidence codes it would slow down annotation and wouldn't add a whole lot to the GO.
  • IEAs are not ISS's
  • We should integrate GO evidence codes into the ECO b/c other people might use it. And GO should use a subset/slim of ECO (i.e. the codes we use now).

  • Pascale: Because the codes are more hierarchical, do we use the higher terms or the most granular ones?
  • Judy: As far as GO is concerned, we should stay with the high level. Each MOD can do more specific sub-curation. The power of the high-level EV codes is that we can bring together all these different communities.
  • Pascale: Do we need to go as granular as we can within the EV codes that are currently used by GO?
  • Judy: As far as GO is concerned, we should stay high-level, but communities should have mappable codes.

  • Suzi: Like any annotations, do it to the degree of knowledge available. You might want to go to the higher term. Secondly, we might want to use an EV code slim for AmiGO. should do to maximal degree available; different AmiGO slim and Ref Genome slim? when evcodes are organized, we can have x-prod modifiers
  • Rex: Why would you want to do that?
  • Mike/Suzi: b/c it's an interface. Just for display purposes
  • Suzi: When we have the evidence codes in the ontology we can append an experimental method to it.
  • Harold: Expansion of more granular descriptions--there would be a problem of two annotations of pre and post
  • Eurie: In terms of setting annotation standards for EV codes, would we set standards just for the GO set or for the entire ECO set?
  • Mike: Annotation standards would be just for the GO evidence codes. You can use more if you want to, but you have to convert it into the standard set for the GA file.
  • Emily: For the reference genome group, we have decided to use IDA, IMP, IGI, IPI, IEP, and the parent EXP is for groups that are not in the ref genome group but annotate, like Reactome.
  • Rex: Some cases there is a lack of agreement about which evidence code to apply in a situation would allow
  • harold: lost information from past annotations
  • Peter: You might end up overloading the EXP term. Reactome is not protein-centric and does not annotate in a way that the literature can be parsed out later. hard to figure out what evidence is in some cases.
  • Rex: worried about accuracy. There is not universal agreement for which code to use for a given experiment. There is a lot of time spent debating which lower code to use and I would rather have people agree to EXP and spend more time annotating.
  • Suzi: People can use any ev code in the ECO if it were useful, then we could explore writing software to slim things up.

ACTION ITEMS

  • create an EV code ontology tracker
  • people can use the ECO in its entirety but they have to map up to the GO set of EV codes.
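The "map up" requirement in the last action item could look roughly like this. It is a sketch under assumptions: the parent table is a made-up fragment rather than the real ECO hierarchy, the fine-grained code name is invented, and `map_up_to_go_code` is a hypothetical helper. The GO codes listed are the ones named in these minutes.

```python
# Evidence codes named in the minutes as the GO set (not necessarily the complete list).
GO_CODES = {"EXP", "IDA", "IMP", "IGI", "IPI", "IEP", "ISS", "IEA"}

# Hypothetical fragment of an evidence hierarchy: code -> parent code.
PARENT = {
    "IDA": "EXP",
    "IMP": "EXP",
    "fine-grained assay code (hypothetical)": "IDA",
}


def map_up_to_go_code(code):
    """Walk up the hierarchy until a code from the GO set is reached."""
    seen = set()
    current = code
    while current is not None and current not in GO_CODES:
        if current in seen:
            raise ValueError(f"cycle in evidence hierarchy at {current!r}")
        seen.add(current)
        current = PARENT.get(current)
    if current is None:
        raise KeyError(f"{code!r} has no ancestor in the GO evidence code set")
    return current


print(map_up_to_go_code("fine-grained assay code (hypothetical)"))  # IDA
print(map_up_to_go_code("IMP"))  # IMP is already a GO code
```

A group could then curate with finer codes internally and run a step like this when writing its GA file.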

Separating annotation method from experimental method

  • We would have a way to say that an annotation is electronic, that we can use any evidence code.
  • Judy: You could use anything in combination
  • Suzi: There is no need to change the GAF.
  • Kara: There's agreeing on the concept and secondly the implementation.
  • Judy: IEAs are inferred.
  • Mike: If you use InterPro to infer function is that ISS?
    • It also solves a lot of problems for high-throughput.
  • Harold: So this is mainly for HTP?

NO

  • Mike: This is to differentiate experimental or a prediction.
  • Donghui
  • Eurie: It splits off the evidence from the experiment happening and what we put in the annotation file. Did a curator review everything or did it just appear there?
    • need to put in the annotation file--if we don't tag that certain experiments came from htp experiments, it becomes a circular argument from people doing RCA experiments.
  • Debby: The data indicates two localizations but the database didn't capture the conflict, is that the issue? Is there a marker on the paper that everything was dumped in without someone reviewing it?
  • Judy: That an experiment can either have high volume or low volume; doesn't like the term htp, it's not about the data but how we're treating the data.
  • Suzi: Problem is that we're conflating several different things into one.
  1. what method is used?
  2. was there human judgement involved?
  3. you may want to know something about the volume

All these together say something about the quality of the evidence. If we build it into the ontology, it allows people to filter based on their needs.
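The three separated aspects (method, human judgement, volume) can be pictured as independent attributes that a user filters on. The sketch below is illustrative only: the `Evidence` record, its field names, and `select` are hypothetical and are not a proposed ECO or GAF structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    method: str        # what method was used
    curated: bool      # was human judgement involved
    high_volume: bool  # did it come from a high-volume experiment


def select(evidence_list, *, curated=None, high_volume=None):
    """Keep items matching the requested attributes; None means no restriction."""
    kept = []
    for item in evidence_list:
        if curated is not None and item.curated != curated:
            continue
        if high_volume is not None and item.high_volume != high_volume:
            continue
        kept.append(item)
    return kept


# Made-up examples covering the combinations discussed.
examples = [
    Evidence("sequence similarity", curated=True, high_volume=False),
    Evidence("sequence similarity", curated=False, high_volume=True),
    Evidence("direct assay", curated=True, high_volume=True),
]
print(select(examples, curated=True))
print(select(examples, curated=True, high_volume=False))
```

Keeping the attributes separate means the same method (here, sequence similarity) can appear both as a curated and as an uncurated annotation, which matches the ISS versus IEA distinction raised in the discussion.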

  • Michael: There is a fundamental difference between htp and individual experiments on a particular protein.
    • What would happen to the existing annotations?
      • They would become oxymoronic
  • Suzie: IEA, if the method was sequence similarity, it becomes automated sequence similarity. ISS and IEA have the same method and the difference is the judgement used. In the long term we should build it into the ECO.
  • Emily: I don't see how htp annotations that are experimental would fit under an electronic tag.
  • Judy: two common IEA approaches
  1. First pass through InterPro, based on domains
  2. look at the structure of the gene product
  • Mike: the majority of the annotations would stay the way they are (the cross-product would be manual) but there would be an additional ISS electronic.
  • IEA and ISS electronic are synonymous
  • No one seems against
  • Not electronic/manual, but curated/uncurated
  • ??: How is the end user using these ev codes and annotations?
  • Mike: we have all sorts of users, some strip out the evidence codes. some
  • Debby: I don't think that all IEAs are based on sequence similarity. What about IEAs based on keywords?
  • Mike: Those become something else, not ISS. We throw away IEAs after 12 mos.
  • Michael: What happens to everything 'uncurated'? Do we throw those out as well?
  • Petra: Why can't we leave IEA and have a new flag?
  • Kara: We have all these evidence codes describing how the experiment was done, but we have IEA that describes how the annotation was done. And the proposal is to formally separate that concept out.
  • Emily: A nuclear fractionation is not going to be done every year. If you have a protein where there is no other method and this looks good to you, then you put them in. Talking to users, they want to know what is small scale versus what was done in a large-scale experiment.
  • Rex: The year limitation was for things like sequence.
  • Judy: Experiments are different than things that can be recomputed. We do have methods that look at hundreds of thousands of points that have restrictions on how they were done and we need a way to handle that differently than sequence sets that we were discarding.
  • Mike Tyers: It really comes down to a matter of confidence. Is it possible to put a confidence score?
  • Michael: if there were a question, then hopefully the curator would not annotate.
  • Suzi: The current proposal is extensible. It allows you to ask what are the attributes of the evidence? It gives you flexibility by separating the attribute from the method.
  • Rex: Worried about the amount of time we spend. The practicality of how much time the curator spends should be considered.
  • Pascale: We should assay the community to see if this would be useful.
  • Judy: We spend an inordinate amount of time dealing with IEA and ISS . The crucial work is the high level experimental data.

ACTION ITEM

  • Example cases of annotations and implementation into the ECO.
    • Suzi, Michelle, Judy, Pascale, Emily, Eurie

PAMGO (Michelle/Candace)

  • Judy: with the metabolome projects, is there anything that the GO community can do or are you just going to use the GO?
  • Michelle: the latter
  • Jane: using the triggers to find places in the ontology that have issues?
  • dual taxon IDs
    • still not displayed in AmiGO due to technical issues

ACTION ITEM

  • Check your annotations to terms under the 'symbiosis' and 'response to host' terms to make sure that there aren't any problems.

Cross products: Column 16 (Tanya)

  • Proposed solutions
    • simple solution
    • expressive solution
  • There are examples of both in the wiki and it hasn't been decided which to use
  • No one had concerns about adding this column
  • A few people spoke up in favor of the expressive because it would allow for more information and no need to retrofit.
  • not restricted to one ontology in column 16
  • column 16 is optional
  • column can also be used to identify a target i.e. regulation of transcription
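To make the optional column concrete, here is a minimal sketch of reading it from a tab-separated annotation line. The syntax shown, a comma-separated list of `relation(ID)` entries, is an assumption for illustration, since the minutes note that the format had not yet been decided; the IDs and the function `parse_column_16` are made up.

```python
import re

# Assumed entry syntax: relation(ID). This is an illustration, not the agreed format.
ENTRY = re.compile(r"^\s*(\w+)\(([^()]+)\)\s*$")


def parse_column_16(line):
    """Return (relation, target_id) pairs from column 16, or [] if the column is absent."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 16 or not columns[15].strip():
        return []  # column 16 is optional
    pairs = []
    for part in columns[15].split(","):
        match = ENTRY.match(part)
        if match is None:
            raise ValueError(f"unrecognised column 16 entry: {part!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


# Fifteen placeholder columns followed by a made-up column 16 that uses two ontologies.
line = "\t".join(["x"] * 15 + ["part_of(ONT1:0000001),occurs_in(ONT2:0000002)"])
print(parse_column_16(line))
print(parse_column_16("\t".join(["x"] * 15)))  # no column 16 present
```

Because the column is optional, older 15-column files still parse and simply yield no extra relationships.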

ACTION ITEM

  • go ahead with the implementation of the expressive model in column 16
    • get more examples
    • get the documentation together

Transitive Relationships in GO

  • judy: we need a tool

Look at action item from last meeting

  • push forward not dones
  • carry forward last pascale