13 JULY 2010 RefGen Phone Conference (Archived)

From GO Wiki

Present

Varsha: BHF/UCL

Pascale: dictyBase

David: MGI

Tanya: TAIR

Doug: ZFIN

Kimberly: WormBase

Ranjana: WormBase

Susan: FlyBase

Rachael: GOA

Julie: SGD

Stan: RGD

Jim: Ecoli

Yasmin: GOA

Seth: BBOP

Wnt project

Please note on the Wnt wiki page when your annotation is done.

Ontology Issues

Canonical vs Non-Canonical Wnt Signaling

Varsha: Split into canonical or non-canonical?

Stan: Yes, authors usually make that distinction in the papers.

Molecular Function for Wnts

Varsha: Ruth sent an email to the Ref Genome list.

Ruth wanted to have a better way of describing the molecular function of Wnts, e.g. Wnt ligand activity?

What do people think about that?

What functions would be suggested?

Some type of signaling activity other than just binding to the receptor.

Susan: Is Ruth striving to describe the class of protein rather than the actual function? Does the molecular function extend beyond the binding?

Becky and Susan responded to the accompanying SourceForge item; both felt that binding captured the function sufficiently.

SourceForge Items

Quite a few SF items relate to the Wnt project; curators, please keep an eye on these and comment as needed.

New Annotation Schedule

Can people meet the Friday deadline?

Susan: Waiting for new terms; other projects at FlyBase preclude a GAF file upload until next week. Still working on Wnt1.

How long does it take to do the PAINT annotations? Is it okay to wait a few days?

Mike: Giving himself a week to curate each family, one day for MF and CC, three days for BP.

Susan: The annotations are already there, but we are working on cleaning them up. The other concern is new terms that haven't been processed yet. Is it worth waiting another week to start the PAINT annotations?

Mike: PAINT anticipates a two-week lag for annotations to make it to the server.

Susan: That should be fine.

Varsha: We'll stick to the two-week schedule and see how it goes.

GO camp action items

  • Send final guidelines to Rachael/Pascale for inclusion on the GO wiki

Much of this work is still in progress, but there will be wiki pages on annotation with respect to each of these items.

Binding group action items

  • Unresolved issues to be discussed by binding group

1. Annotation of 'NOT' binding a specific protein: new GO term or column 16 (consider IntAct guidelines on this)?

2. Automate annotation to a specific binding term from known functions of the protein, e.g. transcription factor binding, based on evidence that the protein is a transcription factor, or as implied by a domain? Or not create this type of term?

3. Transferring cross species information by ISS and inclusion of non in-vivo targets in column 8 or 16.

4. How specific to make substrate/product target information?

5. Will CHEBI IDs in function ontology propagate to process terms?

6. Existing GO to follow new has_part relationships implying substrate binding

  • Unresolved issues to be discussed by other groups:

1. Incorporation of IMEX data being discussed

2. Col 16 relationship ontology (has_input=substrate)

Response group action items

1. Update definition of response to terms to indicate that we are capturing mediators (wording needs to be worked out)

Protein complexes group action items

1. Long term goal is to annotate complexes; details and requirements need to be clarified.

Pascale: Discussion at ISMB amongst IntAct and PRO deciding what they will do. It will be possible to annotate to protein complexes once they exist.

Downstream Process group action items

1. What is the process term for a specific transcription factor? (i.e. 'transcription' or 'regulation of transcription'?) ACTION: transcription ontology revision

2. Define the start and end of signaling processes. ACTION: signaling working group

3. Is a ligand part of the pathway? Can it also regulate the pathway? Is there a difference between intra- and inter-cellular pathways regarding the ligand?

4. Some MODs keep legacy annotations (i.e. correct annotations to downstream processes), but some prefer to remove them. Is this a problem? ACTION: all

5. Form a working group to look into phenotype/development/IMP issues. How should we annotate to development terms?

6. Regarding survey question 2 (whether to annotate ubiquitin ligases to regulation of histone methylation): Val will give her reasons for wanting to annotate to regulation of histone methylation. The ontology may need altering to reflect the step-by-step nature of this pathway. ACTION: Val/Sylvain/Ontology editors

Quality control checks

Rachael: Will need to be sent to Chris to be fleshed out a bit more.

1. High level ‘response to’ terms should not directly be used for annotation

2. Avoid annotations to GO: MF by IPI (except for ‘protein binding’ and children)

3. Check for less-granular and more-granular annotation from the same path (soft check)
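The third check above (less-granular and more-granular annotation on the same path) could be sketched roughly as follows. This is only an illustration, not the check that will be implemented: the `PARENTS` map and the `TOY:` term IDs are hypothetical stand-ins for the real ontology, and evidence codes and references are ignored.

```python
from collections import defaultdict

# Hypothetical child -> parents map; a real check would load this from the ontology.
PARENTS = {
    "TOY:3": {"TOY:2"},
    "TOY:2": {"TOY:1"},
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def redundant_annotations(annotations, parents):
    """Flag genes annotated to both a term and one of its ancestors.

    `annotations` is an iterable of (gene_id, term_id) pairs.
    Returns a sorted list of (gene_id, less_granular_term, more_granular_term).
    """
    terms_by_gene = defaultdict(set)
    for gene, term in annotations:
        terms_by_gene[gene].add(term)

    flagged = []
    for gene, terms in terms_by_gene.items():
        for term in terms:
            # Any ancestor of `term` that the same gene is also annotated to.
            for ancestor in ancestors(term, parents) & terms:
                flagged.append((gene, ancestor, term))
    return sorted(flagged)
```

Because this is a soft check, the flagged triples would be reported for curator review rather than removed automatically.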

PAINT-generated GAF files


GAF Files

Mike: Pascale put a link about GAF files on the agenda.

Follow the link; there are six directories.

Each directory has ~8-9 files. The most important is the .gaf file; the .txt file contains the reasoning and questions regarding curation. Emails have also been sent out; please address questions when you have a chance.

Also, Sven is trying to get the files from CVS onto the family display in AmiGO labs; they may appear and disappear randomly.

Pascale: Clarify?

Mike: As Sven develops the pages, this may be a moving target.

Ultimately the plan is to have the files displayed from the family page; but this work is in production.

Seth: One thing: snapshots of Sven's work in AmiGO are being taken and moved to AmiGO Labs. These should be more stable, but a bit behind. Hopefully this will give a more stable user interface.

Seth will put a sample URL on the wiki page.

Ranjana: Is this about having a more user friendly display of everything in the GAF?

Mike: Yes; we want to make sure annotation questions are more easily seen and that curators can address the questions in the .txt file.

Ranjana: Looking at a GAF to review PAINT annotations is not user friendly. Do you want the MOD curators to review all of the annotations or just the ones for which there are questions?

Mike: Questions that I'm writing pertain primarily to the existing experimental annotations - clarifications, etc. As for reviewing the lines in the GAF file that pertain to an organism - that will have to be an internal decision for each database.

Rachael: Question about the GAFs. In the human GAFs, there were both ENSEMBL and UniProt IDs. Can these all be UniProtKB IDs?

This is an issue for Ed and Paul.

Mike will send an email.

David: Are the annotation files going to be available on an FTP site? Concatenated? That would be much easier to handle.

Mike: Refer to Ed and Suzi.
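As a rough illustration of what a concatenated file could involve, here is a minimal sketch that merges several GAF files into one. It assumes a layout of one directory per family with `.gaf` files inside, writes a single `!gaf-version: 2.0` header, and drops exact duplicate rows; the function names are hypothetical and this is not the PAINT export code.

```python
from pathlib import Path


def merge_gaf_lines(line_groups):
    """Merge several GAF files (each an iterable of lines) into one list of lines."""
    merged = ["!gaf-version: 2.0"]  # assumption: all inputs are GAF 2.0
    seen = set()
    for lines in line_groups:
        for line in lines:
            line = line.rstrip("\n")
            if not line or line.startswith("!"):
                continue  # drop per-file headers, comments, and blank lines
            if line not in seen:  # drop exact duplicate annotation rows
                seen.add(line)
                merged.append(line)
    return merged


def merge_gaf_tree(root, out_path):
    """Merge every *.gaf file under `root` into `out_path`; return the row count."""
    paths = sorted(Path(root).rglob("*.gaf"))
    groups = [p.read_text(encoding="utf-8").splitlines() for p in paths]
    merged = merge_gaf_lines(groups)
    Path(out_path).write_text("\n".join(merged) + "\n", encoding="utf-8")
    return len(merged) - 1  # exclude the header line
```

A single merged file of this kind would suit groups that strip and reload all annotations on each update.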

Pascale: People have requested one GAF file as opposed to several. Would that be simpler?

Ranjana: One file of new C. elegans gene annotations, one file of everything.

David: The issue with that is that if things get dropped, you wouldn't know it. MGI will completely strip out all of the annotations and then load all of them again. That's why having one central location for the file is best.

Ranjana: New files by date; files also by organism.

David: If the file is just new annotations each week, then that won't work with MGI.

Doug: Can the date in the GAF file be helpful in that regard? Use the annotation date to get this?

Jim: Use taxon IDs to filter. Making separate files for each organism could be messy.

Tanya: Then we still want one giant file with all of the PAINT annotations. Filter by tax ID. Or have one giant file with all of the annotations for each organism. Or have a giant file plus a file with just the new annotations.
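The filtering suggested by Doug, Jim, and Tanya could look roughly like the sketch below, which selects rows from one large GAF file by taxon ID (column 13) and annotation date (column 14, YYYYMMDD). The function name and parameters are hypothetical. Note that the date filter only helps groups that load increments; a group that strips and reloads everything, as David describes for MGI, would still use the full file.

```python
def filter_gaf(lines, taxon_id=None, since=None):
    """Yield GAF rows (as lists of columns) matching the given filters.

    taxon_id: NCBI taxon number, e.g. 6239 for C. elegans (None = any organism).
    since:    earliest annotation date as a 'YYYYMMDD' string (None = any date).
    """
    wanted = f"taxon:{taxon_id}" if taxon_id is not None else None
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # skip malformed rows with too few columns
        # Column 13 may hold two taxa separated by '|'; the first one is
        # the organism of the annotated gene product.
        if wanted is not None and cols[12].split("|")[0] != wanted:
            continue
        # YYYYMMDD strings sort chronologically, so string comparison suffices.
        if since is not None and cols[13] < since:
            continue
        yield cols
```

For example, `filter_gaf(open_file, taxon_id=6239, since="20100701")` would yield only the C. elegans rows dated on or after 1 July 2010.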

Ranjana: Just a question of what the scripts will do. Is it a problem to create different formats?

Pascale: Everyone should write to the RefGenome list with their suggestions, and then the PAINT developers and GO developers can discuss what is the best thing to do. Having every family in a separate directory is too complicated. We should try to find something relatively simple to propose.

  • Could PAINT provide their data similarly to how UniProtKB provides their IEA annotations? Groups either already have or are working towards ways of incorporating this data into their databases and files so consistency would avoid MODs having to create new scripts for each new external data source.

Tanya: Please check the minutes to make sure what you've proposed has been recorded accurately :-).

Doug: Files dumped from PAINT are in GAF2.0, in column 10 (protein, gene, etc.) - needs to be updated to GAF2.0 vocabulary.

Mike will make a note to get new and old files consistent with GAF2.0.

Susan: What references are being used? Idea was to have one reference per PANTHER group and until those were available, there was a generic GO Ref being used. Current files are a mixture.

Mike: They're not all showing up as GO_REF: PTHRnnnnn

Susan: Will need to make new references in FB for each PTHR family. Thought a generic GO_REF would be used until the PTHR family Refs did exist.

Julie: There is a generic GO_Ref on GO, this would have to be substituted for the specific references.

David: GO will need to provide GO_Refs a la PubMed so that groups can import these into their DBs.

Pascale: PAINT curators are aware of this; it will be put on the to-do list. We don't have a dedicated PAINT developer at the moment.

External Links

  • AmiGO Labs
  • Snapshot of Sven's work at AmiGO Labs