Reference Genome progress report for 2010 (Archived)

From GO Wiki


Publications and posters

  • Mi H, Dong Q, Muruganujan A, Gaudet P, Lewis S, Thomas PD. 2010. PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res 38(Database issue): D204-D210.
  • Gaudet P, Lee E, Livstone M, Dolinski K, Lewis SE, Thomas P. Functional Gene Ontology Annotation across Species using PAINT. Poster presented at the ISMB meeting, Boston, July 2010.
  • Fisk D, et al. (SGD). Annotation Across Species: SGD and the Reference Genome Project. Model Organism to Human Biology Conference, June 2010.

Internal communication

  • Monthly conference calls are held to discuss annotation target prioritization, annotation issues, development of the PAINT software, requirements for uploading PAINT-generated GAF files, and other issues relevant to the Reference Genome group.

Topics covered

  • General description of the GO – Ontology, annotation, tools and technical aspects
  • GO browsers: AmiGO and QuickGO
  • Comparison of methods of annotation propagation: PAINT, Compara, HAMAP
  • Discussion of annotation practices: 'binding', 'response to', 'downstream effects', 'regulation', high throughput data, annotation of protein complexes

Outcome

Curation Targets

Since November 2009, curation targets have been selected from a 'systems' perspective, for example with respect to a biological pathway. The rationale is that when targets encompass a single biological phenomenon, the annotations will be more accurate and more complete, because curators can become familiar with the subject. The advantages of this approach are:

  • facilitates coordination with ontology development
  • makes it easier to do the annotations because we're addressing a single general area of biology
  • makes it possible to solicit experts to help review the annotations and ensure that nothing is missing.

The first project done using this approach was lung branching morphogenesis. This project was difficult because the available experimental data do not clearly show how each protein individually influences lung development, and individual proteins are the level at which GO annotations are captured.

In the next phase, we started annotating the Wnt_signaling_Pathway. We have annotated 9 families of proteins implicated in this pathway, both at the primary level and using PAINT to propagate annotations across all the proteins from the 48 species currently in PANTHER.

Progress for primary annotations

  • 461 families have been annotated by MODs
  • 8,000 proteins annotated

PAINT: Software for annotation propagation

PAINT, the annotation propagation software under development, is now at version beta29. The software's speed and functionality have improved considerably. Some improvements remain to be made, but beta29 is a 'working version': it produces valid GAF files that the GO databases and the Model Organism Databases can upload.
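To make "valid GAF files" more concrete, the sketch below checks the basic shape of a single annotation line. It assumes the GAF 2.0 layout:

- 17 tab-separated columns;
- a GO ID in column 5;
- the aspect (P, F or C) in column 9;
- the date as YYYYMMDD in column 14.

This report does not say which GAF version PAINT writes. `check_gaf_line` is a hypothetical helper, not part of PAINT or of the GO upload pipeline, and the example values are made up.

```python
import re

# Assumption: GAF 2.0 layout (17 tab-separated columns per annotation line).
GAF2_COLUMNS = 17
ASPECTS = {"P", "F", "C"}  # column 9: biological process, molecular function, cellular component
GO_ID = re.compile(r"GO:\d{7}")


def check_gaf_line(line):
    """Return a list of problems found in one GAF 2.0 line (empty list if none).

    Hypothetical helper for illustration only; it checks structure, not content.
    """
    if line.startswith("!"):
        return []  # header/comment lines (e.g. "!gaf-version: 2.0") are not annotations
    cols = line.rstrip("\n").split("\t")
    if len(cols) != GAF2_COLUMNS:
        return [f"expected {GAF2_COLUMNS} columns, found {len(cols)}"]
    problems = []
    if not GO_ID.fullmatch(cols[4]):
        problems.append(f"column 5 is not a GO ID: {cols[4]!r}")
    if cols[8] not in ASPECTS:
        problems.append(f"column 9 aspect must be P, F or C: {cols[8]!r}")
    if not re.fullmatch(r"\d{8}", cols[13]):
        problems.append(f"column 14 date must be YYYYMMDD: {cols[13]!r}")
    return problems


# Made-up example values, for illustration only.
example = "\t".join([
    "UniProtKB", "P00000", "GENE1", "", "GO:0005634", "PAINT_REF:10000", "IBA",
    "PANTHER:PTN000000001", "C", "", "", "protein", "taxon:9606", "20101117",
    "RefGenome", "", "",
])
print(check_gaf_line(example))  # → []
```

A structural check like this catches malformed lines early. It says nothing about whether an annotation is biologically correct, which remains the curator's job in PAINT.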


PAINT-based annotations

As of Nov 17, 17 families have annotations in the GO CVS directory: http://cvsweb.geneontology.org/cgi-bin/cvsweb.cgi/go/gene-associations/submission/paint/#dirlist

  • PTHR10000
    • Number of unique ancestors annotated: 3
    • Number of terms annotated to ancestors: 6 [ 1 BP, 2 CC, 3 MF ]
    • Number of unique proteins annotated: 44
    • Number of new protein annotations: 124 [ 28 BP, 27 CC, 69 MF ]
  • PTHR10003
    • Number of unique ancestors annotated: 6
    • Number of terms annotated to ancestors: 11 [ 3 BP, 4 CC, 4 MF ]
    • Number of unique proteins annotated: 69
    • Number of new protein annotations: 326 [ 98 BP, 61 CC, 167 MF ]
  • PTHR10046
    • Number of unique ancestors annotated: 5
    • Number of terms annotated to ancestors: 20 [ 9 BP, 4 CC, 7 MF ]
    • Number of unique proteins annotated: 92
    • Number of new protein annotations: 879 [ 312 BP, 151 CC, 416 MF ]
  • PTHR10073
    • Number of unique ancestors annotated: 7
    • Number of terms annotated to ancestors: 15 [ 4 BP, 9 CC, 2 MF ]
    • Number of unique proteins annotated: 76
    • Number of new protein annotations: 447 [ 153 BP, 141 CC, 153 MF ]
  • PTHR10150
    • Number of unique ancestors annotated: 2
    • Number of terms annotated to ancestors: 8 [ 5 BP, 1 CC, 2 MF ]
    • Number of unique proteins annotated: 25
    • Number of new protein annotations: 164 [ 95 BP, 23 CC, 46 MF ]
  • PTHR10202
    • Number of unique ancestors annotated: 3
    • Number of terms annotated to ancestors: 26 [ 7 BP, 17 CC, 2 MF ]
    • Number of unique proteins annotated: 26
    • Number of new protein annotations: 480 [ 121 BP, 328 CC, 31 MF ]
  • PTHR10845
    • Number of unique ancestors annotated: 4
    • Number of terms annotated to ancestors: 36 [ 25 BP, 7 CC, 4 MF ]
    • Number of unique proteins annotated: 270
    • Number of new protein annotations: 1101 [ 209 BP, 584 CC, 308 MF ]
  • PTHR11309
    • Number of unique ancestors annotated: 33
    • Number of terms annotated to ancestors: 167 [ 147 BP, 16 CC, 4 MF ]
    • Number of unique proteins annotated: 180
    • Number of new protein annotations: 3456 [ 2393 BP, 524 CC, 539 MF ]
  • PTHR11361
    • Number of unique ancestors annotated: 14
    • Number of terms annotated to ancestors: 56 [ 30 BP, 9 CC, 17 MF ]
    • Number of unique proteins annotated: 110
    • Number of new protein annotations: 1246 [ 564 BP, 207 CC, 475 MF ]
  • PTHR11447
    • Number of unique ancestors annotated: 4
    • Number of terms annotated to ancestors: 28 [ 14 BP, 6 CC, 8 MF ]
    • Number of unique proteins annotated: 29
    • Number of new protein annotations: 578 [ 262 BP, 109 CC, 207 MF ]
  • PTHR11829
    • Number of unique ancestors annotated: 84
    • Number of terms annotated to ancestors: 350 [ 316 BP, 9 CC, 25 MF ]
    • Number of unique proteins annotated: 524
    • Number of new protein annotations: 10865 [ 5752 BP, 599 CC, 4514 MF ]
  • PTHR12027
    • Number of unique ancestors annotated: 49
    • Number of terms annotated to ancestors: 229 [ 219 BP, 6 CC, 4 MF ]
    • Number of unique proteins annotated: 226
    • Number of new protein annotations: 3735 [ 2970 BP, 545 CC, 220 MF ]
  • PTHR16505
    • Number of unique ancestors annotated: 1
    • Number of terms annotated to ancestors: 7 [ 6 BP, 0 CC, 1 MF ]
    • Number of unique proteins annotated: 12
    • Number of new protein annotations: 90 [ 78 BP, 0 CC, 12 MF ]
  • PTHR21304
    • Number of unique ancestors annotated: 1
    • Number of terms annotated to ancestors: 1 [ 0 BP, 1 CC, 0 MF ]
    • Number of unique proteins annotated: 16
    • Number of new protein annotations: 16 [ 0 BP, 16 CC, 0 MF ]
  • PTHR22573
    • Number of unique ancestors annotated: 13
    • Number of terms annotated to ancestors: 46 [ 29 BP, 12 CC, 5 MF ]
    • Number of unique proteins annotated: 196
    • Number of new protein annotations: 1708 [ 1121 BP, 308 CC, 279 MF ]
  • PTHR23315
    • Number of unique ancestors annotated: 6
    • Number of terms annotated to ancestors: 87 [ 64 BP, 15 CC, 8 MF ]
    • Number of unique proteins annotated: 28
    • Number of new protein annotations: 1261 [ 784 BP, 294 CC, 183 MF ]
  • PTHR24221
    • Number of unique ancestors annotated: 19
    • Number of terms annotated to ancestors: 42 [ 14 BP, 17 CC, 11 MF ]
    • Number of unique proteins annotated: 251
    • Number of new protein annotations: 1379 [ 435 BP, 498 CC, 446 MF ]
  • Total number of families annotated: 17
    • Total number of unique ancestors annotated: 254
    • Total number of terms annotated to ancestors: 1135 [ 893 BP, 135 CC, 107 MF ]
    • Total number of unique proteins annotated: 2174
    • Total number of new protein annotations: 27855 [ 15375 BP, 4415 CC, 8065 MF ]
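The totals can be re-derived from the per-family figures. This sketch simply re-adds the numbers listed above. Summing unique proteins across families assumes each protein belongs to only one family, which is consistent with the reported total of 2174.

```python
# Per-family figures copied from the list above:
# (unique ancestors, (BP, CC, MF) terms on ancestors,
#  unique proteins, (BP, CC, MF) new protein annotations)
FAMILIES = {
    "PTHR10000": (3, (1, 2, 3), 44, (28, 27, 69)),
    "PTHR10003": (6, (3, 4, 4), 69, (98, 61, 167)),
    "PTHR10046": (5, (9, 4, 7), 92, (312, 151, 416)),
    "PTHR10073": (7, (4, 9, 2), 76, (153, 141, 153)),
    "PTHR10150": (2, (5, 1, 2), 25, (95, 23, 46)),
    "PTHR10202": (3, (7, 17, 2), 26, (121, 328, 31)),
    "PTHR10845": (4, (25, 7, 4), 270, (209, 584, 308)),
    "PTHR11309": (33, (147, 16, 4), 180, (2393, 524, 539)),
    "PTHR11361": (14, (30, 9, 17), 110, (564, 207, 475)),
    "PTHR11447": (4, (14, 6, 8), 29, (262, 109, 207)),
    "PTHR11829": (84, (316, 9, 25), 524, (5752, 599, 4514)),
    "PTHR12027": (49, (219, 6, 4), 226, (2970, 545, 220)),
    "PTHR16505": (1, (6, 0, 1), 12, (78, 0, 12)),
    "PTHR21304": (1, (0, 1, 0), 16, (0, 16, 0)),
    "PTHR22573": (13, (29, 12, 5), 196, (1121, 308, 279)),
    "PTHR23315": (6, (64, 15, 8), 28, (784, 294, 183)),
    "PTHR24221": (19, (14, 17, 11), 251, (435, 498, 446)),
}


def totals(families):
    """Sum each statistic over all families; per-aspect figures stay split as (BP, CC, MF)."""
    rows = families.values()
    ancestors = sum(r[0] for r in rows)
    terms = tuple(sum(r[1][i] for r in rows) for i in range(3))
    proteins = sum(r[2] for r in rows)
    annotations = tuple(sum(r[3][i] for r in rows) for i in range(3))
    return len(families), ancestors, terms, proteins, annotations


n, anc, terms, prot, ann = totals(FAMILIES)
print(n, anc, sum(terms), terms, prot, sum(ann), ann)
# → 17 254 1135 (893, 135, 107) 2174 27855 (15375, 4415, 8065)
```

Every total in the report matches the sum of its per-family values.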

Visualizing PAINT annotations with Pantree

Pantree, a new website developed by Paul Thomas' group and live since October 2010, displays the annotations made to families with PAINT: http://pantree.org/

Visualizing Reference Genome annotations in AmiGO

http://amigo.berkeleybop.org/cgi-bin/amigo/amigo?mode=homolset_summary&session_id=


Annotation reports

We now generate reports on the annotation status of PAINT families. Those reports indicate how many species contain homologs in a given family, how many members of each family exist in every species, how many members have experimental annotations associated with them, the date a member of the family was last annotated, etc. The reports can be viewed at: http://amigo-sven.princeton.edu/cgi-bin/amigo/phylotree?
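The per-family statistics in these reports could be computed roughly as follows. This is a hypothetical sketch, not a description of how the AmiGO reports are actually generated. Three things are assumptions:

- the `Member` record layout;
- the sample data;
- treating EXP, IDA, IPI, IMP, IGI and IEP as the "experimental" evidence codes.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumption: a member counts as experimentally annotated if any of its
# existing GO annotations uses one of these GO experimental evidence codes.
EXPERIMENTAL = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass
class Member:
    """One member of a PAINT family (hypothetical record layout)."""
    protein_id: str
    species: str
    evidence_codes: frozenset          # evidence codes of its existing GO annotations
    last_annotated: Optional[date]     # None if never annotated


def family_report(members):
    """Summarize annotation status for one family's members."""
    per_species = Counter(m.species for m in members)
    experimental = sum(1 for m in members if m.evidence_codes & EXPERIMENTAL)
    dates = [m.last_annotated for m in members if m.last_annotated is not None]
    return {
        "species_with_homologs": len(per_species),
        "members_per_species": dict(per_species),
        "members_with_experimental_annotation": experimental,
        "last_annotated": max(dates) if dates else None,
    }


# Made-up sample family, for illustration only.
members = [
    Member("P1", "Homo sapiens", frozenset({"IDA", "IEA"}), date(2010, 11, 2)),
    Member("P2", "Homo sapiens", frozenset({"IEA"}), None),
    Member("P3", "Mus musculus", frozenset({"IMP"}), date(2010, 6, 18)),
    Member("P4", "Saccharomyces cerevisiae", frozenset(), None),
]
report = family_report(members)
print(report["species_with_homologs"], report["members_with_experimental_annotation"])
# → 3 2
```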

Electronic annotation jamborees


Annotation camp

The 3rd Gene Ontology Annotation Camp was held June 16-18, 2010 at the Centre Médical Universitaire (CMU), Geneva, Switzerland. The Gene Ontology project (GO) provides a set of controlled vocabularies for use in the annotation of gene products (http://geneontology.org/). Several model organism databases were represented among the 63 attendees: 40 from the SIB and 23 external delegates. The camp aimed to update and refine the skills of GO biocurators, including the Swiss-Prot curation team. The main themes were processes that are difficult to represent in the Gene Ontology, such as regulation, responses to stimulus, and protein complexes. The goal is to improve annotation consistency so that GO users have high-quality data to support their work.

New GO curators

Swiss-Prot annotators are now doing GO annotation. Emily Dimmer and Rachael Huntley trained 34 annotators during the past year.

New Tree curators

In addition to Mike Livstone, 4 other curators have been trained in PAINT curation: Rama Balakrishnan, Varsha Khodiyar, and (on Nov 19) Li Ni and Dmitry Sitnikov.