Ref genome Annotation progress ideas (Retired)

Rex, Karen, Emily, Susan

Annotation Progress: June conference call

Information required for measuring annotation progress

  • How many human genes have identified orthologs?
  • How many of these have associated papers (i.e. are published genes)?
  • How many associated papers are there in total?
  • How many papers have been considered for GO curation?
  • How many papers provided GO terms?
  • How many genes with associated papers are considered complete/comprehensive?
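The questions above amount to a per-gene tally. The sketch below is purely illustrative: the `Gene` and `Paper` records, their field names and `progress_summary` are hypothetical and do not correspond to the reference genome spreadsheet format. It also assumes the counts are restricted to human genes with identified orthologs, as the "these" in the second question suggests.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical per-paper tracking fields (not an agreed spreadsheet format).
    pmid: str
    considered: bool = False         # has a curator looked at it for GO curation?
    provided_go_terms: bool = False  # did it yield at least one GO annotation?


@dataclass
class Gene:
    # Hypothetical per-gene tracking record for a human gene.
    symbol: str
    has_ortholog: bool = False
    papers: list[Paper] = field(default_factory=list)
    complete: bool = False  # curator judges annotation complete/comprehensive


def progress_summary(genes: list[Gene]) -> dict[str, int]:
    """Tally the six progress questions over a list of human genes."""
    with_orthologs = [g for g in genes if g.has_ortholog]
    published = [g for g in with_orthologs if g.papers]
    # Papers are counted by distinct PMID, since one paper may be linked to
    # several genes; a paper counts as considered (or as providing GO terms)
    # if any of its gene records says so.
    all_pmids = {p.pmid for g in published for p in g.papers}
    considered = {p.pmid for g in published for p in g.papers if p.considered}
    with_terms = {p.pmid for g in published for p in g.papers if p.provided_go_terms}
    return {
        "genes_with_orthologs": len(with_orthologs),
        "published_genes": len(published),
        "papers_total": len(all_pmids),
        "papers_considered": len(considered),
        "papers_with_go_terms": len(with_terms),
        "published_genes_complete": sum(g.complete for g in published),
    }
```

Counting distinct PMIDs rather than gene/paper pairs avoids inflating the paper totals when the same paper is curated for several genes.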

Suggestions from Susan

  • For genes that already have GO annotation, a distinct part of the process is cleaning up the existing NAS, TAS and dodgy ISS data. This can take me a considerable amount of time; is it worth including a checkbox to record that this has been done?
  • To date, I have only entered 'complete' for genes where either all papers have been read or I am pretty sure no terms have been missed. The annotation status of other genes falls into three categories:

1. there are existing annotations which I have cleaned up, but no new papers have been curated yet

2. some new papers have been curated, but it is clear that there are other terms to get if there was time

3. a substantial number of key terms are annotated; possibly all terms are captured, but many papers are still unread

I think we should have at least one extra column to reflect such intermediate states.

  • This may be obvious, but it seems worth encouraging people to fill in the spreadsheets at intermediate stages rather than waiting until a gene is complete. In the past, I waited until I had finished a gene before filling in the reference numbers, but this doesn't reflect the current state of progress.

