Orthology discussion page (Retired)

From GO Wiki
Revision as of 12:24, 11 May 2007 by Kchris


This is the place for general discussion of methods, problems and ideas concerning the principles for establishing orthology between reference genome genes and the human disease gene targets.

Discussion of gene-specific issues should be directed to the gene-specific pages. A link will be added here as soon as those pages are established.

ZFIN Orthology Determination Method

We always use the method outlined here:

1. Check to see if orthology has already been established between a zebrafish gene and the human gene by searching in ZFIN.

2. If there is no established zebrafish ortholog, the human sequence is used to search zebrafish mRNA, Vega and Ensembl transcript and protein sequences for potential orthologs. If several zebrafish sequences are candidates for the ortholog, reciprocal BLAST searches of the zebrafish sequences against human sequences are used to rank them. The best matches are then analyzed for conserved synteny with the human gene.
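The candidate-ranking step in item 2 can be sketched in Python as follows. This is a minimal illustration, not ZFIN's actual pipeline: it assumes the BLAST bit scores have already been computed and loaded into dictionaries, and the ranking rule (reciprocal best hits first, then by forward score) and all gene names are hypothetical.

```python
from typing import Dict, List


def order_candidates(
    human_query: str,
    forward_scores: Dict[str, float],
    reverse_scores: Dict[str, Dict[str, float]],
) -> List[str]:
    """Rank zebrafish candidates for one human query gene (hypothetical rule).

    forward_scores: zebrafish candidate -> bit score when the human query
        is searched against zebrafish sequences.
    reverse_scores: zebrafish candidate -> {human gene -> bit score} from
        searching that candidate back against human sequences.
    """

    def is_reciprocal_best(candidate: str) -> bool:
        hits = reverse_scores.get(candidate, {})
        if not hits:
            return False
        # The candidate's best human hit must be the original query.
        return max(hits, key=hits.get) == human_query

    # Reciprocal best hits sort first (False < True), then higher forward scores.
    return sorted(
        forward_scores,
        key=lambda c: (not is_reciprocal_best(c), -forward_scores[c]),
    )


forward = {"zgA": 310.0, "zgB": 295.0, "zgC": 120.0}
reverse = {
    "zgA": {"HUMAN_PARALOG": 400.0, "HUMAN_GENE": 305.0},
    "zgB": {"HUMAN_GENE": 290.0},
    "zgC": {"HUMAN_GENE": 118.0, "OTHER": 90.0},
}
print(order_candidates("HUMAN_GENE", forward, reverse))  # ['zgB', 'zgC', 'zgA']
```

Here `zgA` has the highest forward score but its best reverse hit is a different human gene, so it drops below the two reciprocal best hits.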

The current version of the zebrafish assembly at Ensembl is used to determine the location of the zebrafish gene. After the zebrafish gene has been localized, the flanking regions around the gene are analyzed for other orthologous genes between zebrafish and human chromosomes. The presence of conserved synteny is used as evidence to confirm orthology and the human gene is assigned as the ortholog of the zebrafish gene in ZFIN. If necessary, the zebrafish gene nomenclature in ZFIN is updated.

In cases where sequence analyses and synteny do not provide clear evidence to distinguish between two or more zebrafish genes, orthology is not established. Orthology is likewise not established for human genes that match no zebrafish cDNA or EST sequence but do have a sequence match in the zebrafish genome; for these, Genscan or FGENESH identifiers are provided instead as identifiers for the putative zebrafish orthologs.
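The synteny check and the rule that ambiguous cases are left unassigned can be sketched as below. The sketch assumes that the genes flanking each zebrafish candidate already have assigned human orthologs and that human gene positions are available; the 5 Mb window, the minimum of two supporting genes, and all gene names are hypothetical choices for illustration, not ZFIN parameters.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: human gene -> (chromosome, position in Mb).
HumanLocus = Tuple[str, float]


def synteny_support(
    human_query: str,
    flanking_orthologs: List[str],
    human_loci: Dict[str, HumanLocus],
    window_mb: float = 5.0,
) -> int:
    """Count flanking genes whose human orthologs lie near the human query.

    flanking_orthologs: human orthologs of the zebrafish genes that flank
        one zebrafish candidate.
    """
    chrom, pos = human_loci[human_query]
    support = 0
    for human_gene in flanking_orthologs:
        locus = human_loci.get(human_gene)
        if locus and locus[0] == chrom and abs(locus[1] - pos) <= window_mb:
            support += 1
    return support


def assign_ortholog(
    human_query: str,
    candidates: Dict[str, List[str]],
    human_loci: Dict[str, HumanLocus],
    min_support: int = 2,
) -> Optional[str]:
    """Return the single best-supported candidate, or None if ambiguous."""
    scores = {
        zf_gene: synteny_support(human_query, flanks, human_loci)
        for zf_gene, flanks in candidates.items()
    }
    if not scores:
        return None
    best = max(scores.values())
    leaders = [g for g, s in scores.items() if s == best]
    # Too little synteny, or a tie between candidates: orthology not established.
    if best < min_support or len(leaders) > 1:
        return None
    return leaders[0]


loci = {"HG": ("11", 10.0), "N1": ("11", 8.5), "N2": ("11", 12.0), "N3": ("3", 40.0)}
print(assign_ortholog("HG", {"zgA": ["N1", "N2"], "zgB": ["N3"]}, loci))  # zgA
print(assign_ortholog("HG", {"zgA": ["N1", "N2"], "zgB": ["N1", "N2"]}, loci))  # None
```

Returning `None` for a tie mirrors the rule above that orthology is not established when the evidence cannot distinguish between candidates.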

TAIR Orthology Determination Method

We use YOGY and, for each human gene, maintain a spreadsheet that records the results of each method separately (including methods for which the analysis was not done). If an Arabidopsis gene appears in more than one analysis, we consider it an Arabidopsis ortholog; Arabidopsis genes that occur in only one analysis are not considered orthologs.
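The TAIR consensus rule can be expressed as a short Python sketch. The method names and gene identifiers in the example are placeholders, and representing an analysis that was not done as `None` is an assumption about how the spreadsheet might be encoded.

```python
from collections import Counter
from typing import Dict, Optional, Set


def tair_orthologs(results: Dict[str, Optional[Set[str]]]) -> Set[str]:
    """results: method name -> Arabidopsis genes reported by that method,
    or None if the analysis was not done."""
    counts: Counter = Counter()
    for genes in results.values():
        if genes is None:  # analysis not done for this method
            continue
        counts.update(genes)  # a set, so each gene counts once per method
    # Keep only genes reported by more than one method.
    return {gene for gene, n in counts.items() if n > 1}


example = {
    "Inparanoid": {"AtX", "AtY"},
    "OrthoMCL": {"AtX"},
    "HomoloGene": None,
}
print(tair_orthologs(example))  # {'AtX'}
```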

dictyBase Orthology Determination Method

1. Check YOGY for orthologs (if YOGY does not recognize the human gene name(s), search HGNC, UniProt or even Google for other names).

2. If there is an ortholog in YOGY (Dicty is included in two of its methods: Inparanoid and OrthoMCL), we confirm orthology with a reciprocal BLAST and compare domains in InterPro and/or Pfam. Our general guideline is that an ortholog should be at least 30% identical over 80% of the length of the protein. We also routinely use TMHMM and SignalP if the human protein contains transmembrane helices or a signal peptide. This helps to determine firmly whether there is a single Dicty ortholog.

3. If there is no ortholog in YOGY, we BLAST the human protein sequence in dictyBase, adjusting parameters such as the E-value and/or turning filtering off if necessary (e.g. for very short proteins such as DNAJC19). If we identify a potential ortholog this way, we proceed with the analysis described in step 2.

4. If Dicty has one or more genes that are merely similar, e.g. conservation is limited to a large domain, we do not consider them orthologs. Depending on the degree of similarity, we may mention this in the free-text description on each gene page.
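The 30% identity over 80% length guideline from step 2 can be checked mechanically once an alignment is available. This sketch assumes identity is computed over the alignment columns and coverage over the human protein's length (the text does not say which protein's length is meant); the `Alignment` record is hypothetical, and the reciprocal BLAST, domain, TMHMM and SignalP checks are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical summary of one human vs. Dicty protein alignment."""
    identical: int       # identical residues in the alignment
    aligned_length: int  # alignment length in columns, including gaps
    human_start: int     # 1-based start of the aligned region on the human protein
    human_end: int       # 1-based inclusive end of the aligned region
    human_length: int    # full length of the human protein


def passes_dicty_guideline(
    aln: Alignment, min_identity: float = 0.30, min_coverage: float = 0.80
) -> bool:
    identity = aln.identical / aln.aligned_length
    coverage = (aln.human_end - aln.human_start + 1) / aln.human_length
    return identity >= min_identity and coverage >= min_coverage


# Full-length match: about 33% identity over 95% of the human protein.
print(passes_dicty_guideline(Alignment(120, 360, 11, 390, 400)))  # True
# Domain-only match: high identity, but only about 35% coverage.
print(passes_dicty_guideline(Alignment(70, 150, 100, 240, 400)))  # False
```

The second case shows how a match confined to a single domain fails the coverage test, in line with step 4 above.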
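As an illustration only, the relaxed search in step 3 could be run locally with NCBI BLAST+ rather than through the dictyBase web interface. The helper below, the query file name and the database name are hypothetical; the `-query`, `-db`, `-evalue`, `-seg` and `-outfmt` options are standard `blastp` options. Because default filtering differs between BLAST versions, the sketch sets SEG filtering explicitly.

```python
import shlex
from typing import List


def blastp_command(
    query_fasta: str, db: str, evalue: float = 10.0, seg_filter: bool = True
) -> List[str]:
    """Build a hypothetical blastp argument list with adjustable E-value and SEG."""
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db,
        "-evalue", format(evalue, "g"),         # raise this to keep weaker hits
        "-seg", "yes" if seg_filter else "no",  # "no" turns low-complexity filtering off
        "-outfmt", "6",                         # tabular output
    ]


# Relaxed search for a very short query protein.
cmd = blastp_command("DNAJC19.fasta", "dicty_proteins", evalue=1000, seg_filter=False)
print(shlex.join(cmd))
# blastp -query DNAJC19.fasta -db dicty_proteins -evalue 1000 -seg no -outfmt 6
# To run it (requires BLAST+ and a formatted database): subprocess.run(cmd, check=True)
```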

SGD "Orthology" Determination Method