Defining CV-associated gene list
Assembly of the Cardiovascular GO Annotation Initiative Gene List
The Cardiovascular GO Annotation Initiative gene list (comprising 4054 genes) was assembled by merging three existing cardiovascular-associated gene lists (one identified by ITMAT, one by Bentham and Bhattacharya, and one using the Gene Ontology) and adding a further 170 genes identified as relevant to cardiovascular systems by Anna Dominiczak, Pete Scambler, Philippa Talmud and Brendan Keeting. In January 2009 this list was extended to 4177 genes following the inclusion of 21 genes described as candidate cardiovascular disease biomarkers by Anderson (2005, PMID: 15611012) and 100 genes recently added to the ITMAT Consortia SNP array.
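As an illustration of the merging step, the following minimal sketch combines several source lists into one de-duplicated list. The gene symbols are placeholders rather than real list contents, and normalising symbols to upper case is an assumption, not a documented part of the Initiative's procedure.

```python
def merge_gene_lists(*gene_lists):
    """Return the sorted union of several gene lists, ignoring case and blanks."""
    merged = set()
    for genes in gene_lists:
        for symbol in genes:
            symbol = symbol.strip().upper()  # assumed normalisation step
            if symbol:
                merged.add(symbol)
    return sorted(merged)


# Placeholder symbols standing in for the four sources described above.
go_list = ["GENE_A", "GENE_B", "GENE_C"]
itmat_list = ["GENE_B", "GENE_D"]
heart_dev_list = ["gene_a", "GENE_E"]
expert_list = ["GENE_F", " GENE_D "]

combined = merge_gene_lists(go_list, itmat_list, heart_dev_list, expert_list)
print(len(combined), combined)
```

Because the sources overlap, the merged list is shorter than the sum of the individual list sizes.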
A gene association file containing all available GO annotations for this full list of cardiovascular-associated genes is now released monthly and can be downloaded. Information is available about the format of the download file.
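A downloaded file could be read along the following lines. This sketch assumes the file follows the standard tab-separated GO gene association (GAF) layout, in which comment lines start with "!", column 3 holds the gene symbol and column 5 holds the GO identifier; the sample rows are truncated placeholders, not real annotations.

```python
import csv
import io
from collections import defaultdict


def go_terms_by_gene(handle):
    """Map each gene symbol (column 3) to the set of GO IDs (column 5)."""
    annotations = defaultdict(set)
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, "!" comment lines and rows too short to parse.
        if not row or row[0].startswith("!") or len(row) < 5:
            continue
        annotations[row[2]].add(row[4])
    return dict(annotations)


# Placeholder rows; a real file has more columns per line.
sample_rows = [
    "!gaf-version: 2.0",
    "\t".join(["DB", "ID0001", "GENE_A", "", "GO:0007507", "REF:1", "IMP", "", "P"]),
    "\t".join(["DB", "ID0001", "GENE_A", "", "GO:0008015", "REF:2", "IDA", "", "P"]),
    "\t".join(["DB", "ID0002", "GENE_B", "", "GO:0006629", "REF:3", "ISS", "", "P"]),
]
result = go_terms_by_gene(io.StringIO("\n".join(sample_rows) + "\n"))
print(result)
```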
A) The Gene Ontology Gene List
The Cardiovascular Gene Ontology gene list (approximately 2500 genes) was compiled by the Cardiovascular GO Annotation Initiative by extracting human genes associated with cardiovascular-related GO processes using Ensembl BioMart in April 2007 (with pseudogenes and untraceable genes removed).
The following GO terms were used:
- GO:0007507 heart development 102 genes
- GO:0048738 cardiac muscle development 2 genes
- GO:0008015 circulation 137 genes
- GO:0050878 regulation of body fluids 105 genes
- GO:0001944 vasculature development 160 genes
- GO:0005739 mitochondrion 830 genes
- GO:0005578 extracellular matrix (sensu Metazoa) 288 genes
- GO:0042060 wound healing 113 genes
- GO:0006979 response to oxidative stress 64 genes
- GO:0016055 Wnt receptor signaling pathway 117 genes
- GO:0006520 amino acid metabolism 268 genes
- GO:0050817 coagulation 90 genes
- GO:0006629 lipid metabolism 686 genes
- GO:0006936 muscle contraction 160 genes
- GO:0048771 tissue remodeling 102 genes
- GO:0051145 smooth muscle cell differentiation 8 genes
- GO:0007517 muscle development 170 genes
- GO:0042692 muscle cell differentiation 48 genes
- GO:0048659 smooth muscle cell proliferation 7 genes
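The per-term counts listed above can be totalled as a quick check. The sketch below counts each GO term once and sums the counts; the total exceeds the approximate size of the final list, which is what one would expect if some genes are annotated to more than one of these terms and are counted only once in the merged list. That explanation is an inference from the numbers, not a statement from the Initiative.

```python
# Gene counts per GO term, as listed above (each term entered once).
term_gene_counts = {
    "GO:0007507": 102, "GO:0048738": 2, "GO:0008015": 137,
    "GO:0050878": 105, "GO:0001944": 160, "GO:0005739": 830,
    "GO:0005578": 288, "GO:0042060": 113, "GO:0006979": 64,
    "GO:0016055": 117, "GO:0006520": 268, "GO:0050817": 90,
    "GO:0006629": 686, "GO:0006936": 160, "GO:0048771": 102,
    "GO:0051145": 8, "GO:0007517": 170, "GO:0042692": 48,
    "GO:0048659": 7,
}

n_terms = len(term_gene_counts)
total_term_memberships = sum(term_gene_counts.values())
print(n_terms, total_term_memberships)  # 19 3457
```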
B) The ITMAT Gene List
The ITMAT Gene List (approximately 2200 genes) was compiled by the ITMAT/Broad/CARE (IBC CHIP) Vascular Disease 50k SNP Array Consortia.
The ITMAT Consortia developed a Vascular Disease 50k SNP Array as a collaborative effort between investigators from the Institute of Translational Medicine and Therapeutics (UPenn), the Broad Institute, SeattleSNPs, the CARE project, the DREAM project (Gerstein HC, et al. Diabetologia (2004) 47:1519-1527) and the Wellcome Trust, Oxford. The array aims to comprehensively assess the genetic diversity within pathways underpinning primary and secondary vascular disease processes such as blood pressure, insulin resistance, metabolic disorders, dyslipidemia and inflammation.
Processes for ITMAT Gene selection
1. Disease gene candidates identified through seven key genome-wide association studies (GWAS):
- The Wellcome Trust Case Control Consortia (WTCCC) Hypertension study (in press).
- WTCCC type 2 diabetes study (Zeggini E, et al. Science 2007 316, 1336-41).
- WTCCC coronary heart disease study (Samani NJ, et al. N Engl J Med. 2007 357, 443-53).
- The FUSION type 2 diabetes study (Scott LJ, et al. Science 2007 316, 1341-5).
- The Broad-Novartis-Lund T2D Diabetes Genetics Initiative (Saxena R, et al. Science 2007 316, 1331-6).
- GWAS SNPs from the WTCCC rheumatoid arthritis, Crohn's disease and type 1 diabetes studies (The Wellcome Trust Case Control Consortium. Nature 2007 447, 661-678).
- FHS Offspring Cohort 100K GWAS association data with body mass index, fasting blood glucose, plasma cholesterol, QT-interval and hypertension (Herbert A, et al. Science 2006 312 279-83).
2. Atherosclerosis susceptibility genes identified from mouse atherosclerosis expression quantitative trait locus (eQTL) data, some of it unpublished. Genes predicted to be causal for atherosclerotic lesion size in genetic crosses of mice with differing susceptibility to atherosclerosis were identified (Wang SS, et al. Circ Res. 2007 Aug 3;101 e11-30) based on:
- the correlation between transcript levels and lesion size
- the overlap of expression and atherosclerosis QTLs
- the likelihood of a causal rather than independent or reactive relationship based on Bayesian modeling.
3. Key pathway associated genes identified from:
- Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.ad.jp/kegg/pathway.html)
- Protein ANalysis THrough Evolutionary Relationships (PANTHER, http://www.pantherdb.org/)
- BioCarta (http://www.biocarta.com)
- Ingenuity Pathway Analysis (http://www.ingenuity.com/)
These tools were used to collate additional genes from key pathways, including lipid metabolism, thrombogenesis, circulation and gas exchange, insulin resistance, metabolism, inflammation, oxidative stress and apoptosis.
4. A systematic analysis of over 2400 CVD-related publications from PubMed, covering cardiovascular disease association studies and studies of genetic manipulation in animal models. Greater weight was given to studies with optimal sample size, data quality and strength of the described associations.
5. Input was also solicited from the collaborators within the consortium. Approximately 2400 genes were placed in a MySQL database with key information displayed for each respective gene:
- the number of SNPs required to tag the four representative HapMap populations at various minor allele frequency (MAF) and r² thresholds
- SymAtlas® expression profiling for over 70 specific human tissues and cell types
- links to NCBI, OMIM and other reference databases
- public resequencing information
- Jackson Lab Mouse and other phenotypic data.
A voting system built into this database helped over fifty consortium investigators reach a consensus ranking of the genes proposed for inclusion on the array. Genes were categorized into three groups, based on voting by the participating investigators:
- Group 1: 450 genes and regions thought highly likely to be of functional significance, including established mediators of vascular disease and findings from key GWAS.
- Group 2: 1370 genes and regions that may be involved in CVD.
- Group 3: 300 genes, mainly larger genes that were of minor interest to the consortium investigators.
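The consortium's actual voting rules are not described here, so the following is only a sketch of how investigator votes might be turned into three groups. It assumes each investigator gives each gene a numeric score, ranks genes by mean score, and cuts the ranking at the group sizes quoted above (450 and 1370, with the remainder in Group 3). The function name, scoring scale and tie-breaking rule are all hypothetical.

```python
from statistics import mean


def assign_groups(votes, group_sizes=(450, 1370)):
    """Rank genes by mean vote (highest first) and split them into groups 1-3."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(votes, key=lambda gene: (-mean(votes[gene]), gene))
    first_cut = group_sizes[0]
    second_cut = first_cut + group_sizes[1]
    groups = {}
    for rank, gene in enumerate(ranked):
        if rank < first_cut:
            groups[gene] = 1
        elif rank < second_cut:
            groups[gene] = 2
        else:
            groups[gene] = 3
    return groups


# Placeholder genes and scores; small group sizes keep the demo readable.
demo_votes = {
    "GENE_A": [3, 3, 2],
    "GENE_B": [1, 2, 1],
    "GENE_C": [3, 2, 2],
    "GENE_D": [1, 1, 1],
}
demo_groups = assign_groups(demo_votes, group_sizes=(1, 2))
print(demo_groups)
```

With the default `group_sizes`, the first 450 ranked genes would fall in Group 1 and the next 1370 in Group 2.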
C) Heart Development
Bentham and Bhattacharya created a list of 282 congenital heart disease (CHD) candidate genes by identifying genes with a major role in mouse heart development. Using the Mouse Genome Informatics database, they selected genes whose knockout results in abnormal cardiac morphology or development resembling CHD.
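A selection of this kind can be pictured as a filter over gene-phenotype records. The sketch below is not the authors' procedure: the record layout, the keyword list and the gene names are all hypothetical, and a real analysis would work from the database's own phenotype annotations rather than free-text matching.

```python
# Hypothetical keywords used to flag heart-related phenotype descriptions.
CARDIAC_KEYWORDS = ("heart", "cardiac", "septal", "outflow tract")


def chd_candidates(records):
    """Return genes with at least one knockout record showing a cardiac phenotype."""
    hits = set()
    for gene, is_knockout, phenotype in records:
        text = phenotype.lower()
        if is_knockout and any(keyword in text for keyword in CARDIAC_KEYWORDS):
            hits.add(gene)
    return sorted(hits)


# Placeholder records: (gene, is_knockout, phenotype description).
records = [
    ("Gene_a", True, "abnormal heart morphology"),
    ("Gene_b", True, "abnormal coat color"),
    ("Gene_c", False, "abnormal cardiac outflow tract development"),
    ("Gene_a", True, "ventricular septal defect"),
]
candidates = chd_candidates(records)
print(candidates)
```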
D) Literature Review
List of cardiovascular-relevant genes identified through literature review and unpublished data.
- Candidate CV disease genes (50 genes) suggested by Philippa Talmud, based on type II diabetes studies and NPHSII studies, and by Brendan Keeting, based on 2 recent studies (Kathiresan S et al. Nat Genet. 2008 Feb;40(2):189-97 and Willer CJ et al. Nat Genet. 2008 Feb;40(2):161-9).
- List of cardiovascular-relevant genes (120 genes) suggested by Anna Dominiczak and Pete Scambler.