, 2009 and Jakobsson et al., 2008). Standard GWAS approaches do not work so well in African populations (Teo et al., 2010). One explanation for the failure of GWAS applied to MD might be that the causative variants, or markers sufficiently
close to them, have not been genotyped on the available arrays. In fact, due to the blocks of linkage disequilibrium, in non-African populations GWAS is remarkably effective at detecting a large fraction of common variants of reasonable effect size (odds ratios greater than 1.2) that contribute to complex traits, even though a very small fraction of the total amount of sequence variation segregating in a population is actually genotyped. To illustrate this, Figure 1 shows the results of simulations that compare GWAS carried out using an Affymetrix 500K genotyping array, with the results from using all the variants in HapMap (Frazer et al., 2007). Even this relatively sparse array (current platforms interrogate millions of variants) has power of 82% (for a sample size of 9,000) to detect a locus with an odds ratio of ≥1.2, compared to 88% with the complete set of SNPs (9,240 is the largest discovery sample size used in GWAS of MD [Ripke et al., 2013b]). In other words, differences in coverage between chips do not translate into big differences in power. Furthermore, imputation (Howie et al., 2009) using the very high density of variants available from
the 1000 Genomes Project (Abecasis et al., 2010) has further extended the scope of genotyping arrays to interrogate millions of ungenotyped variants. In short, failure of GWAS to detect common variants (MAF > 5%) conferring risk to MD is unlikely to be due to insufficient information about these variants from genotyping arrays. The most likely explanation for the failure of GWAS for MD is that studies have been underpowered to detect the causative loci (Wray et al., 2012). While GWAS coverage of common variants is good, GWAS requires large sample sizes in order to obtain adequate power to detect variants of small effect (odds ratios less than 1.2). In the following sections, we deal with common variants and the
power of GWAS (and candidate gene studies) to find them. We turn later to the detection of rare variants of larger effect. Figure 1 demonstrates the nonlinear relationship between sample size and effect size for common variants. To detect loci with an odds ratio of 1.1 or less, sample sizes in the tens of thousands will be required (note that this depends on the prevalence of the disease; in the following discussions, we assume that MD has a prevalence of 10%). Table 1 shows that the largest GWAS for MD used 9,240 cases and 9,519 controls (Ripke et al., 2013b). Figure 1 shows that such a sample has ∼90% power to detect loci with an odds ratio of ≥1.2; it will detect effects of this magnitude or greater at more than 93% of all known common variants.
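
The dependence of power on sample size and effect size can be illustrated with a rough analytic calculation. The sketch below is not the simulation procedure behind Figure 1. It is a minimal normal-approximation to an allelic case-control test, and it assumes that the causal variant itself is genotyped, that genotypes are in Hardy-Weinberg equilibrium in the population, that risk acts multiplicatively on the odds scale per allele, that the risk allele frequency is 0.3, and that the genome-wide significance threshold is 5 × 10⁻⁸. The function names and default values are hypothetical choices made for illustration; only the prevalence of 10% and the sample size of 9,240 cases and 9,519 controls are taken from the text. Because it ignores imperfect tagging and considers a single allele frequency, its output should not be expected to match the percentages quoted from Figure 1.

```python
from statistics import NormalDist


def case_control_allele_freqs(maf, odds_ratio, prevalence):
    """Risk-allele frequency in cases and in controls.

    Assumes Hardy-Weinberg proportions in the population and a
    multiplicative per-allele effect on the odds of disease.
    """
    geno_freq = [(1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2]

    def penetrances(f0):
        # Disease risk for 0, 1 and 2 risk alleles given baseline risk f0.
        base_odds = f0 / (1 - f0)
        odds = [base_odds * odds_ratio ** g for g in range(3)]
        return [o / (1 + o) for o in odds]

    # Bisection on the baseline risk so that the population prevalence
    # implied by the model equals the requested prevalence.
    lo, hi = 1e-12, 1 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2
        implied = sum(q * f for q, f in zip(geno_freq, penetrances(mid)))
        if implied < prevalence:
            lo = mid
        else:
            hi = mid
    pen = penetrances((lo + hi) / 2)

    # Genotype frequencies conditional on case or control status.
    case = [q * f / prevalence for q, f in zip(geno_freq, pen)]
    ctrl = [q * (1 - f) / (1 - prevalence) for q, f in zip(geno_freq, pen)]
    return case[1] / 2 + case[2], ctrl[1] / 2 + ctrl[2]


def gwas_power(odds_ratio, n_cases, n_controls,
               maf=0.3, prevalence=0.10, alpha=5e-8):
    """Approximate power of a two-sided allelic test at level alpha."""
    norm = NormalDist()
    p_case, p_ctrl = case_control_allele_freqs(maf, odds_ratio, prevalence)
    # Each individual contributes two alleles to the comparison.
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_ctrl * (1 - p_ctrl) / (2 * n_controls)) ** 0.5
    shift = (p_case - p_ctrl) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


if __name__ == "__main__":
    for odds_ratio in (1.05, 1.1, 1.2):
        for n in (9240, 20000, 50000):
            # Equal numbers of cases and controls, for simplicity.
            power = gwas_power(odds_ratio, n, n)
            print(f"OR={odds_ratio:<4} N={n:>6} cases: power={power:.3f}")
```

Under these assumptions, power falls steeply as the odds ratio drops from 1.2 towards 1.1 at a fixed sample size, and recovers only when the number of cases rises into the tens of thousands, which is the qualitative pattern described above.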