Understanding Genome-Wide Association Studies (GWAS)

I have written in the past on Genome-Wide Association Studies, or GWAS. The promise of a more personalized, genetic understanding of disease has made this form of scientific investigation increasingly prominent. While we have all had exposure to the workings of randomized controlled trials and observational studies, some of us may be of an age when GWAS was “not a thing.” JAMA has a helpful review, cited at the end of this post. For those of you who want a quick summary, read on.

GWAS demonstrate associative linkages between a condition under consideration and “genetic variations at known positions in the genome.” These variations involve an alteration in one nucleotide in a given genomic sequence: a single nucleotide polymorphism, or SNP. When one of these SNPs occurs more frequently with a given condition, “it is considered to mark a region of the human genome that influences the risk of disease.” Disease is perhaps too strong a word; that is why I chose “condition,” because GWAS could be, and have been, applied to intelligence, finishing school, or the ability to read.

Some SNPs, at a physical distance from the gene thought to be an effector for a condition or disease, can be co-inherited with it more frequently than expected by chance. This increased frequency is called linkage disequilibrium, and it is this disequilibrium in frequency that is the association term in GWAS: the association between a SNP and the candidate gene(s).

Vanishingly few diseases can be attributed to a single genetic defect; most conditions involve many genes, each exerting a small effect on the aggregate. At the same time, there are hundreds of thousands of SNPs to consider, making for a combinatorial and statistical nightmare. The traditional p-value threshold of 0.05 allows for far too many false positives, so GWAS typically use a threshold of 0.00000005 (5 × 10^-8). Also, the small effect exerted by a disease-associated SNP requires large populations to be studied. That is why you see so many studies making use of national genomic registries like the UK Biobank, involving more than a hundred thousand participants.
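To see where a threshold that small can come from, here is a minimal sketch in Python. It assumes a Bonferroni-style correction over roughly one million independent tests, which is a common rationale for the 0.00000005 figure. The one-million count and the function name are illustrative assumptions, not numbers taken from the JAMA review.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that holds the overall false-positive rate near alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


ALPHA = 0.05          # the traditional threshold mentioned in the text
N_TESTS = 1_000_000   # assumption: about one million independent SNP tests

threshold = bonferroni_threshold(ALPHA, N_TESTS)

# If no SNP had any real effect, this many tests would still be expected
# to pass each threshold purely by chance.
expected_false_positives_uncorrected = ALPHA * N_TESTS
expected_false_positives_corrected = threshold * N_TESTS

print(f"corrected threshold: {threshold:.1e}")
print(f"chance hits at 0.05: {expected_false_positives_uncorrected:.0f}")
print(f"chance hits at corrected threshold: {expected_false_positives_corrected:.2f}")
```

Under these assumptions, the uncorrected threshold would let through tens of thousands of chance findings, which is the "far too many false positives" problem in concrete terms.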

Genotyping, that is, finding the SNPs that characterize a genome, has been automated, one of the critical technological advances making these studies possible. But even with automation, most known SNPs are not necessarily identified. Again, statistics provides some assistance in simplifying the combinatorial possibilities, using the technique of imputation. Simply put, if we are looking at SNP1 and we know that SNP2 accompanies it in 70% of the population, we may use the observed presence of SNP1 to stand in for the missing, or in this case not sought after, SNP2; e.g., we infer that SNP2 will be found in 70% of individuals with SNP1.
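The imputation idea can be sketched in a few lines of Python. This is a toy illustration only; real imputation software relies on reference panels and far richer statistical models. The 70% figure is the one from the example above, while the sample genotypes, the function name, and the choice to leave non-carriers of SNP1 unimputed are assumptions made for illustration.

```python
from typing import List, Optional

P_SNP2_GIVEN_SNP1 = 0.70  # from the example: SNP2 accompanies SNP1 70% of the time


def impute_snp2_probability(has_snp1: bool) -> Optional[float]:
    """Estimated probability that a person carries the unmeasured SNP2.

    Returns None for people without SNP1, because the example gives no
    figure for them (an assumption of this sketch, not a rule of imputation).
    """
    if has_snp1:
        return P_SNP2_GIVEN_SNP1
    return None


# Hypothetical measured SNP1 status for ten study participants.
snp1_genotypes: List[bool] = [True, True, False, True, False,
                              True, True, False, True, True]

imputed = [impute_snp2_probability(g) for g in snp1_genotypes]

n_snp1_carriers = sum(snp1_genotypes)
expected_snp2_carriers = sum(p for p in imputed if p is not None)

print(f"{n_snp1_carriers} SNP1 carriers, "
      f"about {expected_snp2_carriers:.1f} of them expected to carry SNP2")
```

Note that the result is a probability for each person, not a measurement, which is why imputation adds its own layer of uncertainty.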

The outcome of GWAS is expressed as an odds ratio. “The odds of disease among individuals who have a specific allele [SNP] vs. the odds of disease among individuals who do not have the same allele [SNP]...” Each of these predictive SNPs still has only a tiny effect, so, to increase the effect size enough to permit use as a clinical predictor, many of them are gathered together and weighted to provide a polygenic risk score. The aggregate polygenic risk score is then used to describe at-risk populations.
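Both quantities are simple arithmetic, and a short Python sketch may make them concrete. All counts, odds ratios, and allele counts below are made up for illustration. Weighting each SNP by the natural log of its odds ratio is one common convention, assumed here rather than taken from the JAMA review, and the function names are hypothetical.

```python
import math
from typing import Sequence


def odds_ratio(cases_with: int, controls_with: int,
               cases_without: int, controls_without: int) -> float:
    """Odds of disease with the allele divided by odds of disease without it."""
    if min(controls_with, cases_without, controls_without) <= 0:
        raise ValueError("counts used as denominators must be positive")
    odds_with = cases_with / controls_with
    odds_without = cases_without / controls_without
    return odds_with / odds_without


def polygenic_risk_score(allele_counts: Sequence[int],
                         odds_ratios: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts, each weighted by ln(odds ratio)."""
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP")
    return sum(count * math.log(ratio)
               for count, ratio in zip(allele_counts, odds_ratios))


# Hypothetical 2x2 table for a single SNP.
example_or = odds_ratio(cases_with=300, controls_with=200,
                        cases_without=700, controls_without=800)

# Hypothetical person: copies (0, 1, or 2) of each of three SNPs,
# with made-up, individually small odds ratios.
person_allele_counts = [2, 1, 0]
snp_odds_ratios = [1.10, 1.05, 0.95]
score = polygenic_risk_score(person_allele_counts, snp_odds_ratios)

print(f"odds ratio: {example_or:.2f}")
print(f"polygenic risk score: {score:.3f}")
```

An odds ratio above 1 means the allele is seen more often alongside the condition; the score simply adds up many such small signals for one person.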

As with any scientific study, especially one using advanced technology and methodology, there are limitations. The primary difficulty with GWAS is statistical, a result of the vast combinatorial space and the minuscule effect of any single SNP. The reduction in acceptable p-values still allows false positives, and imputing the presence of a SNP from its association with another introduces another layer of uncertainty. The aggregation of these tiny changes into a polygenic risk score also introduces additional bias in the weighting of each factor. Finally, any GWAS applies only to the genetic characteristics of its participants, which fall along many lines, including race and ethnicity. GWAS are brittle; the results are not easily transferable to other populations. There is little value in a GWAS from the UK Biobank in understanding or predicting disease in Asians.

 

Source: Genome-Wide Association Studies. JAMA. DOI: 10.1001/jama.2019.16479