4.1 Related work

Although classical GWAS have limitations that prevent a full understanding of the heritability of genetic and/or multifactorial diseases, these limitations can be overcome to some degree. For instance, the structure of the data can be taken into account in the hypothesis testing procedure. As an illustration, Meinshausen (2008) proposed a hierarchical testing approach that considers the influence of clusters of highly correlated variables rather than individual variables. The statistical power of this method to detect relevant variables at the single-SNP level was comparable to that of the Bonferroni-Holm procedure (Holm 1979b), but the detection rate was much higher for small clusters, and it increased further at coarser levels of resolution.
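For reference, the Bonferroni-Holm baseline mentioned above can be sketched in a few lines. This is a minimal illustration of the step-down adjustment only, not of the hierarchical procedure; the function name `holm_adjust` and the example \(p\)-values are hypothetical and chosen purely for demonstration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted values along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical single-SNP p-values, for illustration only.
snp_p_values = [0.001, 0.04, 0.03, 0.20]
adjusted = holm_adjust(snp_p_values)
rejected = [p <= 0.05 for p in adjusted]
```

A hypothesis is rejected at level 0.05 when its adjusted \(p\)-value does not exceed 0.05. In this toy example only the first SNP passes, even though three of the four raw \(p\)-values are below 0.05.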

In the broad family of linear models, Listgarten et al. (2013) introduced a likelihood ratio-based set test that accounts for confounding structure. The model is a linear mixed model with two random effects, one to capture the set association signal and one to capture confounders. The authors demonstrate control of the type I error as well as improved power over the more traditionally used score test.
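To make the structure of such a two-random-effect model concrete, it can be written as below. This is only a sketch: the symbols \(K_C\), \(K_S\), \(\sigma_C^2\), \(\sigma_S^2\) and \(\sigma_e^2\) are notation assumed here for illustration and are not taken from the cited article.

\[
y = X\beta + u_C + u_S + \varepsilon, \qquad u_C \sim \mathcal{N}(0, \sigma_C^2 K_C), \quad u_S \sim \mathcal{N}(0, \sigma_S^2 K_S), \quad \varepsilon \sim \mathcal{N}(0, \sigma_e^2 I)
\]

Here \(y\) is the phenotype vector, \(X\beta\) the fixed effects, \(u_C\) the random effect capturing confounders and \(u_S\) the random effect capturing the signal of the tested set. Under this notation, a set test compares \(H_0: \sigma_S^2 = 0\) against \(H_1: \sigma_S^2 > 0\) with the likelihood ratio statistic

\[
\Lambda = 2\left(\ell(\hat{\theta}_1) - \ell(\hat{\theta}_0)\right),
\]

where \(\ell(\hat{\theta}_0)\) and \(\ell(\hat{\theta}_1)\) denote the log-likelihood maximised under the null and the alternative hypothesis respectively.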

Other methods focus on multiple linear regression, either by taking into account the linkage disequilibrium within genes to improve power (Yoo et al. 2016) or by clustering variants with weak association around known loci to increase the percentage of variance explained in complex traits (Paré, Asma, and Deng 2015).

Finally, other approaches focus on aggregating the summary statistics of single SNPs within the same gene, for instance the data-driven aggregation of summary statistics described in Kwak and Pan (2016) or the \(p\)-value combination procedures in Petersen et al. (2013). In the cited articles, the methods are applied to SNPs located in coding regions (or extended intronic regions, in Petersen et al. 2013), but they can be extended to any set of SNPs as long as a set of variants within a region is pre-specified. However, the power of each test remains dependent on the true disease model. Furthermore, this kind of approach may also lose statistical power in comparison to single-variant-based tests when only a very small number of the variants in a gene are associated with the trait, or when many variants have no effect or the causal variants are low-frequency variants (Lee et al. 2014).
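As a concrete illustration of what combining single-SNP \(p\)-values within a gene can look like, the sketch below implements Fisher's method. This choice is an assumption made for illustration and is not necessarily one of the rules evaluated in the cited articles. Fisher's method also assumes independent \(p\)-values, which SNPs in linkage disequilibrium generally violate, so the result should be read as a toy example. The function name `fisher_combine` and the input values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine p-values with Fisher's method (assumes independent tests).

    Returns the statistic -2 * sum(log p) and its p-value under a
    chi-square distribution with 2k degrees of freedom.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    # For even degrees of freedom 2k, the chi-square survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total


# Hypothetical p-values for three SNPs assigned to the same gene.
gene_statistic, gene_p_value = fisher_combine([0.01, 0.20, 0.50])
```

With a single input the combined \(p\)-value equals that input, which gives a quick sanity check. In this toy example two of the three SNPs carry little signal, and the combined \(p\)-value is larger than the smallest single-SNP \(p\)-value, which illustrates the dilution of power discussed above when few variants in a set are associated with the trait.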

References

Holm, Sture. 1979b. “A Simple Sequentially Rejective Multiple Test Procedure.” Scandinavian Journal of Statistics 6 (2): 65–70.

Kwak, Il-Youp, and Wei Pan. 2016. “Adaptive Gene- and Pathway-Trait Association Testing with GWAS Summary Statistics.” Bioinformatics 32: 1178–84. https://doi.org/10.1093/bioinformatics/btv719.

Lee, S., G. R. Abecasis, M. Boehnke, and X. Lin. 2014. “Rare-Variant Association Analysis: Study Designs and Statistical Tests.” American Journal of Human Genetics 95 (1): 5–23.

Listgarten, J., C. Lippert, E. Y. Kang, J. Xiang, C. M. Kadie, and D. Heckerman. 2013. “A Powerful and Efficient Set Test for Genetic Markers That Handles Confounders.” Bioinformatics 29 (12): 1526–33.

Meinshausen, N. 2008. “Hierarchical Testing of Variable Importance.” Biometrika 95 (2): 265–78.

Paré, Guillaume, Senay Asma, and Wei Q. Deng. 2015. “Contribution of Large Region Joint Associations to Complex Traits Genetics.” PLOS Genetics 11. https://doi.org/10.1371/journal.pgen.1005103.

Petersen, Ashley, Carolina Alvarez, Scott DeClaire, and Nathan L. Tintle. 2013. “Assessing Methods for Assigning SNPs to Genes in Gene-Based Tests of Association Using Common Variants.” PLOS ONE 8. https://doi.org/10.1371/journal.pone.0062161.

Yoo, Yun Joo, Lei Sun, Julia G. Poirier, Andrew D. Paterson, and Shelley B. Bull. 2016. “Multiple Linear Combination (MLC) Regression Tests for Common Variants Adapted to Linkage Disequilibrium Structure.” Genetic Epidemiology 41. https://doi.org/10.1002/gepi.22024.