Perspectives

The work presented in this thesis stems from a reflection on how to improve genome-wide association studies (GWAS) through new data-driven methodologies. The contributions that new statistical methods can make to GWAS are nevertheless not limited to those covered in this manuscript, and they fall into several categories depending on their objectives. To conclude, we therefore suggest some avenues of research that have not been mentioned so far but are worth exploring in future work.

First, we can mention methods designed to better model population structure and relatedness between individuals in a sample during association analyses, such as the work on linear mixed models (Listgarten et al. 2012; Segura et al. 2012; Kang et al. 2010) or the methods for estimating and partitioning genetic (co)variance (Finucane et al. 2015; Yang et al. 2010).
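As a point of reference, the linear mixed model underlying this family of methods can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the cited papers: $y$ is the phenotype vector for $n$ individuals, $X$ holds the fixed-effect covariates together with the tested SNP, $K$ is a genetic relatedness (kinship) matrix estimated from the genotypes, and $\sigma_g^2$ and $\sigma_e^2$ are the genetic and residual variance components.

```latex
\begin{align}
  y &= X\beta + g + \varepsilon, \\
  g &\sim \mathcal{N}\left(0,\ \sigma_g^2 K\right), \qquad
  \varepsilon \sim \mathcal{N}\left(0,\ \sigma_e^2 I_n\right), \\
  \operatorname{Var}(y) &= \sigma_g^2 K + \sigma_e^2 I_n, \qquad
  h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{align}
```

In this formulation, the random effect $g$ absorbs the correlation between individuals induced by population structure and relatedness, while the ratio $h^2$ is the quantity targeted by variance-component approaches to heritability estimation.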

In a different vein, methods combining classical statistical approaches with machine learning are of interest for exploratory purposes, as in (Mieth et al. 2016), where multiple hypothesis testing is combined with support vector machines (SVM) to increase statistical power. Similarly, for purely predictive purposes, several machine learning methods such as random forests (Geurst, Botta, and Louppe 2014), classification and regression trees (CRT) (Maciukiewicz et al. 2018) or even deep learning with neural networks (Fergus et al. 2018) are also worth considering in GWAS.

Finally, the discovery of causal pathways between genomes and molecular traits such as gene expression, DNA methylation, or metabolites is of great importance for disentangling cause and consequence in genetic epidemiology. Combining sequence variation with molecular phenotypes, disease data and environmental covariates through novel analytical methods such as Mendelian randomization (Davey Smith and Ebrahim 2003; Zhu et al. 2018) or causal Bayesian networks (Rau, Jaffrézic, and Nuel 2013) has great potential in this respect.
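To make the idea behind Mendelian randomization concrete, its simplest form, the single-variant ratio (Wald) estimator, can be sketched as follows. The notation is assumed here for illustration: $Z$ is a genetic variant used as an instrument, $X$ the exposure (for example a molecular trait), $Y$ the outcome, and $\hat\beta_{ZX}$ and $\hat\beta_{ZY}$ the estimated effects of $Z$ on $X$ and on $Y$. The methods in the works cited above are more elaborate than this sketch.

```latex
\begin{align}
  \hat\beta_{XY} &= \frac{\hat\beta_{ZY}}{\hat\beta_{ZX}}, \\
  \operatorname{se}\left(\hat\beta_{XY}\right) &\approx
  \frac{\operatorname{se}\left(\hat\beta_{ZY}\right)}{\left|\hat\beta_{ZX}\right|}
  \quad \text{(first-order approximation)}.
\end{align}
```

The ratio can be read as a causal effect of $X$ on $Y$ only under the usual instrumental-variable assumptions: $Z$ is associated with $X$, $Z$ is independent of the confounders of the relationship between $X$ and $Y$, and $Z$ influences $Y$ only through $X$.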

References

Davey Smith, George, and Shah Ebrahim. 2003. “‘Mendelian Randomization’: Can Genetic Epidemiology Contribute to Understanding Environmental Determinants of Disease?” International Journal of Epidemiology 32 (1): 1–22.

Fergus, Paul, Casimiro Curbelo Montanez, Basma Abdulaimma, Paulo Lisboa, and Carl Chalmers. 2018. “Utilising Deep Learning and Genome Wide Association Studies for Epistatic-Driven Preterm Birth Classification in African-American Women.” arXiv Preprint arXiv:1801.02977.

Finucane, Hilary K, Brendan Bulik-Sullivan, Alexander Gusev, Gosia Trynka, Yakir Reshef, Po-Ru Loh, Verneri Anttila, et al. 2015. “Partitioning Heritability by Functional Annotation Using Genome-Wide Association Summary Statistics.” Nature Genetics 47 (11): 1228.

Geurst, P., V. Botta, and G. Louppe. 2014. “Exploiting SNP Correlations Within Random Forest for Genome-Wide Association Studies.” PLoS ONE 9 (4).

Kang, Hyun Min, Jae Hoon Sul, Susan K Service, Noah A Zaitlen, Sit-yee Kong, Nelson B Freimer, Chiara Sabatti, Eleazar Eskin, et al. 2010. “Variance Component Model to Account for Sample Structure in Genome-Wide Association Studies.” Nature Genetics 42 (4): 348.

Listgarten, Jennifer, Christoph Lippert, Carl M. Kadie, Robert I. Davidson, Eleazar Eskin, and David Heckerman. 2012. “Improved Linear Mixed Models for Genome-Wide Association Studies.” Nature Methods 9 (6): 525–26. https://doi.org/10.1038/nmeth.2037.

Maciukiewicz, Malgorzata, Victoria S Marshe, Anne-Christin Hauschild, Jane A Foster, Susan Rotzinger, James L Kennedy, Sidney H Kennedy, Daniel J Müller, and Joseph Geraci. 2018. “GWAS-Based Machine Learning Approach to Predict Duloxetine Response in Major Depressive Disorder.” Journal of Psychiatric Research 99: 62–68.

Mieth, Bettina, Marius Kloft, Juan Antonio Rodríguez, Sören Sonnenburg, Robin Vobruba, Carlos Morcillo-Suárez, Xavier Farré, et al. 2016. “Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-Wide Association Studies.” Scientific Reports 6: 36671.

Rau, Andrea, Florence Jaffrézic, and Grégory Nuel. 2013. “Joint Estimation of Causal Effects from Observational and Intervention Gene Expression Data.” BMC Systems Biology 7 (1): 111.

Segura, Vincent, Bjarni J Vilhjálmsson, Alexander Platt, Arthur Korte, Ümit Seren, Quan Long, and Magnus Nordborg. 2012. “An Efficient Multi-Locus Mixed-Model Approach for Genome-Wide Association Studies in Structured Populations.” Nature Genetics 44 (7): 825.

Yang, Jian, Beben Benyamin, Brian P McEvoy, Scott Gordon, Anjali K Henders, Dale R Nyholt, Pamela A Madden, et al. 2010. “Common SNPs Explain a Large Proportion of the Heritability for Human Height.” Nature Genetics 42 (7): 565.

Zhu, Zhihong, Zhili Zheng, Futao Zhang, Yang Wu, Maciej Trzaskowski, Robert Maier, Matthew R Robinson, et al. 2018. “Causal Associations Between Risk Factors and Common Diseases Inferred from GWAS Summary Data.” Nature Communications 9 (1): 224.