Abstract

Over the last decade, rapid advances in genotyping technologies have changed the way genes involved in Mendelian disorders and complex diseases are mapped, moving from candidate-gene approaches to linkage disequilibrium mapping. In this context, Genome-Wide Association Studies (GWAS) aim at identifying genetic markers involved in the expression of complex diseases, that is, markers whose frequencies differ between unrelated samples of affected individuals and unaffected controls. These studies exploit the fact that it is easier to establish large cohorts of affected individuals sharing a genetic risk factor for a complex disease from the general population than from within individual families, as is the case in traditional linkage analysis.

From a statistical point of view, the standard approach in GWAS is based on hypothesis testing, with affected individuals being tested against healthy individuals at one or more markers. However, because a very large number of markers are tested, classical testing schemes are subject to false positives, that is, markers that are falsely identified as significant. One way around this problem is to apply a multiple-testing correction to the p-values obtained from the tests. In return, this increases the risk of missing true associations that have only a small effect on the phenotype, which is usually the case in GWAS.
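
As an illustration of the testing scheme described above, the following minimal sketch computes a per-marker p-value from allele counts in cases and controls and then applies a Bonferroni correction. The choice of a 1-degree-of-freedom allelic chi-square test and of the Bonferroni procedure is an assumption made for this example, not necessarily the method used in this thesis, and the function names are hypothetical.

```python
import math


def allelic_chi2_pvalue(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """P-value of a 1-df chi-square test on a 2x2 allele-count table.

    Rows are cases and controls, columns are alternate and reference
    allele counts. All row and column totals are assumed to be non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

With one million markers tested, a Bonferroni-adjusted threshold of 0.05 corresponds to a raw p-value of 5e-8, which shows why markers with small effects are easily missed after correction.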

Although GWAS have been successful in identifying genetic variants associated with complex multifactorial diseases (Crohn’s disease, type 1 and type 2 diabetes, coronary artery disease, …), only a small proportion of the phenotypic variation expected from classical family studies has been explained. This missing heritability may have multiple causes, including strong correlations between genetic variants, population structure, epistasis (gene-by-gene interactions), and diseases associated with rare variants.

The main objective of this thesis is thus to develop new methodologies that address some of the limitations mentioned above. More specifically, we developed two new approaches. The first is a block-wise approach for GWAS analysis which leverages the correlation structure among genomic variants to improve statistical power in the context of univariate hypothesis testing. The second focuses on the detection of interactions between groups of metagenomic and genetic markers, to better understand the complex relationship between environment and genome in the expression of a given phenotype.