Final year BSc research project entitled 'Identifying interactions between risk factors for Type 1 Diabetes'
Type 1 diabetes (T1D) is recognised to result from a complex interplay between environmental and genetic risk factors. Accurately predicting T1D is important for avoiding severe complications at diagnosis and for enabling precision medicine. Current single-variable models have limited predictive power (Fig. 1). Exploring interactions between risk factors could therefore improve both the understanding of T1D risk and the performance of prediction models.
Linear and non-linear classifiers, followed by statistical tests, were used to assess the presence and strength of interactions between risk factors.
- Linear classifier (logistic regression) with a multiplicative linear interaction term. Generate a matrix of interaction term p-values.
- Non-linear classifier (gradient boosting). Generate a matrix of H-statistic values. (Friedman, J. H., & Popescu, B. (2008). Predictive Learning Via Rule Ensembles. The Annals of Applied Statistics, Vol. 2, No. 3, 916–954. DOI: 10.1214/07-AOAS148.)
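The linear step can be sketched as follows. This is a minimal illustration on synthetic data, not the project's code: the helper names `fit_logit` and `interaction_pvalue_matrix`, the simulated features, and the choice to apply the Bonferroni correction over the number of distinct pairs are all assumptions. The fit uses plain Newton-Raphson with Wald p-values so that the example depends only on NumPy; a statistics library would normally be used instead.

```python
import math
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return coefficients and Wald p-values."""
    X = np.column_stack([np.ones(len(X)), X])      # prepend an intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided p-value from the standard normal distribution.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, pvals

def interaction_pvalue_matrix(X, y):
    """Off-diagonal: p-value of the x_j * x_k term. Diagonal: single-variable p-value."""
    n_features = X.shape[1]
    P = np.full((n_features, n_features), np.nan)
    for j in range(n_features):
        _, pv = fit_logit(X[:, [j]], y)
        P[j, j] = pv[1]                            # index 0 is the intercept
        for k in range(j + 1, n_features):
            design = np.column_stack([X[:, j], X[:, k], X[:, j] * X[:, k]])
            _, pv = fit_logit(design, y)
            P[j, k] = P[k, j] = pv[3]              # p-value of the interaction term
    return P

# Synthetic data (hypothetical): feature 0 has a main effect,
# features 1 and 2 act only through their product.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
logits = 0.8 * X[:, 0] + 1.0 * X[:, 1] * X[:, 2]
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

P = interaction_pvalue_matrix(X, y)
n_pairs = X.shape[1] * (X.shape[1] - 1) // 2
bonferroni_alpha = 0.05 / n_pairs                  # assumed correction: one test per pair
off_diagonal = ~np.eye(X.shape[1], dtype=bool)
significant = (P < bonferroni_alpha) & off_diagonal
```

Here `significant` flags the pairs whose interaction term survives the corrected threshold, and the diagonal of `P` plays the role of the single-variable p-values.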
Non-linear interactions were identified and explored. No significant linear interactions were found.
- Linear interactions: Matrix of p-values of the logistic regression interaction term, characterising a linear interaction.
- No p-values were significant at a significance level of 0.05 with Bonferroni correction. Values enclosed by blue rectangles were significant before correction and so were explored using partial dependence plots (PDPs). The leading diagonal gives the p-value of each single-variable logistic regression, showing that only four variables were significant on their own.
- Non-linear interactions: A matrix of H-statistic values for the non-linear model.
- Higher H-statistic values indicate stronger interaction. Many interactions were identified. Because the H-statistic is based on the variance of the partial dependence output, a high value could indicate any type of interaction.
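For reference, the pairwise H-statistic of Friedman and Popescu compares the joint partial dependence of two variables with the sum of their individual partial dependences. The sketch below is a brute-force, model-agnostic illustration under stated assumptions: the function names are hypothetical, the partial dependence is evaluated at the data points themselves, and `predict` is a simple known function standing in for the gradient boosting model. With a scikit-learn classifier, one would presumably pass something like `lambda X: model.predict_proba(X)[:, 1]` instead.

```python
import numpy as np

def centred_partial_dependence(predict, X, features):
    """Partial dependence on `features`, evaluated at every row of X and mean-centred."""
    n = X.shape[0]
    pd_values = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        X_mod[:, features] = X[i, features]    # fix the chosen features at row i's values
        pd_values[i] = predict(X_mod).mean()   # average over all remaining features
    return pd_values - pd_values.mean()

def h_statistic(predict, X, j, k):
    """Pairwise H: share of the joint partial dependence variance not explained additively."""
    pd_j = centred_partial_dependence(predict, X, [j])
    pd_k = centred_partial_dependence(predict, X, [k])
    pd_jk = centred_partial_dependence(predict, X, [j, k])
    denom = np.sum(pd_jk ** 2)
    if denom == 0.0:
        return 0.0
    h_squared = np.sum((pd_jk - pd_j - pd_k) ** 2) / denom
    return float(np.sqrt(h_squared))

def h_matrix(predict, X):
    """Symmetric matrix of pairwise H values; the diagonal is left as NaN."""
    n_features = X.shape[1]
    H = np.full((n_features, n_features), np.nan)
    for j in range(n_features):
        for k in range(j + 1, n_features):
            H[j, k] = H[k, j] = h_statistic(predict, X, j, k)
    return H

# Stand-in "model" (hypothetical): feature 0 is additive, features 1 and 2 interact.
def toy_predict(X):
    return X[:, 0] + X[:, 1] * X[:, 2]

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
H = h_matrix(toy_predict, X_demo)
```

In this toy case the entry for features 1 and 2 is large while the entries involving feature 0 are essentially zero. The statistic measures only how much of the joint effect is non-additive, so it does not say what form the interaction takes.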
A novel result is that interactions exist between variables that are not significant on their own, namely between weight and diabetic sibling, and between maternal and paternal ages. See figures below.