Klinkhammer, Hannah: Advanced polygenic prediction models via statistical boosting. - Bonn, 2025. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-81451
@phdthesis{handle:20.500.11811/12883,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-81451},
author = {Klinkhammer, Hannah},
title = {Advanced polygenic prediction models via statistical boosting},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = mar,
note = {In times of growing availability of large biobanks with extensive genetic data, polygenic prediction modeling has gained importance and aims at capturing an individual's genetic predisposition to specific, often complex, traits. In contrast to monogenic diseases, complex traits are typically characterized by a limited genetic signal that is distributed across many genetic loci and based on common variants exhibiting only low to medium effect sizes. Additionally, common variants in close proximity are often highly correlated (linkage disequilibrium), increasing the statistical complexity of polygenic prediction modeling.
The aim of this cumulative dissertation is to enable advanced statistical modeling of polygenic risk scores (PRS) based on individual-level genotype data from large cohort studies. The first work underlines the potential of PRS to partly explain incomplete penetrance in monogenic conditions by analyzing patients diagnosed with Lynch syndrome, a monogenic condition that increases the risk of colorectal cancer. Here, PRS showed a higher potential for risk stratification in individuals with a variant in a moderate-penetrance gene than in individuals with an affected high-penetrance gene. PRS are commonly based on univariate effect estimates from genome-wide association studies. In the second work of this dissertation, the new statistical boosting framework \textit{snpboost} is introduced. The algorithm incorporates an additional batch-building step that substantially reduces the search space in each boosting iteration. By iteratively working on batches of variants, the computational challenges are overcome, and multivariable modeling of PRS directly from individual-level genotype data via statistical boosting is made feasible for the first time. The third work in this dissertation extends the \textit{snpboost} framework so that it is applicable not only to Gaussian and binary data but also to time-to-event data, count data and quantile regression. Finally, the last included work emphasizes the importance of a thorough performance assessment of PRS. In this context, a major challenge of PRS is addressed, namely a strongly decreased prediction performance in out-of-target data, e.g. in individuals whose ancestry differs from that of the training population.
All research articles have been published in international peer-reviewed journals (see Publications 1-4).},
url = {https://hdl.handle.net/20.500.11811/12883}
}