Advanced polygenic prediction models via statistical boosting

Klinkhammer, Hannah

Author

Klinkhammer, Hannah

ORCID

https://orcid.org/0000-0003-3752-1275

Type of thesis

Dissertation

Date of examination

18 February 2025

Date of publication

6 March 2025

First reviewer

Krawitz, Peter M.

Second reviewer

Engel, Christoph

Degree-granting institution

Rheinische Friedrich-Wilhelms-Universität Bonn

Citable links

Handle: https://hdl.handle.net/20.500.11811/12883
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-81451

Abstract

With the growing availability of large biobanks with extensive genetic data, polygenic prediction modeling has gained importance; it aims to capture an individual's genetic predisposition to specific, often complex, traits. In contrast to monogenic diseases, complex traits are typically characterized by a limited genetic signal that is distributed across many genetic loci and carried by common variants with only small to moderate effect sizes. Additionally, common variants in close proximity are often highly correlated (linkage disequilibrium), which increases the statistical complexity of polygenic prediction modeling.
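In its simplest form, a PRS is a weighted sum of an individual's allele dosages. The sketch below is our own illustration, not code from the dissertation; `polygenic_score`, `marginal_slope` and the toy data are hypothetical. It shows why linkage disequilibrium complicates matters when the weights are per-variant (univariate) estimates: if a second variant is perfectly correlated with a causal one, both variants receive the full marginal effect and the score counts the same signal twice.

```python
from typing import Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of allele dosages (0, 1 or 2 copies of the effect allele)."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per variant is required")
    return sum(x * w for x, w in zip(dosages, weights))


def marginal_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Univariate least-squares slope of y on a single variant x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical toy data: variant B is in perfect LD with variant A,
# and only variant A has a true effect of 0.5 on the trait.
snp_a = [0, 1, 2, 0, 1, 2]
snp_b = list(snp_a)
trait = [0.5 * g for g in snp_a]

# Each variant is analysed on its own, as with GWAS summary statistics.
weights = [marginal_slope(snp_a, trait), marginal_slope(snp_b, trait)]
person = [2, 2]  # two effect alleles at both variants

# Both marginal weights equal 0.5, so the score is 2.0 instead of the true 1.0.
print(weights, polygenic_score(person, weights))
```

In practice, LD is often handled by clumping or pruning variants before summing marginal estimates; multivariable models, as pursued in this dissertation, instead estimate the weights jointly.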
The aim of this cumulative dissertation is to enable advanced statistical modeling of polygenic risk scores (PRS) based on individual-level genotype data from large cohort studies. The first work underlines the potential of PRS to partly explain incomplete penetrance in monogenic conditions by analyzing patients diagnosed with Lynch syndrome, a monogenic condition that increases the risk of colorectal cancer. Here, PRS showed a higher potential for risk stratification in individuals carrying a variant in a moderate-penetrance gene than in individuals with an affected high-penetrance gene. PRS are commonly derived from univariate effect estimates from genome-wide association studies. In contrast, the second work of this dissertation introduces the new statistical boosting framework snpboost. The algorithm incorporates an additional batch-building step that substantially reduces the search space in each boosting iteration. Working iteratively on batches of variants resolves the computational challenges and, for the first time, makes multivariable modeling of PRS directly from individual-level genotype data via statistical boosting feasible. The third work extends the snpboost framework beyond Gaussian and binary outcomes to time-to-event and count data as well as to quantile regression. Finally, the last included work emphasizes the importance of a thorough performance assessment of PRS. In this context, it addresses a major challenge of PRS, namely the strongly decreased prediction performance in out-of-target data, e.g. individuals of a different ancestry from that of the training population.
All research articles have been published in international peer-reviewed journals (see Publications 1-4).
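The batch-building idea described for snpboost can be illustrated with a heavily simplified component-wise L2 boosting loop for a Gaussian outcome. This is a minimal sketch under our own assumptions, not the snpboost implementation: the function `batch_boost` and its parameters (`batch_size`, `steps_per_batch`, `n_batches`, `nu`) are hypothetical, every batch runs a fixed number of iterations, and there is no stopping rule or early stopping. It only shows where the search space shrinks: all variants are ranked against the current residuals once per batch, and the per-iteration search for the best-fitting variant then runs over the batch only.

```python
from typing import List, Tuple


def batch_boost(
    columns: List[List[float]],  # one list of allele dosages (0/1/2) per variant
    y: List[float],              # Gaussian phenotype, one value per individual
    batch_size: int = 2,         # hypothetical: variants kept per batch
    steps_per_batch: int = 10,   # hypothetical: boosting iterations per batch
    n_batches: int = 10,         # hypothetical: number of batch-building rounds
    nu: float = 0.1,             # learning rate (step length)
) -> Tuple[float, List[float]]:
    n = len(y)
    means = [sum(col) / n for col in columns]
    centred = [[v - m for v in col] for col, m in zip(columns, means)]
    sq_norms = [sum(v * v for v in col) for col in centred]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # start from the intercept-only model
    beta = [0.0] * len(columns)

    def score(j: int) -> float:
        # Reduction in residual sum of squares if variant j were fitted alone.
        if sq_norms[j] == 0.0:
            return 0.0
        dot = sum(a * b for a, b in zip(centred[j], residual))
        return dot * dot / sq_norms[j]

    for _ in range(n_batches):
        # Batch building: rank *all* variants once against the current
        # residuals and keep only the top `batch_size` candidates.
        ranked = sorted(range(len(columns)), key=score, reverse=True)
        batch = ranked[:batch_size]
        for _ in range(steps_per_batch):
            # Component-wise step: the search is restricted to the batch.
            best = max(batch, key=score)
            if sq_norms[best] == 0.0:
                break
            dot = sum(a * b for a, b in zip(centred[best], residual))
            slope = dot / sq_norms[best]
            beta[best] += nu * slope
            residual = [r - nu * slope * x for r, x in zip(residual, centred[best])]

    # Express the intercept on the scale of the uncentred dosages.
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return intercept, beta
```

With hypothetical toy data in which only the first variant affects the phenotype, e.g. `genotypes = [[0, 1, 2, 0, 1, 2, 1, 0], [1, 0, 1, 2, 0, 1, 2, 1], [2, 2, 1, 0, 0, 1, 0, 1]]` and `phenotype = [1.0 + 0.5 * g for g in genotypes[0]]`, `batch_boost(genotypes, phenotype)` returns an intercept close to 1.0 and a coefficient close to 0.5 for the first variant, while the other coefficients stay at zero. Ranking all variants once per batch rather than once per iteration is what saves computation when the number of variants is very large.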

Classification (DDC)

310 General statistics

570 Life sciences, biology

610 Medicine, health

Related publications

https://doi.org/10.1136/jmg-2023-109344
https://doi.org/10.3389/fgene.2022.1076440
https://doi.org/10.1002/sim.10249
https://doi.org/10.1186/s12920-024-01905-8

Suggested citation
BibTeX

Klinkhammer, Hannah: Advanced polygenic prediction models via statistical boosting. - Bonn, 2025. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-81451

@phdthesis{handle:20.500.11811/12883,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-81451},
author = {Klinkhammer, Hannah},
title = {Advanced polygenic prediction models via statistical boosting},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = mar,
url = {https://hdl.handle.net/20.500.11811/12883}
}
