Statistical learning for multivariate distributional regression with complex dependencies

Strömer, Annika Lisa

dc.contributor.advisor	Mayr, Andreas
dc.contributor.author	Strömer, Annika Lisa
dc.date.accessioned	2025-12-05T16:56:11Z
dc.date.available	2025-12-05T16:56:11Z
dc.date.issued	2025-12-05
dc.identifier.uri	https://hdl.handle.net/20.500.11811/13728
dc.description.abstract	Large, complex datasets are becoming increasingly important in biomedical research. Such datasets typically feature a high number of variables per subject, multiple outcomes and complex dependency structures. While they provide new opportunities to examine scientific questions in greater detail, they also pose major statistical challenges. Addressing these challenges requires advanced methods that can handle high dimensionality, capture dependencies between correlated outcomes and provide interpretable results. This cumulative dissertation develops statistical frameworks for multivariate distributional regression and variable selection techniques, enabling the analysis of complex biomedical data while balancing flexibility, interpretability and efficiency. It comprises five publications covering methodological advances and applications in diverse biomedical contexts. The first project demonstrates the value of advanced multivariate modeling for uncovering clinically relevant patterns in complex longitudinal data. Using latent class linear mixed models (LCMMs), unobserved patient subgroups are identified with distinct five-year trajectories in weight, depressive symptoms, eating disorder psychopathology and health-related quality of life (HRQoL) after obesity surgery. The results show that physical and psychological changes can evolve differently over time and may vary in sustainability, underscoring the need for joint models that capture both interdependencies and heterogeneity. The second project develops a model-based boosting approach for multivariate distributional regression within the framework of generalized additive models for location, scale and shape (GAMLSS). This method enables simultaneous modeling of all distribution parameters – including dependence parameters – of arbitrary parametric multivariate outcomes as functions of covariates. 
It incorporates data-driven variable selection and scales to high-dimensional settings where the number of covariates exceeds the number of observations (p > n). Building on this, the third project tackles the issue of dependent censoring in survival analysis, a challenging scenario where the common assumption of independent censoring does not hold. In such cases, censoring may be related to the patient's health status; for instance, patients in poorer condition may withdraw from a study earlier. The work proposes a novel model-based boosting method using distributional copula regression to jointly model the marginal distributions of event and censoring times as well as their dependence, as functions of covariates. The fourth and fifth papers address the challenge of improving interpretability in model-based boosting, particularly for high-dimensional biomedical data. While boosting provides flexibility, it may result in overly complex models by including covariates with negligible importance. The fourth paper proposes a deselection approach for univariate (distributional) regression that removes predictors with only a minor impact on the model's predictions, yielding simpler and more interpretable models without compromising predictive performance. The fifth paper extends this approach to distributional copula regression, enabling not only the removal of variables with minor importance but also the determination of whether specific parameters require covariate effects. This controls model complexity and enhances interpretability. This dissertation includes five research articles published in peer-reviewed international journals (Publications A-E).	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject.ddc	310 General statistics
dc.title	Statistical learning for multivariate distributional regression with complex dependencies
dc.type	Dissertation or habilitation thesis
dc.identifier.doi	https://doi.org/10.48565/bonndoc-732
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-86720
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	8672
ulbbnediss.date.accepted	2025-11-27
ulbbnediss.institute	Medizinische Fakultät / Institute : Institut für Medizinische Biometrie, Informatik und Epidemiologie (IMBIE)
ulbbnediss.fakultaet	Medizinische Fakultät
dc.contributor.coReferee	Klein, Nadja
ulbbnediss.contributor.orcid	https://orcid.org/0000-0002-1284-3318

Files in this item

Name: 8672.pdf
Size: 30.7 MB
Format: PDF

This item appears in the following Collection(s)

E-Dissertationen (2142)
