Advances in tree-based regression for modeling biomedical data

dc.contributor.advisor | Schmid, Matthias | |
dc.contributor.author | Spuck, Nikolai | |
dc.date.accessioned | 2025-07-08T05:26:48Z | |
dc.date.available | 2025-07-08T05:26:48Z | |
dc.date.issued | 08.07.2025 | |
dc.identifier.uri | https://hdl.handle.net/20.500.11811/13194 | |
dc.description.abstract | Health-related research questions require methods that can deal with the growing complexity and dimensionality of biomedical data sets. A popular alternative to common parametric regression approaches are tree-based models, which recursively partition the data using binary splits to identify subgroups with similar values of an outcome variable of interest. The splitting rules (i.e., the splitting variables and corresponding split points) are selected in a data-driven way. Therefore, the data-driven tree building inherently performs variable selection and is able to detect and include relevant interactions even in high-dimensional data settings. In addition, tree-based models are easily accessible to practitioners due to their intuitive graphical representation. This cumulative dissertation consists of four projects that aim to extend the class of tree-based models with a focus on application to biomedical research questions. In this vein, novel flexible tree-based approaches for modeling different types of biomedical data and a method for measuring statistical uncertainty and conducting inference on parameters from tree-based models are introduced. The first two projects focus on discrete time-to-event outcomes, which are common in biomedical research, for example, in observational studies, where the possible occurrence of an event of interest is only recorded at certain follow-up times. In the first project, a flexible approach for tree-based modeling of discrete time-to-event outcomes is proposed. In the second project, a tree-based model for discrete time-to-event analysis is used to identify relevant risk factors for a prolonged length of stay in hospital for patients suffering from oral squamous cell carcinoma. The third project deals with the analysis of clustered data, where observations come in clusters of units, and the heterogeneity between observations from different units needs to be accounted for. 
A tree-based approach for modeling the effects of the covariates and the heterogeneity between the units with an application to quality of life in older adults is presented. The fourth project addresses the construction of confidence intervals for parameters from tree-based models. In particular, parameters of a tree-structured varying coefficient model are considered. Classical asymptotic normal distribution-based approaches for statistical inference on tree-structured varying coefficients are invalid as they neglect the uncertainty induced by the data-driven tree building, which constitutes a so-called selective inference problem. To address this selective inference problem, a parametric bootstrap-based method for constructing confidence intervals for tree-structured varying coefficients is introduced. The performance of the methods proposed in the four projects is assessed in simulation studies, and applications to real-world data are considered. Three research articles have been published in peer-reviewed international journals (Sections 3.1, 3.2, and 3.4). In addition, an unpublished manuscript submitted to Advances in Data Analysis and Classification and available on arXiv is included in this dissertation (Section 3.3). | en |
dc.language.iso | eng | |
dc.rights | In Copyright | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject.ddc | 310 General statistics | |
dc.title | Advances in tree-based regression for modeling biomedical data | |
dc.type | Dissertation or habilitation | |
dc.publisher.name | Universitäts- und Landesbibliothek Bonn | |
dc.publisher.location | Bonn | |
dc.rights.accessRights | openAccess | |
dc.identifier.urn | https://nbn-resolving.org/urn:nbn:de:hbz:5-83634 | |
dc.relation.arxiv | arXiv:2501.12787 | |
dc.relation.doi | https://doi.org/10.1007/s11222-022-10196-x | |
dc.relation.doi | https://doi.org/10.1016/j.bjoms.2023.09.004 | |
dc.relation.doi | https://doi.org/10.1016/j.csda.2025.108142 | |
ulbbn.pubtype | First publication | |
ulbbnediss.affiliation.name | Rheinische Friedrich-Wilhelms-Universität Bonn | |
ulbbnediss.affiliation.location | Bonn | |
ulbbnediss.thesis.level | Dissertation | |
ulbbnediss.dissID | 8363 | |
ulbbnediss.date.accepted | 16.06.2025 | |
ulbbnediss.institute | Medizinische Fakultät / Institute : Institut für Medizinische Biometrie, Informatik und Epidemiologie (IMBIE) | |
ulbbnediss.fakultaet | Medizinische Fakultät | |
dc.contributor.coReferee | Groll, Andreas | |
ulbbnediss.contributor.orcid | https://orcid.org/0000-0001-6345-9634 |