Advances in Machine Learning Approaches for Biostatistical Learning

Welchowski, Thomas

Author

Welchowski, Thomas

ORCID

https://orcid.org/0000-0003-2940-647X

Type of Scholarly Publication

Habilitation

Date of Exam

30.01.2025

Date of Publication

27.06.2025

Advisor

Schmid, Matthias

Co-Referee

Rügamer, David

Degree Granting Institutions

Rheinische Friedrich-Wilhelms-Universität Bonn

Citable Links

Handle: https://hdl.handle.net/20.500.11811/13164
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-83173
DOI: https://doi.org/10.48565/bonndoc-586

Abstract

This habilitation thesis summarizes recent state-of-the-art advances in machine learning for biomedical applications. The first contribution was a framework for tuning kernel deep stacking networks (KDSN) to increase prediction performance (Welchowski and Schmid, 2016). KDSN are a computationally efficient alternative to backpropagation-based artificial neural networks: they allow layer-wise closed-form solutions and achieve comparable prediction performance on biomedical tabular data. The proposed model-based tuning framework requires far less computation time than grid-based search strategies. This work was extended to sparse KDSN (SKDSN), which add variable selection, dropout, and regularization to make KDSN more flexible (Welchowski and Schmid, 2019). The SKDSN modifications improved on the performance of KDSN but could not match that of ensemble methods on biomedical tabular data sets, especially when the number of covariates was high. Interpretable machine learning (IML) methods provide tools to gain further insight from such black-box models. A case study in ecology highlighted the strengths and weaknesses of IML methods that quantify the magnitude of effects and their interactions (Welchowski et al., 2022). In particular, graphical tools reached their limits when investigating higher-order interaction effects. Previous approaches to inference on model-agnostic interaction effects were limited to a few comparisons of covariate sets because they relied on computationally intensive resampling and refitting of the prediction model. To address these shortcomings, the follow-up article by Welchowski and Edelmann (2024) developed a model-agnostic hypothesis test for detecting interaction effects. Simulations showed that the test controls the type I error and achieves reasonable power with a few hundred observations. Furthermore, owing to the derived asymptotic distribution, the test is far more computationally efficient than previous approaches and can be flexibly specified for the covariate sets of interest.
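
The abstract's remark that KDSN allow layer-wise closed-form solutions can be illustrated with a minimal sketch. It is not code from the thesis. It assumes that each layer is a ridge regression on random Fourier features (an approximation of a Gaussian kernel) and that each layer's prediction is appended to the inputs of the next layer. All names and defaults (fit_layer, fit_stack, D, sigma, lam) are hypothetical and chosen only for illustration.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def random_features(X, W, b):
    """Random Fourier features z_k(x) = sqrt(2/D) * cos(w_k . x + b_k)."""
    D = len(b)
    scale = math.sqrt(2.0 / D)
    return [
        [scale * math.cos(sum(w * v for w, v in zip(W[k], x)) + b[k])
         for k in range(D)]
        for x in X
    ]


def fit_layer(X, y, D=20, sigma=1.0, lam=1e-3, seed=0):
    """Fit one layer: no backpropagation, only a ridge solve in closed form."""
    rng = random.Random(seed)
    p = len(X[0])
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(p)] for _ in range(D)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    Z = random_features(X, W, b)
    # Normal equations of ridge regression: (Z'Z + lam * I) beta = Z'y
    A = [[sum(z[j] * z[k] for z in Z) + (lam if j == k else 0.0)
          for k in range(D)] for j in range(D)]
    rhs = [sum(z[j] * yi for z, yi in zip(Z, y)) for j in range(D)]
    return W, b, solve(A, rhs)


def predict_layer(X, layer):
    W, b, beta = layer
    Z = random_features(X, W, b)
    return [sum(zj * bj for zj, bj in zip(z, beta)) for z in Z]


def fit_stack(X, y, n_layers=2, **kw):
    """Fit layers one after another; each sees the previous predictions."""
    layers, inputs = [], [list(x) for x in X]
    for depth in range(n_layers):
        layer = fit_layer(inputs, y, seed=depth, **kw)
        layers.append(layer)
        preds = predict_layer(inputs, layer)
        inputs = [x + [p] for x, p in zip(inputs, preds)]
    return layers


def predict_stack(X, layers):
    """Pass inputs through all layers and return the last layer's output."""
    inputs = [list(x) for x in X]
    preds = []
    for layer in layers:
        preds = predict_layer(inputs, layer)
        inputs = [x + [p] for x, p in zip(inputs, preds)]
    return preds
```

Under these assumptions, calling fit_stack(X, y, n_layers=2) on a list of covariate rows X and a list of responses y returns the fitted layers, and predict_stack(X, layers) returns the predictions of the last layer. Because every layer reduces to a single linear solve, the cost of fitting is dominated by the tuning parameters (number of features, kernel width, penalty), which is where a tuning framework such as the one described in the abstract would come in.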

Classification (DDC)

310 General statistics

570 Life sciences, biology

610 Medicine, health

Related Publications

https://doi.org/10.1016/j.artmed.2016.04.002
https://doi.org/10.1007/s00180-018-0832-9
https://doi.org/10.1007/s13253-021-00479-7
https://doi.org/10.3390/make6020061

Suggested Citation

Welchowski, Thomas: Advances in Machine Learning Approaches for Biostatistical Learning. - Bonn, 2025. - Habilitation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-83173

BibTeX

@phdthesis{handle:20.500.11811/13164,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-83173},
doi = {10.48565/bonndoc-586},
author = {Welchowski, Thomas},
title = {Advances in Machine Learning Approaches for Biostatistical Learning},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = jun,
note = {This habilitation thesis summarizes recent state-of-the-art advances in machine learning for biomedical applications. The first contribution was a framework for tuning kernel deep stacking networks (KDSN) to increase prediction performance (Welchowski and Schmid, 2016). KDSN are a computationally efficient alternative to backpropagation-based artificial neural networks: they allow layer-wise closed-form solutions and achieve comparable prediction performance on biomedical tabular data. The proposed model-based tuning framework requires far less computation time than grid-based search strategies. This work was extended to sparse KDSN (SKDSN), which add variable selection, dropout, and regularization to make KDSN more flexible (Welchowski and Schmid, 2019). The SKDSN modifications improved on the performance of KDSN but could not match that of ensemble methods on biomedical tabular data sets, especially when the number of covariates was high. Interpretable machine learning (IML) methods provide tools to gain further insight from such black-box models. A case study in ecology highlighted the strengths and weaknesses of IML methods that quantify the magnitude of effects and their interactions (Welchowski et al., 2022). In particular, graphical tools reached their limits when investigating higher-order interaction effects. Previous approaches to inference on model-agnostic interaction effects were limited to a few comparisons of covariate sets because they relied on computationally intensive resampling and refitting of the prediction model. To address these shortcomings, the follow-up article by Welchowski and Edelmann (2024) developed a model-agnostic hypothesis test for detecting interaction effects. Simulations showed that the test controls the type I error and achieves reasonable power with a few hundred observations. Furthermore, owing to the derived asymptotic distribution, the test is far more computationally efficient than previous approaches and can be flexibly specified for the covariate sets of interest.},
url = {https://hdl.handle.net/20.500.11811/13164}
}
