Draschner, Carsten Felix: Scalable Distributed Machine Learning for Knowledge Graphs. - Bonn, 2023. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-71241
@phdthesis{handle:20.500.11811/10945,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-71241},
author = {{Carsten Felix Draschner}},
title = {Scalable Distributed Machine Learning for Knowledge Graphs},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2023,
month = jul,

note = {Due to the increasing progress of digitization, immense amounts of data are accumulating, which can be summarized under the term Big Data and form an exciting basis for data analyses. Since the data are heterogeneous and come from many different sources, data integration techniques are beneficial for performing analytics. Knowledge Graphs (KGs) link the heterogeneous data within a directed multi-graph by unique resource identifiers. These data can be used for data analytics and prediction methods. Machine Learning (ML) is a subfield of Artificial Intelligence (AI) in which models are developed and trained to approximate the target data as closely as possible, based on the available training data.
The samples in the training data are usually represented by features. For most data analytics and ML approaches, these features are fixed-length numeric feature vectors. KGs, however, have no native representation as fixed-length numeric feature vectors.
Depending on the use case, the analysis may also need to include individual actual values from the KG directly.
Some large-scale KGs are too large to fit into the memory of a single present-day computer. One solution is cluster computation through distributed execution, which distributes the data and processing tasks across multiple computers. Both the technologies and the algorithms must be designed specifically for this distributed computation. Because the results of these data analysis pipelines can have considerable impact, a dedicated technical implementation of accessible, reproducible, reusable, and explainable approaches is beneficial. These meta-dimensions of ML and AI development belong to the concepts of Ethical AI and Sustainable AI.
Within this work, we developed novel approaches for ML on KGs while considering ethical and sustainability dimensions. In particular, we developed technologies that create fixed-length numeric feature vectors. These include methods that, like graph kernels, extract features from the graph using the map-reduce operations relevant for distributed computation. The feature extraction also covers the multi-modal data of KG literals. Accordingly, we developed methods that enable SPARQL-based feature extraction and assist in creating complex feature-extracting queries. Based on these extracted features, we further contributed scalable, distributed, and explainable ML and data analytics methods, such as semantic similarity estimation and ML pipelines for classification or regression, demonstrating noticeable performance.
We support the transparency, reusability, and reproducibility of our novel open-source approaches through the semantification of results and metadata. This semantification transfers the original graph data, together with the hyper-parameter setup, the explainability information, and the predicted results of the ML pipelines, into a native semantic KG. Because of the technological complexity, we make our algorithms easier to apply through complementary work, such as their use in coding notebooks and in REST API-based environments. Our work also describes the multidimensional and interwoven optimization dimensions of ethical and sustainable KG-based ML. We extended the existing technology stack SANSA, which is used for distributed processing and native semantic data handling, through several scientific publications and software framework releases to offer these functionalities for distributed ML on KGs.},

url = {https://hdl.handle.net/20.500.11811/10945}
}

The following license files are associated with this item:

InCopyright