Scalable Distributed Machine Learning for Knowledge Graphs

Draschner, Carsten Felix

dc.contributor.advisor	Lehmann, Jens
dc.contributor.author	Draschner, Carsten Felix
dc.date.accessioned	2023-07-17T12:20:40Z
dc.date.available	2023-07-17T12:20:40Z
dc.date.issued	17.07.2023
dc.identifier.uri	https://hdl.handle.net/20.500.11811/10945
dc.description.abstract	With the ongoing progress of digitization, immense amounts of data are accumulating. These data are commonly summarized under the term Big Data and form an exciting basis for data analyses. Since the data are heterogeneous and come from many different sources, data integration techniques are beneficial for performing analytics. Knowledge Graphs (KGs) link heterogeneous data within a directed multi-graph by unique resource identifiers. These data can be used for data analytics and prediction methods. Machine Learning (ML) is a subbranch of Artificial Intelligence (AI) in which models are developed and trained to approximate the target data as closely as possible, based on the available training data. The samples in the training data are usually represented by features. For most data analytics and ML approaches, these features are fixed-length numeric feature vectors. However, KGs have no native representation as fixed-length numeric feature vectors. Depending on the use case, these problems can also require the concrete use and inclusion of individual actual values from the KG. In addition, some large-scale KGs are too large to fit into the memory of today's individual computers. One solution is cluster computing, which distributes the data and processing tasks across multiple computers. Both the technologies and the algorithms for this distributed computation must be designed accordingly. Because of the possible impact of the results of these data analysis pipelines, a dedicated technical implementation of accessible, reproducible, reusable, and explainable approaches is beneficial. These meta-dimensions of ML and AI development belong to the concepts of Ethical AI and Sustainable AI. Within this work, we developed novel approaches for ML on KGs while considering ethical and sustainability dimensions. In particular, we developed technologies that create fixed-length numeric feature vectors.
These include methods that, like graph kernels, extract features from the graph using the map-reduce operations relevant for distributed computation. The feature extraction also covers the multi-modal data of KG literals. Accordingly, we have developed methods that enable SPARQL-based feature extraction and assist in creating complex feature-extracting queries. Based on these extracted features, we further contributed scalable, distributed, and explainable ML and data analytics methods, such as semantic similarity estimation and classification or regression ML pipelines, which demonstrate noticeable performance. We support the transparency, reusability, and reproducibility of our novel open-source approaches through the semantification of results and metadata. This semantification transfers the original graph data, the hyper-parameter setup, the explainability information, and the predicted results of the ML pipelines into a native semantic KG. Because of the technological complexity, we make our algorithms and technologies easier to apply through complementary work, such as their use in coding notebooks and in REST API-based environments. Our work also describes the multidimensional and interwoven optimization dimensions of ethical and sustainable KG-based ML. We extended SANSA, an existing technology stack for distributed processing and native semantic data handling, through several scientific publications and software framework releases to offer these functionalities for distributed ML on KGs.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Machine Learning
dc.subject	Knowledge Graphs
dc.subject	Distributed Computing
dc.subject	Artificial Intelligence
dc.subject	AI Ethics
dc.subject	Scalable Semantic Analytics
dc.subject	SANSA
dc.subject.ddc	004 Computer science
dc.title	Scalable Distributed Machine Learning for Knowledge Graphs
dc.type	Dissertation or habilitation thesis
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-71241
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	7124
ulbbnediss.date.accepted	23.06.2023
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Wrobel, Stefan
ulbbnediss.contributor.orcid	https://orcid.org/0000-0002-1006-146X

Files in this resource

Name: 7124.pdf
Size: 22.8 MB
Format: PDF
