
Scalable Distributed Machine Learning for Knowledge Graphs

dc.contributor.advisor: Lehmann, Jens
dc.contributor.author: Draschner, Carsten Felix
dc.date.accessioned: 2023-07-17T12:20:40Z
dc.date.available: 2023-07-17T12:20:40Z
dc.date.issued: 17.07.2023
dc.identifier.uri: https://hdl.handle.net/20.500.11811/10945
dc.description.abstract (en): As digitization advances, immense amounts of data accumulate. These data, commonly summarized under the term Big Data, form an exciting basis for data analysis. Because the data are heterogeneous and come from many different sources, data integration techniques are needed before analytics can be performed. Knowledge Graphs (KGs) link heterogeneous data within a directed multi-graph through unique resource identifiers. These data can be used for data analytics and prediction methods. Machine Learning (ML) is a subbranch of Artificial Intelligence (AI) in which models are developed and trained to approximate the target data as closely as possible, based on the available training data.
The samples in the training data are usually represented by features. Most data analytics and ML approaches expect these features as fixed-length numeric feature vectors. KGs, however, have no native representation as fixed-length numeric feature vectors.
Depending on the use case, these problems can also require the explicit use and inclusion of individual literal values from the KG.
Some large-scale KGs are too large to fit into the memory of a single present-day computer. One solution is cluster computation through distributed execution, which spreads the data and the processing tasks across multiple computers. Both the technologies and the algorithms for this distributed computation must be selected. Because the results of these data analysis pipelines can have considerable impact, an implementation that is accessible, reproducible, reusable, and explainable is beneficial. These meta-dimensions of ML and AI development belong to the concepts of Ethical AI and Sustainable AI.
In this work, we developed novel approaches for ML on KGs while considering ethical and sustainability dimensions. In particular, we developed technologies that create fixed-length numeric feature vectors. These include methods that, like graph kernels, extract features from the graph using the map-reduce operations relevant for distributed computation. The feature extraction also covers the multi-modal data of KG literals. Accordingly, we developed methods that enable SPARQL-based feature extraction and assist in creating complex feature-extracting queries. Based on these extracted features, we further contributed scalable, distributed, and explainable ML and data analytics methods, such as semantic similarity estimation and classification or regression ML pipelines, which demonstrate noticeable performance.
We support the transparency, reusability, and reproducibility of our novel open-source approaches through the semantification of results and metadata. This semantification transfers the original graph data, the hyper-parameter setup, the explainability information, and the predicted results of the ML pipelines into a native semantic KG. Because of the technological complexity, we ease the application of our algorithms through complementary work, such as their use in coding notebooks and in REST API-based environments. Our work also describes the multidimensional and interwoven optimization dimensions of ethical and sustainable KG-based ML. We extended the existing technology stack SANSA, which is used for distributed processing and native semantic data handling, through several scientific publications and software framework releases to offer these functionalities for distributed ML on KGs.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Machine Learning
dc.subject: Knowledge Graphs
dc.subject: Distributed Computing
dc.subject: Artificial Intelligence
dc.subject: AI Ethics
dc.subject: Scalable Semantic Analytics
dc.subject: SANSA
dc.subject.ddc: 004 Computer science
dc.title: Scalable Distributed Machine Learning for Knowledge Graphs
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-71241
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 7124
ulbbnediss.date.accepted: 23.06.2023
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Wrobel, Stefan
ulbbnediss.contributor.orcid: https://orcid.org/0000-0002-1006-146X

