Utilizing Constrained Homomorphisms in the Design of Efficient Graph Kernels

Schulz, Till Hendrik

Volltext

Dokument öffnen (1.2MB)

Autor

Schulz, Till Hendrik

Art der Hochschulschrift

Dissertation

Prüfungsdatum

13.12.2023

Datum der Veröffentlichung

16.01.2024

Erstgutachter

Wrobel, Stefan

Zweitgutachter

Bauckhage, Christian

Grad-verleihende Institutionen

Rheinische Friedrich-Wilhelms-Universität Bonn

Metadaten

Zur Langanzeige

Zitierbare Links

Handle: https://hdl.handle.net/20.500.11811/11239
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-73985

Inhalt

Learning on graphs, particularly graph classification, requires rich graph representations. A common paradigm to obtain these is by extracting sets of substructures and representing graphs by such sets. The obtained graph representations then enable the application of standard machine learning ap- proaches like support vector machines. Traditionally, graph substructures refer to subgraph patterns which are embedded by subgraph isomorphisms. Identifying subgraph patterns is however compu- tationally infeasible due to the NP-completeness of deciding subgraph isomorphism even when the patterns are restricted to trees. A relaxation of the problem is to consider graph homomorphisms as the pattern matching operator instead, which can in fact be computed in polynomial time for tree patterns. However, graph homomorphisms generally result in less suitable graph representations for classification tasks. A key observation, which has been largely disregarded in the machine learning community, is that subgraph isomorphisms can be regarded as constrained homomorphisms. In this dissertation, we utilize this unifying view of these two pattern embedding operators by considering tractable instances of constrained homomorphisms on tree patterns and design three powerful and efficiently computable graph kernels.
To bridge the gap between graph homomorphisms and subgraph isomorphisms, we first introduce the notion of partially injective homomorphisms which require injectivity only for subsets of the pat- terns’ vertex pairs. Utilizing positive complexity results on deciding homomorphisms from bounded treewidth graphs, we present an algorithm mining frequent trees w.r.t. partially injective homomor- phisms in incremental polynomial time. We design a kernel function which measures graph similarity in terms of such mutually occurring patterns and experimentally demonstrate that by bridging the gap between graph homomorphism and subgraph isomorphism, our approach offers an attractive trade-off between efficiency and predictive power.
Subsequently, we turn our attention to the popular Weisfeiler-Lehman method. This label prop- agation algorithm implicitly constructs tree patterns for which the embedding operator is given by locally bijective homomorphisms, another kind of constrained homomorphisms. While such patterns can be very efficiently computed and yield expressive graph representations, comparing graphs in terms of mutually occurring Weisfeiler-Lehman patterns is an often insufficient similarity measure. We propose two approaches to overcome this drawback.
Utilizing the concept of graph filtrations, we introduce a graph kernel which compares distributions of Weisfeiler-Lehman patterns over multiple graph resolutions. This approach offers a fine-grained graph similarity by comparing existence intervals of patterns, instead of their cardinalities. We show that this kernel is powerful in terms of distinguishing non-isomorphic graphs and even gives rise to complete graph kernels in certain scenarios. Moreover, the kernel can be generalized to arbitrary graph features, enabling an application beyond Weisfeiler-Lehman patterns. We empirically validate our theoretical findings on the expressive power of our kernel and provide experiments on real-world benchmark datasets which show a favorable performance of our approach compared to state-of-the-art graph kernels.
Finally, we propose a graph kernel, which compares graphs using a fine-grained similarity measure on Weisfeiler-Lehman patterns, effectively replacing the traditionally considered similarity defined by equality. This is achieved by a specifically designed tree edit distance which provides a semantically adequate and efficiently computable comparison on Weisfeiler-Lehman tree patterns. The key idea is to cluster similar patterns w.r.t. this distance measure and define a graph kernel that treats two patterns as equivalent if they belong to the same cluster. In an experimental section, we systematically investigate this kernel’s predictive performance and show that it significantly outperforms state-of- the-art graph kernels on several graph benchmark datasets beyond the typically considered molecular graphs.

Klassifikation (DDC)

004 Informatik

Zugehörige Publikation(en)

https://doi.org/10.1007/978-3-030-10928-8_35
https://doi.org/10.1007/s10994-022-06131-w
https://doi.org/10.1609/aaai.v36i8.20793

Zitiervorschlag
BibTeX

Schulz, Till Hendrik: Utilizing Constrained Homomorphisms in the Design of Efficient Graph Kernels. - Bonn, 2024. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-73985

@phdthesis{handle:20.500.11811/11239,
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-73985,
author = {{Till Hendrik Schulz}},
title = {Utilizing Constrained Homomorphisms in the Design of Efficient Graph Kernels},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2024,
month = jan,
note = {Learning on graphs, particularly graph classification, requires rich graph representations. A common paradigm to obtain these is by extracting sets of substructures and representing graphs by such sets. The obtained graph representations then enable the application of standard machine learning ap- proaches like support vector machines. Traditionally, graph substructures refer to subgraph patterns which are embedded by subgraph isomorphisms. Identifying subgraph patterns is however compu- tationally infeasible due to the NP-completeness of deciding subgraph isomorphism even when the patterns are restricted to trees. A relaxation of the problem is to consider graph homomorphisms as the pattern matching operator instead, which can in fact be computed in polynomial time for tree patterns. However, graph homomorphisms generally result in less suitable graph representations for classification tasks. A key observation, which has been largely disregarded in the machine learning community, is that subgraph isomorphisms can be regarded as constrained homomorphisms. In this dissertation, we utilize this unifying view of these two pattern embedding operators by considering tractable instances of constrained homomorphisms on tree patterns and design three powerful and efficiently computable graph kernels.
To bridge the gap between graph homomorphisms and subgraph isomorphisms, we first introduce the notion of partially injective homomorphisms which require injectivity only for subsets of the pat- terns’ vertex pairs. Utilizing positive complexity results on deciding homomorphisms from bounded treewidth graphs, we present an algorithm mining frequent trees w.r.t. partially injective homomor- phisms in incremental polynomial time. We design a kernel function which measures graph similarity in terms of such mutually occurring patterns and experimentally demonstrate that by bridging the gap between graph homomorphism and subgraph isomorphism, our approach offers an attractive trade-off between efficiency and predictive power.
Subsequently, we turn our attention to the popular Weisfeiler-Lehman method. This label prop- agation algorithm implicitly constructs tree patterns for which the embedding operator is given by locally bijective homomorphisms, another kind of constrained homomorphisms. While such patterns can be very efficiently computed and yield expressive graph representations, comparing graphs in terms of mutually occurring Weisfeiler-Lehman patterns is an often insufficient similarity measure. We propose two approaches to overcome this drawback.
Utilizing the concept of graph filtrations, we introduce a graph kernel which compares distributions of Weisfeiler-Lehman patterns over multiple graph resolutions. This approach offers a fine-grained graph similarity by comparing existence intervals of patterns, instead of their cardinalities. We show that this kernel is powerful in terms of distinguishing non-isomorphic graphs and even gives rise to complete graph kernels in certain scenarios. Moreover, the kernel can be generalized to arbitrary graph features, enabling an application beyond Weisfeiler-Lehman patterns. We empirically validate our theoretical findings on the expressive power of our kernel and provide experiments on real-world benchmark datasets which show a favorable performance of our approach compared to state-of-the-art graph kernels.
Finally, we propose a graph kernel, which compares graphs using a fine-grained similarity measure on Weisfeiler-Lehman patterns, effectively replacing the traditionally considered similarity defined by equality. This is achieved by a specifically designed tree edit distance which provides a semantically adequate and efficiently computable comparison on Weisfeiler-Lehman tree patterns. The key idea is to cluster similar patterns w.r.t. this distance measure and define a graph kernel that treats two patterns as equivalent if they belong to the same cluster. In an experimental section, we systematically investigate this kernel’s predictive performance and show that it significantly outperforms state-of- the-art graph kernels on several graph benchmark datasets beyond the typically considered molecular graphs.},
url = {https://hdl.handle.net/20.500.11811/11239}
}

Die folgenden Nutzungsbestimmungen sind mit dieser Ressource verbunden: