Hybrid Representation Learning for Information Extraction

Deußer, Tobias Kurt Stefan

dc.contributor.advisor	Sifa, Rafet
dc.contributor.author	Deußer, Tobias Kurt Stefan
dc.date.accessioned	2026-03-25T12:54:10Z
dc.date.available	2026-03-25T12:54:10Z
dc.date.issued	2026-03-25
dc.identifier.uri	https://hdl.handle.net/20.500.11811/14010
dc.description.abstract	In the contemporary digital era, the exponential increase in unstructured and semi-structured data has made information extraction a cornerstone of modern data-driven research and application. The ability to transform such raw information into structured knowledge is crucial for enabling downstream tasks. While traditional rule-based and statistical approaches to information extraction have demonstrated success in narrow, well-defined tasks, they lack the scalability and adaptability required to address the volume and variability of present-day data. Conversely, deep neural models, and especially large language models, have shown remarkable capabilities in language understanding, yet they remain constrained by high computational costs and susceptibility to hallucination. This thesis explores the unification of symbolic, statistical, and neural paradigms into a cohesive hybrid framework. The central hypothesis is that by combining the strengths of data-driven representation learning with structural, rule-based, and multimodal knowledge, one can achieve information extraction systems that are more accurate, efficient, and reliable than their monolithic counterparts. To test this hypothesis, the thesis investigates a range of hybrid architectures across five key application domains. In the financial domain, a hybrid contradiction detection framework integrates syntactic pre-training with transformer-based representations and clustering algorithms to identify inconsistencies within large-scale financial reports. For named entity recognition, the iNERD algorithm introduces rule-based constraints to guide large language models, producing syntactically valid, hallucination-free entity extractions.
For text anonymisation, the thesis leverages knowledge distillation to compress the language understanding capabilities of large decoder-only models into lightweight encoder-only architectures, enabling secure and efficient anonymisation. In relation extraction, this work presents KPI-BERT and the open-source KPI-EDGAR dataset, combining contextual embedding models with recurrent layers and noise-based regularisation to extract key performance indicators from financial documents. Extending beyond text, the final empirical contribution introduces a multimodal dementia detection framework that fuses linguistic and acoustic representations, offering a robust approach to early, non-invasive diagnosis. Together, these studies provide compelling evidence that hybrid representation learning constitutes an important paradigm for modern information extraction. This research demonstrates that hybrid systems can achieve higher precision, stronger generalisability, and improved efficiency while remaining adaptable to real-world constraints. The findings of this thesis therefore advance the field towards more trustworthy, sustainable, and application-ready artificial intelligence.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Machine Learning
dc.subject	Representation Learning
dc.subject	Information Extraction
dc.subject	Natural Language Processing
dc.subject	Contradiction Detection
dc.subject	Named Entity Recognition
dc.subject	Anonymisation
dc.subject	Relation Extraction
dc.subject	Dementia Detection
dc.subject.ddc	004 Computer science
dc.title	Hybrid Representation Learning for Information Extraction
dc.type	Dissertation or Habilitation
dc.identifier.doi	https://doi.org/10.48565/bonndoc-823
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-87999
dc.relation.doi	https://doi.org/10.1109/ICPR56361.2022.9956191
dc.relation.doi	https://doi.org/10.1109/ICMLA55696.2022.00254
dc.relation.doi	https://doi.org/10.7557/18.6799
dc.relation.doi	https://doi.org/10.1109/ICMLA58977.2023.00274
dc.relation.doi	https://doi.org/10.1109/BigData59044.2023.10386673
dc.relation.doi	https://doi.org/10.1109/BigData62323.2024.10825603
dc.relation.url	https://aclanthology.org/2025.coling-industry.20/
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	8799
ulbbnediss.date.accepted	2026-02-06
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Bauckhage, Christian
ulbbnediss.contributor.orcid	https://orcid.org/0000-0003-4685-0847

Files associated with this resource

Name: 8799.pdf
Size: 2.3MB
Format: PDF

This document appears in:

E-Dissertationen (4581)
