Hybrid Representation Learning for Information Extraction

| dc.contributor.advisor | Sifa, Rafet | |
| dc.contributor.author | Deußer, Tobias Kurt Stefan | |
| dc.date.accessioned | 2026-03-25T12:54:10Z | |
| dc.date.available | 2026-03-25T12:54:10Z | |
| dc.date.issued | 25.03.2026 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11811/14010 | |
| dc.description.abstract | In the contemporary digital era, the exponential increase in unstructured and semi-structured data has made information extraction a cornerstone of modern data-driven research and application. The ability to transform such raw information into structured knowledge is crucial for enabling downstream tasks. While traditional rule-based and statistical approaches to information extraction have demonstrated success in narrow, well-defined tasks, they lack the scalability and adaptability required to address the vastness and variability of present-day data. Conversely, deep neural models and especially large language models have shown remarkable capabilities in language understanding, yet they remain constrained by high computational costs and susceptibility to hallucination. This thesis explores the unification of various symbolic, statistical, and neural paradigms into a cohesive hybrid framework. The central hypothesis is that by combining the strengths of data-driven representation learning with structural, rule-based, and multimodal knowledge, one can achieve information extraction systems that are more accurate, efficient, and reliable than their monolithic counterparts. To test this hypothesis, the thesis investigates a range of hybrid architectures across five key application domains. In the financial domain, a hybrid contradiction detection framework integrates syntactic pre-training with transformer-based representations and clustering algorithms to identify inconsistencies within large-scale financial reports. For named entity recognition, the iNERD algorithm introduces rule-based constraints to guide large language models, producing syntactically valid, hallucination-free entity extractions. Thereafter, the anonymisation study leverages knowledge distillation to compress the language understanding capabilities of large decoder-only models into lightweight encoder-only architectures, enabling secure and efficient text anonymisation. In relation extraction, this work presents KPI-BERT and the open-source KPI-EDGAR dataset, combining contextual embedding models with recurrent layers and noise-based regularisation to extract key performance indicators from financial documents. Extending beyond text, the final empirical contribution introduces a multimodal dementia detection framework that fuses linguistic and acoustic representations, offering a robust approach to early, non-invasive diagnosis. Together, these studies provide compelling evidence that hybrid representation learning constitutes an important paradigm for modern information extraction. This research demonstrates that hybrid systems can achieve higher precision, stronger generalisability, and improved efficiency while remaining adaptable to real-world constraints. The findings of this thesis therefore advance the field towards more trustworthy, sustainable, and application-ready artificial intelligence. | en |
| dc.language.iso | eng | |
| dc.rights | In Copyright | |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
| dc.subject | Machine Learning | |
| dc.subject | Representation Learning | |
| dc.subject | Information Extraction | |
| dc.subject | Natural Language Processing | |
| dc.subject | Contradiction Detection | |
| dc.subject | Named Entity Recognition | |
| dc.subject | Anonymisation | |
| dc.subject | Relation Extraction | |
| dc.subject | Dementia Detection | |
| dc.subject.ddc | 004 Computer science |
| dc.title | Hybrid Representation Learning for Information Extraction | |
| dc.type | Dissertation or habilitation |
| dc.identifier.doi | https://doi.org/10.48565/bonndoc-823 | |
| dc.publisher.name | Universitäts- und Landesbibliothek Bonn | |
| dc.publisher.location | Bonn | |
| dc.rights.accessRights | openAccess | |
| dc.identifier.urn | https://nbn-resolving.org/urn:nbn:de:hbz:5-87999 | |
| dc.relation.doi | https://doi.org/10.1109/ICPR56361.2022.9956191 | |
| dc.relation.doi | https://doi.org/10.1109/ICMLA55696.2022.00254 | |
| dc.relation.doi | https://doi.org/10.7557/18.6799 | |
| dc.relation.doi | https://doi.org/10.1109/ICMLA58977.2023.00274 | |
| dc.relation.doi | https://doi.org/10.1109/BigData59044.2023.10386673 | |
| dc.relation.doi | https://doi.org/10.1109/BigData62323.2024.10825603 | |
| dc.relation.url | https://aclanthology.org/2025.coling-industry.20/ | |
| ulbbn.pubtype | First publication |
| ulbbnediss.affiliation.name | Rheinische Friedrich-Wilhelms-Universität Bonn | |
| ulbbnediss.affiliation.location | Bonn | |
| ulbbnediss.thesis.level | Dissertation | |
| ulbbnediss.dissID | 8799 | |
| ulbbnediss.date.accepted | 06.02.2026 | |
| ulbbnediss.institute | Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik | |
| ulbbnediss.fakultaet | Mathematisch-Naturwissenschaftliche Fakultät | |
| dc.contributor.coReferee | Bauckhage, Christian | |
| ulbbnediss.contributor.orcid | https://orcid.org/0000-0003-4685-0847 |
This document appears in: E-Dissertationen (4531)




