Deep Representation Learning for Financial Document Analytics

Hillebrand, Lars Patrick

dc.contributor.advisor	Bauckhage, Christian
dc.contributor.author	Hillebrand, Lars Patrick
dc.date.accessioned	2025-07-30T08:15:21Z
dc.date.available	2025-07-30T08:15:21Z
dc.date.issued	30.07.2025
dc.identifier.uri	https://hdl.handle.net/20.500.11811/13286
dc.description.abstract	In this thesis, we leverage machine learning (ML) methods primarily based on deep neural networks to drastically reduce the manual work of financial analysts, investors, auditors, and other stakeholders by automating key steps in their analysis of financial disclosure documents. A core challenge in this context is transforming highly unstructured and inherently discrete textual data into meaningful numerical representations that ML models can effectively interpret. We refer to this automated conversion process as representation learning and remark that the learned representations must encode the text's syntactic, semantic, and contextual structure. We develop novel methodologies utilizing deep representation learning to improve the efficiency and quality of several financial document analysis tasks. First, we introduce an approach for the joint extraction, linking, and consistency checking of Key Performance Indicators (KPIs) from corporate disclosure reports. By fine-tuning a bidirectional text encoder neural network with classification heads for named entity recognition and relation extraction, we efficiently extract KPIs and predict their interrelationships. Building upon this, we enhance the detection of numerical inconsistencies between semantically equivalent KPIs using contrastive learning techniques. This includes joint sentence and table encoding and a contrastive autoencoder classification module, along with a filtering mechanism employing cross-attention to handle data imbalance from numerous unrelated KPI pairs. To assist auditors in aligning regulatory requirements with relevant sections of financial reports, we introduce a context-aware recommender system designed to retrieve the most pertinent text passages in sustainability reports. The system utilizes a Transformer-based encoding module with a non-linear multi-label classification head, trained end-to-end. 
Recognizing the limitations of processing paragraphs in isolation, we propose a novel pre-training methodology called Pointer-Guided Segment Ordering, which enhances the model's ability to generate contextually rich paragraph embeddings by understanding narrative flow and inter-paragraph relationships. Addressing the dynamic nature of accounting standards, we propose a flexible compliance check methodology using Large Language Models (LLMs). We combine a fine-tuned semantic text matching model with an LLM-based re-ranking module, enabling zero-shot matching between financial reports and potentially unseen legal requirements. We further integrate a compliance verification component that employs zero- and few-shot learning with prompting techniques like chain-of-thought to assess compliance with disclosure requirements from international accounting standards. Lastly, we develop a specialized LLM-powered chatbot with an optimized Retrieval-Augmented Generation pipeline to support compliance with Risk Management and Quality (R&Q) standards. By integrating hybrid search techniques and relevance boosting, the system enhances retrieval accuracy and provides precise and contextually appropriate answers to queries related to R&Q standards, aiding employees in accessing and interpreting complex regulatory information.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Machine Learning
dc.subject	Representation Learning
dc.subject	Deep Learning
dc.subject	Natural Language Processing
dc.subject	Large Language Models
dc.subject	Finance
dc.subject.ddc	004 Computer science
dc.title	Deep Representation Learning for Financial Document Analytics
dc.type	Dissertation or Habilitation
dc.identifier.doi	https://doi.org/10.48565/bonndoc-622
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-84061
dc.relation.doi	https://doi.org/10.1007/978-3-030-57321-8_22
dc.relation.doi	https://doi.org/10.3390/make3010007
dc.relation.doi	https://doi.org/10.1109/ICPR56361.2022.9956191
dc.relation.doi	https://doi.org/10.1109/BigData55660.2022.10020308
dc.relation.doi	https://doi.org/10.1145/3594536.3595131
dc.relation.doi	https://doi.org/10.1007/978-3-031-70359-1_23
dc.relation.doi	https://doi.org/10.1145/3573128.3609344
dc.relation.doi	https://doi.org/10.1109/BigData59044.2023.10386518
dc.relation.doi	https://doi.org/10.1109/BigData62323.2024.10825431
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	8406
ulbbnediss.date.accepted	16.07.2025
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Sifa, Rafet
ulbbnediss.contributor.orcid	https://orcid.org/0000-0002-5496-4177

Files for this resource

Name: 8406.pdf
Size: 7.1 MB
Format: PDF

This document appears in:

E-Dissertations (4581)