Deep Representation Learning for Financial Document Analytics

dc.contributor.advisor: Bauckhage, Christian
dc.contributor.author: Hillebrand, Lars Patrick
dc.date.accessioned: 2025-07-30T08:15:21Z
dc.date.available: 2025-07-30T08:15:21Z
dc.date.issued: 30.07.2025
dc.identifier.uri: https://hdl.handle.net/20.500.11811/13286
dc.description.abstract: In this thesis, we leverage machine learning (ML) methods primarily based on deep neural networks to drastically reduce the manual work of financial analysts, investors, auditors, and other stakeholders by automating key steps in their analysis of financial disclosure documents. A core challenge in this context is transforming highly unstructured and inherently discrete textual data into meaningful numerical representations that ML models can effectively interpret. We refer to this automated conversion process as representation learning and remark that the learned representations must encode the text's syntactic, semantic, and contextual structure.
We develop novel methodologies utilizing deep representation learning to improve the efficiency and quality of several financial document analysis tasks. First, we introduce an approach for the joint extraction, linking, and consistency checking of Key Performance Indicators (KPIs) from corporate disclosure reports. By fine-tuning a bidirectional text encoder neural network with classification heads for named entity recognition and relation extraction, we efficiently extract KPIs and predict their interrelationships. Building upon this, we enhance the detection of numerical inconsistencies between semantically equivalent KPIs using contrastive learning techniques. This includes joint sentence and table encoding and a contrastive autoencoder classification module, along with a filtering mechanism employing cross-attention to handle data imbalance from numerous unrelated KPI pairs.
To assist auditors in aligning regulatory requirements with relevant sections of financial reports, we introduce a context-aware recommender system designed to retrieve the most pertinent text passages in sustainability reports. The system utilizes a Transformer-based encoding module with a non-linear multi-label classification head, trained end-to-end. Recognizing the limitations of processing paragraphs in isolation, we propose a novel pre-training methodology called Pointer-Guided Segment Ordering, which enhances the model's ability to generate contextually rich paragraph embeddings by understanding narrative flow and inter-paragraph relationships.
Addressing the dynamic nature of accounting standards, we propose a flexible compliance check methodology using Large Language Models (LLMs). We combine a fine-tuned semantic text matching model with an LLM-based re-ranking module, enabling zero-shot matching between financial reports and potentially unseen legal requirements. We further integrate a compliance verification component that employs zero- and few-shot learning with prompting techniques such as chain-of-thought to assess compliance with disclosure requirements from international accounting standards.
Lastly, we develop a specialized LLM-powered chatbot with an optimized Retrieval-Augmented Generation pipeline to support compliance with Risk Management and Quality (R&Q) standards. By integrating hybrid search techniques and relevance boosting, the system enhances retrieval accuracy and provides precise and contextually appropriate answers to queries related to R&Q standards, aiding employees in accessing and interpreting complex regulatory information.
en
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Machine Learning
dc.subject: Representation Learning
dc.subject: Deep Learning
dc.subject: Natural Language Processing
dc.subject: Large Language Models
dc.subject: Finance
dc.subject.ddc: 004 Computer science
dc.title: Deep Representation Learning for Financial Document Analytics
dc.type: Dissertation or Habilitation
dc.identifier.doi: https://doi.org/10.48565/bonndoc-622
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-84061
dc.relation.doi: https://doi.org/10.1007/978-3-030-57321-8_22
dc.relation.doi: https://doi.org/10.3390/make3010007
dc.relation.doi: https://doi.org/10.1109/ICPR56361.2022.9956191
dc.relation.doi: https://doi.org/10.1109/BigData55660.2022.10020308
dc.relation.doi: https://doi.org/10.1145/3594536.3595131
dc.relation.doi: https://doi.org/10.1007/978-3-031-70359-1_23
dc.relation.doi: https://doi.org/10.1145/3573128.3609344
dc.relation.doi: https://doi.org/10.1109/BigData59044.2023.10386518
dc.relation.doi: https://doi.org/10.1109/BigData62323.2024.10825431
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 8406
ulbbnediss.date.accepted: 16.07.2025
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Sifa, Rafet
ulbbnediss.contributor.orcid: https://orcid.org/0000-0002-5496-4177

