Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning

dc.contributor.advisor: Bauckhage, Christian
dc.contributor.author: Ramamurthy, Rajkumar
dc.date.accessioned: 2023-11-17T13:14:56Z
dc.date.available: 2023-11-17T13:14:56Z
dc.date.issued: 17.11.2023
dc.identifier.uri: https://hdl.handle.net/20.500.11811/11139
dc.description.abstract: This thesis focuses on sequential decision and prediction (SDP) tasks, comprising structured prediction (SP) and reinforcement learning (RL) tasks. These tasks are characterized by the generation of sequential outputs that are interdependent. Notable examples include machine translation (MT), where the objective is to map variable-length input sequences to variable-length output sequences, and robotic RL tasks like object grasping, where a sequence of actions must be generated to accomplish the task, with each action influencing future actions.
Contextualized representations play a vital role in making decisions at each step. Modern architectures like Recurrent Neural Networks (RNNs) have become standard tools for obtaining contextualized inputs and achieving state-of-the-art results in SDP tasks. However, RNNs demand substantial GPU resources and are impractical in low-resource settings. Consequently, the first part of the thesis explores the application of random context encoders, specifically investigating their performance gap relative to fully trained RNNs. It begins by showcasing the capabilities of echo state networks (ESNs), a type of RNN, to memorize and reproduce arbitrary sequences of data, such as text, images, and videos, without requiring full training. Next, ESNs are applied to Named Entity Recognition (NER) and to learning control policies using RL, highlighting their effectiveness as randomized contextualized encoders.
The thesis then shifts its focus to the challenges of exploration and high sample complexity in RL. Specifically, two approaches that incorporate additional knowledge into the learning process are introduced. First, Novelty Search (NS), a method designed to improve exploration and sample complexity, is considered, and a novel approach utilizing auto-encoders to learn sparse representations of agent behaviors is proposed. This approach outperforms traditional NS methods and provides a solution to promote exploration in RL tasks. Second, the thesis introduces a method to construct policy networks by leveraging domain knowledge to improve transparency, modularity, and data efficiency. This decomposition of policy networks into adaptable and hand-designed components significantly reduces the number of interactions required for learning when compared to fully trained end-to-end recurrent networks.
In the final part of the thesis, we look closely at the relation between SP and RL tasks. In particular, we consider the formulation of SP as an RL task to overcome the problems of data and metric mismatch associated with training SP using supervised learning. Despite the successful application of RL in these settings, research in this direction is hindered by a lack of open-source toolkits. To mitigate this, we first propose NLPGym, a modular toolkit that casts typical NLP tasks as RL tasks, allowing RL algorithms to directly optimize any application-specific metric. Building upon this and utilizing large language models (LLMs), we present an open-source library, RL4LMs, that can fine-tune LLMs on arbitrary reward functions, including learned reward models based on automated metrics and human preferences. Additionally, we introduce a comprehensive benchmark to evaluate LLMs on various text generation tasks and a new algorithm to fine-tune LLMs efficiently.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject.ddc: 004 Computer science
dc.title: Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning
dc.type: Dissertation or Habilitation
dc.identifier.doi: https://doi.org/10.48565/bonndoc-161
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-73178
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 7317
ulbbnediss.date.accepted: 09.11.2023
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Wrobel, Stefan
ulbbnediss.contributor.orcid: https://orcid.org/0000-0003-4440-7032


The following terms of use apply to this resource:

In Copyright