Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning

Ramamurthy, Rajkumar

dc.contributor.advisor	Bauckhage, Christian
dc.contributor.author	Ramamurthy, Rajkumar
dc.date.accessioned	2023-11-17T13:14:56Z
dc.date.available	2023-11-17T13:14:56Z
dc.date.issued	17.11.2023
dc.identifier.uri	https://hdl.handle.net/20.500.11811/11139
dc.description.abstract	This thesis focuses on sequential decision and prediction (SDP) tasks, comprising structured prediction (SP) and reinforcement learning (RL) tasks. These tasks are characterized by the generation of sequential outputs that exhibit interdependencies. Notable examples include machine translation (MT), where the objective is to map variable-length input sequences to variable-length output sequences, and robotic RL tasks like object grasping, where a sequence of actions must be generated to accomplish the task, with each action influencing future actions. Contextualized representations play a vital role in making decisions at each step. Modern architectures like Recurrent Neural Networks (RNNs) have become standard tools for obtaining contextualized inputs and achieving state-of-the-art results in SDP tasks. However, RNNs demand high-compute GPU resources and are impractical for low-resource settings. Consequently, the first part of the thesis explores the application of random context encoders, specifically to investigate the performance gap compared to fully trained RNNs. It begins by showcasing the capabilities of echo state networks (ESNs), a type of RNN, to memorize and reproduce arbitrary sequences of data, such as text, images, and videos, without requiring full training. Next, ESNs are applied to Named Entity Recognition (NER) and to learning control policies using RL, highlighting their effectiveness as randomized contextualized encoders. The thesis then shifts its focus to address the challenges of exploration and high sample complexity in RL. Specifically, two approaches that incorporate additional knowledge into the learning process are introduced. The first approach considers Novelty Search (NS), a method designed to enhance exploration and reduce sample complexity, and proposes a novel technique that uses auto-encoders to learn sparse representations of agent behaviors.
This approach outperforms traditional NS methods and provides a solution to promote exploration in RL tasks. Furthermore, the thesis introduces a method to construct policy networks by leveraging domain knowledge to improve transparency, modularity, and data efficiency. This decomposition of policy networks into adaptable and hand-designed components significantly reduces the number of interactions required for learning when compared to fully trained end-to-end recurrent networks. In the final part of the thesis, we look closely at the relation between SP and RL tasks. In particular, we consider the formulation of SP as an RL task to overcome the problems of data and metric mismatch associated with training SP models using supervised learning. Despite the successful application of RL in these settings, research in this direction is hindered by a lack of open-source toolkits. To mitigate this, we first propose NLPGym, a modular toolkit that casts typical NLP tasks as RL tasks, allowing RL algorithms to directly optimize any application-specific metric. Building upon this and utilizing large language models (LLMs), we present an open-source library, RL4LMs, that can fine-tune LLMs on arbitrary reward functions, including learned reward models based on automated metrics and human preferences. Additionally, a comprehensive benchmark to evaluate LLMs on various text generation tasks and a new algorithm to fine-tune LLMs efficiently are introduced.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject.ddc	004 Computer science
dc.title	Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning
dc.type	Dissertation or habilitation
dc.identifier.doi	https://doi.org/10.48565/bonndoc-161
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-73178
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	7317
ulbbnediss.date.accepted	09.11.2023
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Wrobel, Stefan
ulbbnediss.contributor.orcid	https://orcid.org/0000-0003-4440-7032

Files in this item

Name: 7317.pdf
Size: 7.2MB
Format: PDF

This item appears in the following Collection(s)

E-Dissertationen (4605)
