Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning

Ramamurthy, Rajkumar

Full text

Document (7.2 MB)

Author

Ramamurthy, Rajkumar

ORCID

https://orcid.org/0000-0003-4440-7032

Type of thesis

Dissertation

Date of examination

9 November 2023

Date of publication

17 November 2023

First reviewer

Bauckhage, Christian

Second reviewer

Wrobel, Stefan

Institutions involved

Rheinische Friedrich-Wilhelms-Universität Bonn

Citable links

Handle: https://hdl.handle.net/20.500.11811/11139
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-73178
DOI: https://doi.org/10.48565/bonndoc-161

Abstract

This thesis focuses on sequential decision and prediction (SDP) tasks, which comprise structured prediction (SP) and reinforcement learning (RL) tasks. These tasks are characterized by the generation of sequential outputs that depend on one another. Notable examples include machine translation (MT), where the objective is to map variable-length input sequences to variable-length output sequences, and robotic RL tasks such as object grasping, where a sequence of actions must be generated to accomplish the task and each action influences future actions.
Contextualized representations play a vital role in making decisions at each step. Modern architectures such as Recurrent Neural Networks (RNNs) have become standard tools for obtaining contextualized representations of inputs and achieve state-of-the-art results in SDP tasks. However, RNNs demand substantial GPU compute, which makes them impractical in low-resource settings. Consequently, the first part of the thesis explores randomized context encoders, specifically to investigate their performance gap compared to fully trained RNNs. It begins by showing that echo state networks (ESNs), a type of RNN, can memorize and reproduce arbitrary sequences of data, such as text, images, and videos, without full training. Next, ESNs are applied to Named Entity Recognition (NER) and to learning control policies with RL, highlighting their effectiveness as randomized contextualized encoders.
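To make "memorize and reproduce without full training" concrete, here is a minimal NumPy sketch of a generic echo state network, not the thesis's own code. It assumes a tanh reservoir with fixed random weights, a ridge-regression readout, and illustrative hyperparameters (reservoir size 300, spectral radius 0.9, input scaling 0.5, ridge 1e-6) chosen for this example only. Only the readout W_out is fitted, on one-step-ahead character prediction; the network then regenerates the text by feeding its own predictions back in.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=0):
    """Random input and recurrent weights; these stay fixed and are never trained."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    # Rescale so the largest absolute eigenvalue equals spectral_radius.
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def run_reservoir(w_in, w, inputs):
    """Collect states x_t = tanh(W_in u_t + W x_{t-1}), starting from x_0 = 0."""
    x = np.zeros(w.shape[0])
    states = []
    for u in inputs:
        x = np.tanh(w_in @ u + w @ x)
        states.append(x)
    return np.array(states)


def train_readout(states, targets, ridge=1e-6):
    """Ridge regression for the linear readout W_out, the only trained part."""
    n = states.shape[1]
    gram = states.T @ states + ridge * np.eye(n)
    return np.linalg.solve(gram, states.T @ targets).T


def generate(w_in, w, w_out, first, steps):
    """Free-running mode: feed each predicted symbol back in as the next input."""
    eye = np.eye(w_in.shape[1])
    x, u, out = np.zeros(w.shape[0]), eye[first], [first]
    for _ in range(steps):
        x = np.tanh(w_in @ u + w @ x)
        k = int(np.argmax(w_out @ x))
        out.append(k)
        u = eye[k]
    return out


text = "echo state networks memorize sequences."
vocab = sorted(set(text))
ids = [vocab.index(c) for c in text]
onehot = np.eye(len(vocab))[ids]

w_in, w = make_reservoir(len(vocab), n_res=300)
states = run_reservoir(w_in, w, onehot[:-1])  # inputs: symbols 0..T-2
w_out = train_readout(states, onehot[1:])     # targets: symbols 1..T-1
reproduced = "".join(vocab[k] for k in generate(w_in, w, w_out, ids[0], len(text) - 1))
print(reproduced)
```

Because the reservoir has more units than the sequence has steps, the linear readout can separate every time step. As long as each prediction is correct, the fed-back one-hot input matches the training input exactly, so the free-running trajectory retraces the training trajectory.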
The thesis then shifts its focus to the challenges of exploration and high sample complexity in RL. Specifically, it introduces two approaches that incorporate additional knowledge into the learning process. The first builds on Novelty Search (NS), a method designed to improve exploration and reduce sample complexity, and proposes a novel approach that uses auto-encoders to learn sparse representations of agent behaviors. This approach outperforms traditional NS methods and offers a way to promote exploration in RL tasks. The second introduces a method for constructing policy networks that leverages domain knowledge to improve transparency, modularity, and data efficiency. Decomposing policy networks into adaptable and hand-designed components in this way significantly reduces the number of interactions required for learning compared to fully trained end-to-end recurrent networks.
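In novelty search, candidates are scored by how different their behavior is from previously seen behaviors rather than by task reward. A common score is the mean distance to the k nearest neighbors in a behavior space. The sketch below computes that score in a learned low-dimensional space. As a stand-in for the thesis's sparse auto-encoder it uses PCA, which is the optimal *linear* auto-encoder under squared error. The behavior descriptors, `k`, the code dimension and the function names are all hypothetical choices for illustration.

```python
import numpy as np


def novelty(candidate, reference, k=5):
    """Sparseness score: mean Euclidean distance to the k nearest neighbours in `reference`."""
    dists = np.linalg.norm(reference - candidate, axis=1)
    k = min(k, len(dists))
    return float(np.mean(np.sort(dists)[:k]))


def fit_linear_encoder(behaviours, dim):
    """PCA via SVD, i.e. the optimal *linear* autoencoder under squared error.

    Stand-in only: the thesis learns *sparse* representations with auto-encoders.
    """
    mean = behaviours.mean(axis=0)
    _, _, vt = np.linalg.svd(behaviours - mean, full_matrices=False)
    components = vt[:dim]
    return lambda b: (b - mean) @ components.T


rng = np.random.default_rng(0)
# Hypothetical raw behaviour descriptors, e.g. 10 (x, y) positions per rollout, flattened.
archive = rng.normal(size=(50, 20))
encode = fit_linear_encoder(archive, dim=4)
codes = encode(archive)

# Score a new population in the learned space; the most novel ones would join the archive.
population = rng.normal(size=(8, 20))
scores = [novelty(encode(b), codes, k=5) for b in population]
most_novel = int(np.argmax(scores))
print(most_novel, round(max(scores), 3))
```

Separating the encoder from the scoring function is deliberate. Swapping in a trained (sparse) auto-encoder only changes `encode`, while the k-nearest-neighbor novelty computation stays the same.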
In the final part of the thesis, we take a closer look at the relation between SP and RL tasks. In particular, we consider formulating SP as an RL task to overcome the problems of data and metric mismatch associated with training SP models with supervised learning. Despite the successful application of RL in these settings, research in this direction is hindered by a lack of open-source toolkits. To address this, we first propose NLPGym, a modular toolkit that casts typical NLP tasks as RL tasks, allowing RL algorithms to directly optimize any application-specific metric. Building on this and utilizing large language models (LLMs), we present RL4LMs, an open-source library that can fine-tune LLMs on arbitrary reward functions, including learned reward models based on automated metrics and human preferences. Additionally, we introduce a comprehensive benchmark for evaluating LLMs on various text generation tasks and a new algorithm for fine-tuning LLMs efficiently.
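Casting a structured prediction task as RL means treating each output decision as an action and using the task metric as the reward. The minimal sketch below assumes a sequence-tagging task, a Gym-style `reset`/`step` loop, and token-level F1 over entity labels as the metric. `TaggingEnv`, `entity_f1` and `rollout` are hypothetical names and do not reproduce the actual APIs of NLPGym or RL4LMs.

```python
def entity_f1(pred, gold, outside="O"):
    """Token-level F1 over labels other than `outside` (hypothetical stand-in for a task metric)."""
    tp = sum(p == g != outside for p, g in zip(pred, gold))
    n_pred = sum(p != outside for p in pred)
    n_gold = sum(g != outside for g in gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


class TaggingEnv:
    """Sequence tagging as an episodic MDP: one action (label index) per token."""

    def __init__(self, tokens, gold, labels, metric=entity_f1):
        self.tokens, self.gold, self.labels, self.metric = tokens, gold, labels, metric
        self.pred = []

    def _obs(self):
        # State: the input sentence, the current position and the labels chosen so far.
        return self.tokens, len(self.pred), tuple(self.pred)

    def reset(self):
        self.pred = []
        return self._obs()

    def step(self, action):
        self.pred.append(self.labels[action])
        done = len(self.pred) == len(self.tokens)
        # Sparse reward: the sequence-level metric is only computed once the episode ends.
        reward = self.metric(self.pred, self.gold) if done else 0.0
        return self._obs(), reward, done


def rollout(env, policy):
    """Run one episode and return the undiscounted return."""
    obs, done, ret = env.reset(), False, 0.0
    while not done:
        obs, reward, done = env.step(policy(obs))
        ret += reward
    return ret


labels = ["O", "PER", "LOC"]
env = TaggingEnv(["Ada", "visited", "Bonn"], ["PER", "O", "LOC"], labels)
oracle = lambda obs: labels.index(env.gold[obs[1]])
all_per = lambda obs: labels.index("PER")
print(rollout(env, oracle), rollout(env, all_per))
```

Because the return equals the evaluation metric, an RL algorithm such as a policy-gradient method would optimize that metric directly instead of a token-level likelihood. Replacing `metric` with another sequence-level scoring function, for example a learned reward model, leaves the environment loop unchanged.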

Classification (DDC)

004 Computer science

Suggested citation

Ramamurthy, Rajkumar: Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning. - Bonn, 2023. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-73178

BibTeX
@phdthesis{handle:20.500.11811/11139,
urn = {urn:nbn:de:hbz:5-73178},
doi = {10.48565/bonndoc-161},
author = {Ramamurthy, Rajkumar},
title = {Practical Models for Sequential Decision Making in Natural Language Processing and Reinforcement Learning},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2023,
month = nov,
url = {https://hdl.handle.net/20.500.11811/11139}
}

Die folgenden Nutzungsbestimmungen sind mit dieser Ressource verbunden: