Predictive Safety in Reinforcement Learning

Elnagdi, Murad

dc.contributor.advisor	Bennewitz, Maren
dc.contributor.author	Elnagdi, Murad
dc.date.accessioned	2026-06-11T06:11:19Z
dc.date.available	2026-06-11T06:11:19Z
dc.date.issued	11.06.2026
dc.identifier.uri	https://hdl.handle.net/20.500.11811/14201
dc.description.abstract	Autonomous robots operating in unstructured environments require control policies that achieve their tasks reliably while respecting safety constraints. While Reinforcement Learning (RL) has emerged as a powerful data-driven paradigm for optimizing these policies through interaction, its application in robotics is hindered by three fundamental barriers during exploration: it is unstable under sparse rewards, since feedback signals are rare or delayed until the task is solved; it is unsafe, risking damage to the robot and its surroundings; and it is sample-inefficient, as it requires extensive interaction with the environment. This thesis addresses these challenges by establishing prediction as a framework for exploration. We combine model predictive control (MPC) and predicted safety signals to guide data collection, constrain risk, and preserve task performance during training and execution of RL agents. Our methodological progression begins by addressing the inefficiency of random exploration in sparse reward settings. We introduce a hybrid training framework that utilizes MPC as a synthetic expert to guide the agent through complex navigation tasks. This approach accelerates convergence by alternating planned trajectories with trial-and-error experience, yielding a lightweight policy for independent deployment. Transitioning to the safety-critical domain of multi-robot systems, we subsequently employ predictive control as a distributed safety filter. We develop a scalable behavior-based formation controller secured by distributed nonlinear MPC shields, which ensures collision-free training, speeds up convergence, and enables zero-shot transfer to larger teams and to physical hardware. However, relying on static safety shields often induces conservative behavior that hinders learning. To overcome this limitation, we propose a dynamic safety shield that utilizes a supervisor agent to adapt constraint parameters online.
By tuning the shield's sensitivity to the environment, we reduce solver failures and prevent the conservative behaviors typical of static shields. Ultimately, to eliminate the computational bottleneck of running optimization solvers at runtime, we transfer these predictive principles into a modular action regulator. This learned mechanism uses cost critics to preemptively adjust actions based on predicted risk, decoupling safety enforcement from reward maximization. Collectively, these studies show that combining short-horizon planning with learned risk estimation makes RL safer and more sample-efficient without sacrificing task performance. The proposed methods are evaluated in extensive simulation, with the MPC-based frameworks further validated on physical robots to demonstrate their robustness under real-world uncertainties.	en
dc.language.iso	eng
dc.rights	Attribution-NonCommercial 4.0 International
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject	reinforcement learning
dc.subject	model predictive control
dc.subject	robotics
dc.subject.ddc	004 Computer science
dc.title	Predictive Safety in Reinforcement Learning
dc.title.alternative	From MPC-Guidance to Learned Regulators
dc.type	Dissertation or Habilitation
dc.identifier.doi	https://doi.org/10.48565/bonndoc-880
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-90538
dc.relation.doi	https://doi.org/10.1109/ICRA48891.2023.10161492
dc.relation.doi	https://doi.org/10.1109/LRA.2025.3560830
dc.relation.doi	https://doi.org/10.48550/arXiv.2412.04153
dc.relation.doi	https://doi.org/10.48550/arXiv.2510.11491
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	9053
ulbbnediss.date.accepted	20.03.2026
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Ahmad, Aamir
ulbbnediss.contributor.orcid	https://orcid.org/0009-0000-8402-3530
ulbbnediss.contributor.gnd	1402512015