Predictive Safety in Reinforcement Learning

From MPC-Guidance to Learned Regulators

dc.contributor.advisor: Bennewitz, Maren
dc.contributor.author: Elnagdi, Murad
dc.date.accessioned: 2026-06-11T06:11:19Z
dc.date.available: 2026-06-11T06:11:19Z
dc.date.issued: 11.06.2026
dc.identifier.uri: https://hdl.handle.net/20.500.11811/14201
dc.description.abstract (en): Autonomous robots operating in unstructured environments require control policies that achieve their tasks reliably while respecting safety constraints. Reinforcement Learning (RL) has emerged as a powerful data-driven paradigm for optimizing such policies through interaction, but its application in robotics is hindered by three fundamental barriers during exploration: it is unstable under sparse rewards, since feedback signals are rare or delayed until the task is solved; it is unsafe, risking damage to the robot and its surroundings; and it is sample-inefficient, as it requires extensive interaction with the environment. This thesis addresses these challenges by establishing prediction as a framework for exploration. We combine model predictive control (MPC) and predicted safety signals to guide data collection, constrain risk, and preserve task performance during training and execution of RL agents. We first address the inefficiency of random exploration in sparse-reward settings by introducing a hybrid training framework that uses MPC as a synthetic expert to guide the agent through complex navigation tasks. This approach accelerates convergence by alternating planned trajectories with trial-and-error experience, yielding a lightweight policy for independent deployment. Turning to the safety-critical domain of multi-robot systems, we then employ predictive control as a distributed safety filter. We develop a scalable behavior-based formation controller secured by distributed nonlinear MPC shields, which ensures collision-free training, speeds up convergence, and enables zero-shot transfer to larger teams and to physical hardware. However, static safety shields often induce conservative behavior that hinders learning. To overcome this limitation, we propose a dynamic safety shield in which a supervisor agent adapts the constraint parameters online. Tuning the shield's sensitivity to the environment reduces solver failures and prevents the conservative behaviors typical of static shields. Finally, to eliminate the computational bottleneck of running optimization solvers at runtime, we transfer these predictive principles into a modular action regulator. This learned mechanism uses cost critics to preemptively adjust actions based on predicted risk, decoupling safety enforcement from reward maximization. Collectively, these studies show that combining short-horizon planning with learned risk estimation makes RL safer and more sample-efficient without sacrificing task performance. The proposed methods are evaluated in extensive simulation, and the MPC-based frameworks are further validated on physical robots to demonstrate their robustness under real-world uncertainties.
dc.language.iso: eng
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: reinforcement learning
dc.subject: model predictive control
dc.subject: robotics
dc.subject.ddc: 004 Computer science
dc.title: Predictive Safety in Reinforcement Learning
dc.title.alternative: From MPC-Guidance to Learned Regulators
dc.type: Dissertation or Habilitation
dc.identifier.doi: https://doi.org/10.48565/bonndoc-880
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-90538
dc.relation.doi: https://doi.org/10.1109/ICRA48891.2023.10161492
dc.relation.doi: https://doi.org/10.1109/LRA.2025.3560830
dc.relation.doi: https://doi.org/10.48550/arXiv.2412.04153
dc.relation.doi: https://doi.org/10.48550/arXiv.2510.11491
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 9053
ulbbnediss.date.accepted: 20.03.2026
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Ahmad, Aamir
ulbbnediss.contributor.orcid: https://orcid.org/0009-0000-8402-3530
ulbbnediss.contributor.gnd: 1402512015


The following terms of use apply to this resource:

Attribution-NonCommercial 4.0 International