Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction

Farazi, Hafez

dc.contributor.advisor	Behnke, Sven
dc.contributor.author	Farazi, Hafez
dc.date.accessioned	2024-08-23T10:06:24Z
dc.date.available	2024-08-23T10:06:24Z
dc.date.issued	2024-08-23
dc.identifier.uri	https://hdl.handle.net/20.500.11811/11948
dc.description.abstract	A robot’s ability to perceive the state of its environment is crucial for successful autonomous behavior and for complex interactions, such as those in robot-robot and human-robot scenarios. For robot perception systems to function reliably in real-world applications, they must operate in real time and with sufficient accuracy under a wide variety of conditions. In this thesis, we focus on deep learning approaches, which in recent years have greatly influenced machine learning in general and computer vision in particular. Supervised and self-supervised learning are two major paradigms for developing visual perception. Supervised approaches suit perception tasks that are clearly defined and for which plenty of semantic labels are available. However, manual labeling is not feasible if we want to leverage the vast amount of available unlabeled video data; in that case, the task must be formulated as self-supervised learning. The structure of this thesis reflects these two paradigms and has two parts. The first part examines how humanoid soccer robots in the RoboCup environment can perceive their surroundings using supervised deep learning models. Initially, we propose a lightweight visual perception pipeline that enables the humanoid robot to detect soccer-related objects such as balls, goalposts, marking lines, and other robots using convolutional neural networks and transfer learning. These techniques were evaluated in many soccer games and played a substantial role in Team NimbRo’s consecutive yearly wins at the international RoboCup competitions. We then show how identical humanoid robots, despite looking the same, can track and identify each other on the soccer field using a recurrent model. Next, we study how humanoid robots can estimate the pose of other robots on the soccer field. 
This part predominantly focuses on machine vision in the context of robot interactions, especially in the competitive environment of RoboCup soccer. In the second part, we discuss how video prediction, as a surrogate task in a self-supervised learning paradigm, can help the agent understand its environment. Due to the complexity of real-world data, we initially focus on synthetic datasets for video prediction. As a first step, we study how inductive bias can be used to analyze and predict motion in videos with global and local Fourier Domain Transformer Networks that have very few learnable parameters. Then, inspired by classical linear dynamical systems theory and the Kalman filter, we investigate simultaneous foreground and background segmentation together with the estimation of their respective motions. Finally, we explore the prediction of multiple plausible futures using an intention-aware model and extend our models to the semantic prediction of human poses. Ultimately, this part aims to enhance the predictive ability of robots with explainable and lightweight models.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	Visuelle Wahrnehmung
dc.subject	Fußballroboter
dc.subject	Bewegungssegmentierung
dc.subject	Videovorhersage
dc.subject	Tiefes Lernen
dc.subject	Computersehen
dc.subject	Humanoide Roboter
dc.subject	Visual Perception
dc.subject	Soccer Robots
dc.subject	Motion Segmentation
dc.subject	Video Prediction
dc.subject	Deep Learning
dc.subject	Computer Vision
dc.subject	Humanoid Robots
dc.subject.ddc	004 Computer science
dc.title	Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction
dc.type	Dissertation or Habilitation
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-77642
dc.relation.doi	https://doi.org/10.48550/arXiv.1809.11078
dc.relation.doi	https://doi.org/10.48550/arXiv.1909.02385
dc.relation.doi	https://doi.org/10.48550/arXiv.1912.07405
dc.relation.doi	https://doi.org/10.48550/arXiv.1810.04941
dc.relation.doi	https://doi.org/10.48550/arXiv.2107.02675
dc.relation.doi	https://doi.org/10.48550/arXiv.1903.00271
dc.relation.doi	https://doi.org/10.48550/arXiv.2004.08638
dc.relation.doi	https://doi.org/10.48550/arXiv.2105.04637
dc.relation.doi	https://doi.org/10.48550/arXiv.2110.02829
dc.relation.doi	https://doi.org/10.1007/978-3-031-15937-4_34
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	7764
ulbbnediss.date.accepted	2024-07-16
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Gall, Jürgen
ulbbnediss.contributor.orcid	https://orcid.org/0000-0002-5284-3355
ulbbnediss.contributor.gnd	1396025685

Files in this item

Name: 7764.pdf
Size: 87.4 MB
Format: PDF


This item appears in:

E-Dissertationen

