
Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction

dc.contributor.advisor: Behnke, Sven
dc.contributor.author: Farazi, Hafez
dc.date.accessioned: 2024-08-23T10:06:24Z
dc.date.available: 2024-08-23T10:06:24Z
dc.date.issued: 23.08.2024
dc.identifier.uri: https://hdl.handle.net/20.500.11811/11948
dc.description.abstract: A robot’s ability to perceive the state of its environment is crucial for successful autonomous behavior and for complex interactions, such as those in robot-robot and human-robot scenarios. For robot perception systems to function reliably in real-world applications, they must operate in real time and with sufficient accuracy under a variety of conditions. In this thesis, we focus on deep learning approaches, which in recent years have greatly influenced machine learning in general and computer vision in particular. Supervised and self-supervised learning are two major paradigms for developing visual perception. Supervised approaches suit perception tasks that are clearly defined and for which plenty of semantic labels are available. However, manual labeling is not feasible if we want to leverage the vast amount of available unlabeled video data; hence, we need to formulate the task as self-supervised learning. The structure of this thesis reflects these two paradigms and has two parts.
The first part examines how humanoid soccer robots in the RoboCup environment can perceive their surroundings using supervised deep-learning models. First, we propose a lightweight visual perception pipeline that lets the humanoid robot detect soccer-related objects such as balls, goalposts, marking lines, and other robots by utilizing convolutional neural networks and transfer learning. These techniques were evaluated during many soccer games and played a substantial role in Team NimbRo’s consecutive yearly wins at the international RoboCup competitions. We then show how humanoid robots, despite their identical appearance, can track and identify each other on the soccer field using a recurrent model. Next, we study how humanoid robots can estimate the pose of other robots on the soccer field. This part predominantly focuses on machine vision in the context of robot interactions, especially in the competitive environments of RoboCup soccer.
In the second part, we discuss how video prediction, as a surrogate task within a self-supervised learning paradigm, can help the agent understand its environment. Due to the complexity of real-world data, we initially focus on synthetically generated datasets for video prediction. As a first step, we study how inductive bias can be used to analyze and predict motion in videos using global and local Fourier Domain Transformer Networks with very few learnable parameters. Then, inspired by classical linear dynamical systems theory and the Kalman filter, we investigate simultaneous foreground and background segmentation together with the estimation of their respective motion. Finally, we explore the prediction of multiple plausible futures using an intention-aware model and extend our models to semantic predictions of human poses. Ultimately, this part aims to enhance the predictive ability of robots with explainable and lightweight models.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Visuelle Wahrnehmung
dc.subject: Fußballroboter
dc.subject: Bewegungssegmentierung
dc.subject: Videovorhersage
dc.subject: Tiefes Lernen
dc.subject: Computersehen
dc.subject: Humanoide Roboter
dc.subject: Visual Perception
dc.subject: Soccer Robots
dc.subject: Motion Segmentation
dc.subject: Video Prediction
dc.subject: Deep Learning
dc.subject: Computer Vision
dc.subject: Humanoid Robots
dc.subject.ddc: 004 Informatik
dc.title: Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-77642
dc.relation.doi: https://doi.org/10.48550/arXiv.1809.11078
dc.relation.doi: https://doi.org/10.48550/arXiv.1909.02385
dc.relation.doi: https://doi.org/10.48550/arXiv.1912.07405
dc.relation.doi: https://doi.org/10.48550/arXiv.1810.04941
dc.relation.doi: https://doi.org/10.48550/arXiv.2107.02675
dc.relation.doi: https://doi.org/10.48550/arXiv.1903.00271
dc.relation.doi: https://doi.org/10.48550/arXiv.2004.08638
dc.relation.doi: https://doi.org/10.48550/arXiv.2105.04637
dc.relation.doi: https://doi.org/10.48550/arXiv.2110.02829
dc.relation.doi: https://doi.org/10.1007/978-3-031-15937-4_34
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 7764
ulbbnediss.date.accepted: 16.07.2024
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Gall, Jürgen
ulbbnediss.contributor.orcid: https://orcid.org/0000-0002-5284-3355

