Farazi, Hafez: Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction. - Bonn, 2024. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-77642
@phdthesis{handle:20.500.11811/11948,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-77642},
author = {Farazi, Hafez},
title = {Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2024,
month = aug,

note = {A robot’s ability to perceive the state of its environment is crucial for successful autonomous behavior and for complex interactions, such as those in robot-robot and human-robot scenarios. For robot perception systems to function reliably in real-world applications, they must operate in real time and with sufficient accuracy under a wide range of conditions. In this thesis, we focus on deep learning approaches, which in recent years have greatly influenced machine learning in general and computer vision in particular. Supervised and self-supervised learning are two major paradigms for developing visual perception. Supervised approaches suit perception tasks that are clearly defined and for which plenty of semantic labels are available. However, manual labeling is infeasible if we want to leverage the vast amount of available unlabeled video data; hence, we need to formulate the task as self-supervised learning. The thesis reflects these two paradigms and is structured in two parts.
The first part examines how humanoid soccer robots in the RoboCup environment can perceive their surroundings using supervised deep learning models. We begin by proposing a lightweight visual perception pipeline that enables humanoid robots to detect soccer-related objects, such as balls, goalposts, line markings, and other robots, using convolutional neural networks and transfer learning. These techniques were evaluated in many soccer games and played a substantial role in Team NimbRo’s consecutive yearly wins at the international RoboCup competitions. We then show how identical-looking humanoid robots can track and identify each other on the soccer field using a recurrent model. Next, we study how humanoid robots can estimate the poses of other robots on the soccer field. This part focuses predominantly on machine vision in the context of robot interactions, especially in the competitive environment of RoboCup soccer.
In the second part, we discuss how video prediction, as a self-supervised surrogate task, can help an agent understand its environment. Due to the complexity of real-world data, we initially focus on synthetic datasets for video prediction. As a first step, we study how inductive biases can be used to analyze and predict motion in videos with global and local Fourier Domain Transformer Networks that have very few learnable parameters. Then, inspired by classical linear dynamical systems theory and the Kalman filter, we investigate simultaneous foreground and background segmentation together with the estimation of their respective motions. Finally, we explore the prediction of multiple plausible futures using an intention-aware model and extend our models to semantic predictions of human poses. Ultimately, this part aims to enhance the predictive abilities of robots with explainable and lightweight models.},

url = {https://hdl.handle.net/20.500.11811/11948}
}

The following license files are associated with this item:

InCopyright