Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction

Farazi, Hafez

Full text

View/Open (87.4 MB)

Author

Farazi, Hafez

ORCID

https://orcid.org/0000-0002-5284-3355

Type of Scholarly Publication

Dissertation

Date of Exam

16.07.2024

Date of Publication

23.08.2024

Advisor

Behnke, Sven

Co-Referee

Gall, Jürgen

Involved Institutions

Rheinische Friedrich-Wilhelms-Universität Bonn

Citable Links

Handle: https://hdl.handle.net/20.500.11811/11948
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-77642

Abstract

A robot’s ability to perceive the state of its environment is crucial for successful autonomous behavior and complex interactions, such as those in robot-robot and human-robot scenarios. For robot perception systems to function reliably in real-world applications, they must operate in real time and with sufficient accuracy under a variety of circumstances. In this thesis, we focus on deep learning approaches, which in recent years have greatly influenced machine learning in general and computer vision in particular. Supervised and self-supervised learning are two major paradigms for developing visual perception. Supervised approaches suit perception tasks that are clearly defined and for which plenty of semantic labels are available. However, manual labeling is not feasible if we want to leverage the vast amount of available unlabeled video data; hence, we need to formulate the task as self-supervised learning. The structure of this thesis reflects these two paradigms and has two parts.
The first part examines how humanoid soccer robots in the RoboCup environment can perceive their surroundings using supervised deep-learning models. Initially, we propose a lightweight visual perception pipeline that lets the humanoid robot detect soccer-related objects such as balls, goalposts, marking lines, and other robots using convolutional neural networks and transfer learning. These techniques were evaluated during many soccer games and played a substantial role in Team NimbRo’s consecutive yearly wins at the international RoboCup competitions. We then show how humanoid robots can track and identify each other on the soccer field using a recurrent model, despite their identical appearance. Next, we study how humanoid robots can estimate the pose of other robots on the soccer field. This part predominantly focuses on machine vision in the context of robot interactions, especially in the competitive environments of RoboCup soccer.
In the second part, we discuss how video prediction, as a surrogate task within a self-supervised learning paradigm, can help the agent understand its environment. Due to the complexity of real-world data, we initially focus on synthetically generated datasets for video prediction. As a first step, we study how inductive bias can be used to analyze and predict motion in videos using global and local Fourier Domain Transformer Networks with very few learnable parameters. Then, inspired by classical linear dynamical systems theory and the Kalman filter, we investigate simultaneous foreground and background segmentation and their respective motion estimation. Finally, we explore the prediction of multiple plausible futures using an intention-aware model and extend our models to semantic predictions of human poses. Ultimately, this part aims to enhance the predictive ability of robots with explainable and lightweight models.

Subjects

Visual Perception, Soccer Robots, Motion Segmentation, Video Prediction, Deep Learning, Computer Vision, Humanoid Robots

Classification (DDC)

004 Computer science

Related Publications

https://doi.org/10.48550/arXiv.1809.11078
https://doi.org/10.48550/arXiv.1909.02385
https://doi.org/10.48550/arXiv.1912.07405
https://doi.org/10.48550/arXiv.1810.04941
https://doi.org/10.48550/arXiv.2107.02675
https://doi.org/10.48550/arXiv.1903.00271
https://doi.org/10.48550/arXiv.2004.08638
https://doi.org/10.48550/arXiv.2105.04637
https://doi.org/10.48550/arXiv.2110.02829
https://doi.org/10.1007/978-3-031-15937-4_34

Suggested citation

Farazi, Hafez: Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction. - Bonn, 2024. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-77642

@phdthesis{handle:20.500.11811/11948,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-77642},
author = {Farazi, Hafez},
title = {Efficient Visual Perception for Soccer Robots, Motion Segmentation, and Video Prediction},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2024,
month = aug,
url = {https://hdl.handle.net/20.500.11811/11948}
}
