Spatio-Temporal Scene Understanding for Vehicles in Urban Environments

| Field | Value | Language |
| --- | --- | --- |
| dc.contributor.advisor | Stachniss, Cyrill | |
| dc.contributor.author | Marcuzzi, Rodrigo Nicolas | |
| dc.date.accessioned | 2026-02-17T11:16:16Z | |
| dc.date.available | 2026-02-17T11:16:16Z | |
| dc.date.issued | 2026-02-17 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.11811/13899 | |
| dc.description.abstract | Mobile robots are already present in our lives, supporting people with tasks they cannot or do not want to do. Driving machinery across large fields to help in food production, replacing human workers in repetitive tasks on production lines, and helping us keep our homes clean are just some of today's applications. Robots do not get tired, do not get distracted, and can perform tedious tasks without complaining. Driving is a task we have long dreamed of automating. For autonomous driving, vehicles must perceive their environment accurately to make decisions and plan ahead safely. This requires spatial awareness of the 3D surroundings and, at the same time, semantic understanding, i.e., knowing the meaning of each element of the scene. For example, a vehicle must identify the road and the sidewalk to know where to drive and where not to. At the same time, it is important to recognize the surrounding buildings and traffic signs and to identify other traffic participants such as cars and pedestrians. However, that is not enough to make decisions in dynamic environments. Such autonomous systems must also be able to track cars over time, estimate their velocity, and predict their future actions in order to make their own decisions, navigate, and avoid accidents. This means that the robot must maintain a 3D spatial representation of the scene that also carries semantic meaning, allowing it to recognize its surroundings and to identify and track other traffic participants. To understand their surroundings, robots are usually equipped with sensors such as cameras and LiDAR scanners. Each sensor has its own strengths and weaknesses, and they may provide complementary information. RGB cameras provide images with texture and color, making it easier to understand details in the scene, but they do not work in low-light conditions and lack 3D information. LiDAR sensors provide 3D information about the geometry of the scene, but at a lower resolution and without real color information, which makes interpreting the surroundings more challenging. The main contributions of this thesis are methods for spatio-temporal and semantic understanding of the surrounding scene. We propose methods for the individual sensor modalities, namely LiDAR and RGB cameras, to investigate the capabilities and challenges of each modality independently. In Part I, we work with data from a rotating LiDAR sensor. Since such sensors already provide geometric information about the 3D surroundings, we focus on estimating the semantic meaning of each part of the 3D scene. Furthermore, we investigate how to make these approaches scalable and how to learn all aspects of the task directly from data instead of relying on hand-picked parameters. In Part II, we use images from conventional RGB cameras, which contain rich texture and color information. To achieve 3D semantic scene understanding with this data modality, we can leverage existing image segmentation models and therefore focus on estimating the geometry of the surrounding 3D scene combined with this semantic knowledge. We leverage all the available image information and use offline computer vision methods to generate labels. These labels can be used to train neural networks so that they predict the geometry of the scene in an online fashion without relying on LiDAR data or human annotations. We provide methods for spatio-temporal semantic scene understanding tailored to individual data modalities, addressing the specific challenges and exploiting the strengths of each sensor. Our proposed approaches have been evaluated on publicly available datasets and published in peer-reviewed journals and conference proceedings. Furthermore, the implementations of our methods, as well as the generated data, are open source, enabling future research to use them as a starting point. | en |
| dc.language.iso | eng | |
| dc.rights | In Copyright | |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
| dc.subject.ddc | 004 Computer science | |
| dc.title | Spatio-Temporal Scene Understanding for Vehicles in Urban Environments | |
| dc.type | Dissertation or Habilitation | |
| dc.publisher.name | Universitäts- und Landesbibliothek Bonn | |
| dc.publisher.location | Bonn | |
| dc.rights.accessRights | openAccess | |
| dc.identifier.urn | https://nbn-resolving.org/urn:nbn:de:hbz:5-88034 | |
| dc.relation.doi | https://doi.org/10.1109/LRA.2022.3140439 | |
| dc.relation.doi | https://doi.org/10.1109/LRA.2023.3236568 | |
| dc.relation.doi | https://doi.org/10.1109/LRA.2023.3320020 | |
| dc.relation.doi | https://doi.org/10.1109/LRA.2025.3557227 | |
| ulbbn.pubtype | First publication | |
| ulbbnediss.affiliation.name | Rheinische Friedrich-Wilhelms-Universität Bonn | |
| ulbbnediss.affiliation.location | Bonn | |
| ulbbnediss.thesis.level | Dissertation | |
| ulbbnediss.dissID | 8803 | |
| ulbbnediss.date.accepted | 2026-01-16 | |
| ulbbnediss.institute | Agrar-, Ernährungs- und Ingenieurwissenschaftliche Fakultät : Institut für Geodäsie und Geoinformation (IGG) | |
| ulbbnediss.fakultaet | Agrar-, Ernährungs- und Ingenieurwissenschaftliche Fakultät | |
| dc.contributor.coReferee | Leibe, Bastian | |
| ulbbnediss.contributor.orcid | https://orcid.org/0000-0001-8076-0293 |