From Closed- to Open-World Panoptic Segmentation

Sodano, Matteo

Author

Sodano, Matteo

ORCID

https://orcid.org/0000-0003-3358-6979

Type of Scholarly Publication

Dissertation

Date of Exam

02.03.2026

Date of Publication

10.03.2026

Advisor

Stachniss, Cyrill

Co-Referee

Adve, Vikram S.

Degree Granting Institutions

Rheinische Friedrich-Wilhelms-Universität Bonn

Citable Links

Handle: https://hdl.handle.net/20.500.11811/13963
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-88478

Abstract

Autonomous robotic technologies are increasingly transforming industries by enabling intelligent machines to perform complex tasks with minimal human intervention, increasing efficiency, safety, and scalability. Robots can take over tasks that are dangerous, repetitive, or simply impractical for humans, paving the way for more efficient and reliable workflows. For this reason, robots are becoming more and more important in various applications, ranging from agriculture and manufacturing to exploration in unstructured environments and, last but not least, autonomous driving. Despite the diversity of these domains, all autonomous systems share a fundamental requirement: they must be able to perceive and understand their environment before navigating in and interacting with it. For example, weeding robots need to distinguish crops from weeds to perform their tasks accurately, cooking robots must locate the tools required for a recipe, and autonomous vehicles must recognize other traffic participants to ensure safe and reliable navigation on the road.
Perception is therefore a key building block of any autonomously acting system. Robots can only support humans and operate efficiently if they are able to reliably interpret and understand the environment in which they operate. Over the past decades, we have witnessed tremendous progress in the development of perception algorithms, largely driven by advances in machine learning and deep learning. Among these, segmentation for scene interpretation has emerged as a central task of modern robotic perception research, allowing systems to assign meaningful semantic and instance-level labels to every part of the environment. Semantic information is important to understand what is in the scene and recognize all categories that appear in the environment. Instance information, in contrast, focuses on distinguishing among individual objects. Both kinds of information are important for robots that act in the real world and need to interact with it, as recognizing an individual object is crucial for downstream tasks such as navigation and manipulation.
The main contributions of this thesis are techniques for robotic perception that enhance scene understanding across multiple domains, sensors, and modalities. In particular, we focus on panoptic segmentation, which unifies the semantic and instance-level understanding described above and provides both kinds of information simultaneously. In this thesis, we address several key challenges for panoptic segmentation. First, we address this task using RGB-D sensors, developing an algorithm that exploits the complementary visual cues to enhance segmentation performance while remaining robust to missing RGB or missing depth input. Second, we investigate how the underlying hierarchy among objects in the scene can be exploited to improve segmentation performance. Third, we move beyond a classic limitation of panoptic segmentation shared by most existing state-of-the-art pipelines, namely the so-called closed-world assumption. This assumption constrains perception models to a fixed, predefined set of object categories, which is unrealistic in real-world scenarios where robots will inevitably encounter previously unseen objects. To relax it, we first propose a dataset for accurate and reproducible benchmarking of open-world segmentation methods. We then develop an algorithm for open-world semantic segmentation, where we aim to discover, at test time, novel semantic categories that never appeared during training. Finally, we tackle the task of open-world panoptic segmentation, aiming to discover both novel semantic categories and novel object instances.
In summary, this thesis proposes several novel methods and datasets for closed-world and open-world panoptic segmentation, and contributes to the state of the art of robotic perception and scene understanding. The proposed methods have been rigorously evaluated on publicly available datasets and have been published in peer-reviewed journals and conferences. Furthermore, all software implementations are released as open source to facilitate further research and development in the field.

Classification (DDC)

620 Engineering and mechanical engineering

Related Publications

https://doi.org/10.1109/ICRA48891.2023.10160315
https://doi.org/10.1109/ICRA48891.2023.10160918
https://doi.org/10.1109/IROS60139.2025.11246899
https://doi.org/10.1109/CVPR52733.2024.00307
https://doi.org/10.48550/arXiv.2412.12740

Suggested Citation
BibTeX

Sodano, Matteo: From Closed- to Open-World Panoptic Segmentation. - Bonn, 2026. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-88478

@phdthesis{handle:20.500.11811/13963,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-88478},
author = {Sodano, Matteo},
title = {From Closed- to Open-World Panoptic Segmentation},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2026,
month = mar,
url = {https://hdl.handle.net/20.500.11811/13963}
}
