Sawatzky, Johann: Weakly and Semi Supervised Semantic Segmentation of RGB Images. - Bonn, 2021. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc:
author = {{Johann Sawatzky}},
title = {Weakly and Semi Supervised Semantic Segmentation of RGB Images},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2021,
month = jan,

note = {Teaching machines semantic scene understanding from RGB images has received a lot of attention in recent years, since this ability is crucial for applications such as autonomous driving, robotics, or video surveillance. Large datasets with dense human-provided annotations, together with deep learning methods trained on them, boosted the performance in semantic segmentation from mediocre to human level. Still, these methods suffer from a major shortcoming: they require expensive human annotations as soon as a new semantic class has to be learnt.
To reduce the annotation effort by orders of magnitude, one can follow the weakly supervised semantic segmentation paradigm and reduce the cost per image by using cheaper localization cues like keypoints or bounding boxes instead of precise polygons. Alternatively, one can annotate only a fraction of the images in the training set and learn from them as well as from the unlabeled ones, which constitutes a semi-supervised approach. The first part of this thesis concerns the segmentation of object part affordances (functional attributes) using keypoints as supervision cues. To this end, we introduce a custom dataset with pixel-level affordance annotations. Additionally, we propose a method that performs significantly better than weakly supervised semantic segmentation methods originally designed for objects. Interestingly, our method generalizes to affordances of novel object classes not present in the training set. Subsequently, we improve upon this method with a second one. One of its strengths is a stochastic approximation of the Jaccard index, which allows for a proper hyperparameter choice even in the absence of ground truth for precise cross-validation.
The second part of the thesis treats a setup where object-level bounding boxes are given and object part affordances have to be segmented. We propose to annotate the affordances for a tiny number of example objects and then propagate them to the rest of the training set. This way, approximations to the ground truth can be obtained at a constant cost. After this, we leave the domain of object part affordances and tackle weakly supervised semantic segmentation of object classes using image captions as supervision cues. Image captions not only provide additional object localization cues in the form of object attributes but are also freely available on the internet. Using images and their corresponding captions, we train a multi-modal learning approach to locate arbitrary text snippets in an image. We then use it to obtain high-confidence object class areas in training images which are superior to those obtained from manually curated image tags.
Finally, we consider a semi-supervised semantic segmentation setup with pixel-wise labels given for a small fraction of the images and no supervision cues of any kind for the rest. We propose a method which discovers latent classes that maximize the information gain about the semantic classes on labeled data. On unlabeled data, we use the consistency between the latent classes and the semantic classes as a supervision signal. We show that supervision through latent classes is complementary to other consistency signals like neural discriminators. Furthermore, we show that automatically learned latent classes are superior to manually defined supercategories.
All approaches are compared to contemporary state-of-the-art methods and improve upon them.},

url = {}
