Weakly and Semi Supervised Semantic Segmentation of RGB Images

dc.contributor.advisor: Gall, Juergen
dc.contributor.author: Sawatzky, Johann
dc.date.accessioned: 2021-01-11T14:05:42Z
dc.date.available: 2021-01-11T14:05:42Z
dc.date.issued: 11.01.2021
dc.identifier.uri: https://hdl.handle.net/20.500.11811/8878
dc.description.abstract: Teaching machines semantic scene understanding from RGB images has received a lot of attention in recent years, since this ability is crucial for applications such as autonomous driving, robotics, and video surveillance. Large datasets with dense human annotations, together with deep learning methods trained on them, have boosted semantic segmentation performance from mediocre to human level. Still, these methods suffer from a major shortcoming: they require expensive human annotations whenever a new semantic class has to be learned.
To reduce the annotation effort by orders of magnitude, one can follow the weakly supervised semantic segmentation paradigm and lower the cost per image by using cheaper localisation cues, such as keypoints or bounding boxes, instead of precise polygons. Alternatively, one can annotate only a fraction of the training images and learn from both these and the unlabeled ones, which constitutes a semi-supervised approach. The first part of this thesis concerns the segmentation of object part affordances (functional attributes) using keypoints as supervision cues. To this end, we introduce a custom dataset with pixel-level affordance annotations. Additionally, we propose a method that performs significantly better than weakly supervised semantic segmentation methods originally designed for objects. Interestingly, our method generalizes to affordances of novel object classes not present in the training set. We then improve upon this method with a second one. One of its strengths is a stochastic approximation of the Jaccard index, which allows proper hyperparameter selection even when no ground truth is available for precise cross-validation.
The second part of the thesis treats a setup in which object-level bounding boxes are given and object part affordances have to be segmented. We propose to annotate the affordances for a tiny number of example objects and then propagate them to the rest of the training set. In this way, approximations to the ground truth can be obtained at a constant cost. We then leave the domain of object part affordances and tackle weakly supervised semantic segmentation of object classes using image captions as supervision cues. Image captions not only provide additional object localization cues in the form of object attributes but are also freely available on the internet. Using images and their corresponding captions, we train a multi-modal learning approach to locate arbitrary text snippets in an image. We then use it to identify high-confidence object class areas in training images, which are superior to those obtained from manually curated image tags.
Finally, we consider a semi-supervised semantic segmentation setup in which pixel-wise labels are given for a small fraction of the images and no supervision cues of any kind are given for the rest. We propose a method that discovers latent classes maximizing the information gain about the semantic classes on labeled data. On unlabeled data, we use the consistency between the latent classes and the semantic classes as a supervision signal. We show that supervision through latent classes is complementary to other consistency signals such as neural discriminators. Furthermore, we show that automatically learned latent classes are superior to manually defined supercategories.
All approaches are compared with contemporary state-of-the-art methods and improve on them.
(abstract language: en)
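The abstract names the Jaccard index as the criterion behind hyperparameter selection. For reference, the following minimal sketch computes the exact per-class Jaccard index (intersection over union) between a predicted and a ground-truth label map. It is not the stochastic approximation developed in the thesis, which the abstract does not specify; the function name `jaccard_index` and the toy label maps are assumptions made purely for illustration.

```python
def jaccard_index(pred, target, num_classes):
    """Exact per-class Jaccard index (IoU) for two 2D label maps.

    Hypothetical helper for illustration only; it needs ground truth and
    is therefore not the stochastic approximation from the thesis.
    Classes that occur in neither map get None instead of a score.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel belongs to both masks of class p.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel is in the predicted mask of p and the target mask of t.
                union[p] += 1
                union[t] += 1
    return [i / u if u > 0 else None for i, u in zip(intersection, union)]


# Toy 2x3 label maps with two classes (0 and 1), invented for this example.
pred = [[0, 0, 1],
        [1, 1, 1]]
target = [[0, 1, 1],
          [1, 1, 0]]

scores = jaccard_index(pred, target, num_classes=2)
present = [s for s in scores if s is not None]
mean_iou = sum(present) / len(present)
print(scores, mean_iou)
```

Averaging only over classes that actually occur avoids rewarding or penalising a prediction for classes absent from both maps.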
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject.ddc: 004 Computer science
dc.title: Weakly and Semi Supervised Semantic Segmentation of RGB Images
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-60894
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 6089
ulbbnediss.date.accepted: 15.12.2020
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Hamprecht, Fred


The following terms of use are associated with this resource:

In Copyright