Visual Failure Detection in Robotics

dc.contributor.advisor: Gall, Jürgen
dc.contributor.author: Thoduka, Santosh George
dc.date.accessioned: 2026-03-31T14:51:25Z
dc.date.available: 2026-03-31T14:51:25Z
dc.date.issued: 31.03.2026
dc.identifier.uri: https://hdl.handle.net/20.500.11811/14054
dc.description.abstract: Autonomous robots in human-centric environments can encounter unforeseen situations, making them prone to task execution failures. These failures can create unsafe conditions for both humans and robots and erode trust in robots. Robots should therefore be able to prevent, detect, and respond to failures. In this thesis, we focus on the detection of failures, particularly using video data from the robot's camera. Several visual failure detection datasets have emerged in recent years; however, datasets suitable for building general-purpose failure detection approaches remain scarce. Existing work has shown the benefit of multimodal data for failure detection, but determining the best data representations and fusion methods remains an open challenge. Additionally, although most approaches develop failure detection models for specific tasks, they often fail to incorporate task knowledge into the learning process. In our work, we introduce multimodal datasets and explore multimodal learning and task knowledge integration to enhance failure detection performance. We contribute two visual failure detection datasets: the Bookshelf dataset and the Handover Failure Detection dataset, which consist of failures that occur while a robot places a book on a shelf and performs object handovers with people, respectively. For the Bookshelf dataset, we use video and proprioceptive data to detect anomalous situations by comparing expected and observed motions. For the Handover dataset and a visual-tactile dataset, we find that intermediate fusion of video, force-torque, tactile, and proprioceptive features performs best. For the Handover dataset, video proves to be an essential modality, and learning to predict the human's and robot's actions as auxiliary tasks is also beneficial. Finally, we explore incorporating task knowledge to improve failure classification performance. We show that pre-processing video frames using the known temporal boundaries of the robot's actions and the locations of objects in the scene improves results on a large-scale failure dataset, and that a variable-frame-rate data augmentation method yields further improvements. Our results highlight the importance of multimodal data and task knowledge for failure detection. However, the task-specific nature of existing models makes it impractical to collect data and train a separate model for every task. Using simulators to generate large-scale video data is a viable approach to this problem in future work. The emergence of general-purpose vision-language-action models also presents opportunities for task-agnostic failure detection. Incorporating failure data into their training datasets and, as in our work, making use of multimodal data and task knowledge are likely to further accelerate progress in this area.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: visuelle Fehlererkennung
dc.subject: Robotik
dc.subject: Deep Learning
dc.subject: Anomalieerkennung
dc.subject: Videoanalyse
dc.subject: visual failure detection
dc.subject: robotics
dc.subject: deep learning
dc.subject: anomaly detection
dc.subject: video analytics
dc.subject.ddc: 004 Informatik
dc.title: Visual Failure Detection in Robotics
dc.type: Dissertation oder Habilitation
dc.identifier.doi: https://doi.org/10.48565/bonndoc-836
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-89239
dc.relation.doi: https://doi.org/10.1109/IROS51168.2021.9636133
dc.relation.doi: https://doi.org/10.1109/ICRA57147.2024.10610143
dc.relation.doi: https://doi.org/10.1109/ICPR56361.2022.9955646
dc.relation.doi: https://doi.org/10.1109/ECMR65884.2025.11162998
ulbbn.pubtype: Erstveröffentlichung
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 8923
ulbbnediss.date.accepted: 26.03.2026
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Plöger, Paul G.
ulbbnediss.contributor.orcid: https://orcid.org/0000-0003-4085-4943

