
Audio Embeddings for Semi-Supervised Anomalous Sound Detection

dc.contributor.advisor: Kurth, Frank
dc.contributor.author: Wilkinghoff, Kevin
dc.date.accessioned: 2024-09-03T09:55:44Z
dc.date.available: 2024-09-03T09:55:44Z
dc.date.issued: 03.09.2024
dc.identifier.uri: https://hdl.handle.net/20.500.11811/12050
dc.description.abstract[en]: Detecting anomalous sounds is a difficult task for three reasons. First, audio data is very high-dimensional and anomalous signal components are relatively subtle in relation to the entire acoustic scene. Second, normal and anomalous audio signals are not inherently different because the definition of these terms strongly depends on the application. Third, usually only normal data is available for training a system because anomalies are rare, diverse, costly to produce and in many cases unknown in advance. Such a setting is called semi-supervised anomaly detection. In domain-shifted conditions or when only very limited training data is available, all of these problems are even more severe.
The goal of this thesis is to overcome these difficulties by teaching an embedding model to learn data representations suitable for semi-supervised anomalous sound detection. More specifically, an anomalous sound detection system is designed such that the resulting representations of the data, called embeddings, fulfill the following desired properties. First, normal and anomalous data should be easy to distinguish, which is usually not the case for audio signals because the definition of anomalies is entirely application-dependent. Second, embeddings should have a fixed and relatively low dimension, in contrast to audio signals, which are very high-dimensional, may have different durations or sampling rates, and are thus difficult to handle. Third, embeddings used for detecting anomalies should ideally be mostly insensitive to changes in acoustic conditions and sensitive only to the degree of abnormality, whereas audio signals may have been recorded under very different acoustic conditions, leading to strong variability between signals that is undesirable from an anomalous sound detection perspective.
The main contributions of this thesis are the following. First and foremost, angular margin losses specifically designed to train embedding models for anomalous sound detection and for few-shot open-set sound event detection are presented, namely sub-cluster AdaCos, AdaProj and TACos. In various experiments, it is shown that the embeddings obtained with these loss functions outperform embeddings obtained by using other angular margin losses, one-class losses or pre-trained embeddings. As another contribution, it is proven that angular margin losses can be seen as a regularized multi-class version of one-class losses, which helps to cope with background noise. Furthermore, design choices for learning embeddings that are robust to acoustic domain shifts by generalizing well to previously unseen domains are presented, which results in an anomalous sound detection system significantly outperforming other state-of-the-art systems. As a last contribution, it is investigated how to obtain good decision thresholds, and a novel performance metric called F1-EV, which measures the difficulty of estimating a good threshold, is presented.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: anomaly detection
dc.subject: representation learning
dc.subject: sound event detection
dc.subject: keyword spotting
dc.subject: domain generalization
dc.subject: machine listening
dc.subject: machine condition monitoring
dc.subject.ddc: 004 Computer science
dc.title: Audio Embeddings for Semi-Supervised Anomalous Sound Detection
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-77432
dc.relation.arxiv: 2403.14179
dc.relation.doi: https://doi.org/10.1109/IJCNN52387.2021.9534290
dc.relation.doi: https://doi.org/10.1109/ICASSP49357.2023.10097176
dc.relation.doi: https://doi.org/10.1109/TASLP.2023.3337153
dc.relation.doi: https://doi.org/10.23919/EUSIPCO58844.2023.10290003
dc.relation.doi: https://doi.org/10.1109/ICASSP48485.2024.10445814
dc.relation.doi: https://doi.org/10.1109/ICASSP48485.2024.10447156
dc.relation.doi: https://doi.org/10.1109/ICASSP48485.2024.10446011
dc.relation.url: https://ieeexplore.ieee.org/document/9657497
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 7743
ulbbnediss.date.accepted: 02.09.2024
ulbbnediss.dissNotes.extern: In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of University of Bonn's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Klein, Reinhard
ulbbnediss.contributor.orcid: https://orcid.org/0000-0003-4200-9129


