
Modelling Complex Activities from Visual and Textual Data

dc.contributor.advisor: Yao, Angela
dc.contributor.author: Sener Merzbach, Fadime
dc.date.accessioned: 2021-07-23T11:23:04Z
dc.date.available: 2021-07-23T11:23:04Z
dc.date.issued: 23.07.2021
dc.identifier.uri: https://hdl.handle.net/20.500.11811/9235
dc.description.abstract: Complex activity videos are long-range videos composed of multiple sub-activities that follow a temporal structure and serve a connected purpose. Recognizing human activities in such videos is a long-standing goal with a broad spectrum of applications, such as assistive technologies, human-robot interaction, and security systems. Although extensive efforts have been made to recognize human actions in short, trimmed videos, complex activity videos have received attention only recently. This dissertation provides several models and techniques for understanding human activities in these long-range videos. In particular, we focus on the problems of action anticipation and temporal action segmentation, using both supervised and unsupervised learning approaches.
Motivated by the high annotation costs of learning models on complex activity videos, we present two approaches. Given a collection of videos, all of the same complex activity, our temporal action segmentation method partitions the videos into sub-activities in an unsupervised way, based only on the visual data, following an iterative discriminative-generative approach. Our action anticipation approach generalizes instructional knowledge from large-scale text corpora and transfers this knowledge to the visual domain using a small-scale annotated video dataset. In this work, we develop models that describe complex activities with natural language, enabling translation between elements of the visual and textual domains. We also present a complex activity dataset of videos aligned with textual descriptions. Finally, we present a generic supervised approach for learning representations from long-range videos, which we apply to action anticipation and temporal action segmentation. Here, we investigate the required temporal extent, the representation granularity, and the influence of semantic abstraction with our flexible multi-granular temporal aggregation framework for reasoning over short- and long-range observations.
This dissertation advances the state of the art in complex activity understanding, challenges the community with new problems, presents novel models that learn visual and temporal relations between human actions, and contributes a dataset for studying the intersection of vision and language. We thoroughly evaluate our approaches and compare them with the respective state of the art on a set of benchmarks. We conclude by outlining future research directions and open issues in complex activity understanding research.
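To make the multi-granular temporal aggregation idea concrete, the following is a minimal illustrative sketch: per-frame features are pooled over windows of several lengths (fine-grained recent windows and coarse long-range windows) and the summaries are concatenated into one representation. The function name, window lengths, and the use of simple mean-pooling are assumptions for illustration only, not the dissertation's actual learned aggregation model.

```python
import numpy as np

def multi_granular_aggregate(frame_features, spans=(5, 10, 30, 60)):
    """Illustrative sketch (not the dissertation's model): pool the last
    `span` frames for each granularity in `spans`, then concatenate.

    frame_features: array of shape (T, D) with one D-dim feature per frame.
    Returns: array of shape (len(spans) * D,).
    """
    T, _ = frame_features.shape
    pooled = []
    for span in spans:
        window = frame_features[max(0, T - span):]  # last `span` frames
        pooled.append(window.mean(axis=0))          # average-pool the window
    return np.concatenate(pooled)

# Usage: 100 frames of 8-dim features -> one multi-granular vector
feats = np.random.rand(100, 8)
rep = multi_granular_aggregate(feats)
print(rep.shape)  # (32,)
```

In the framework described above, such multi-scale summaries support reasoning from both short recent observations and long-range context; here that is approximated by fixed pooling windows.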
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: interpretation komplexer Aktivitäten
dc.subject: zeitgleiche Segmentierung von Aktivitäten
dc.subject: Vorhersage von Aktivitäten
dc.subject: Videoanalyse
dc.subject: Aktionserkennung
dc.subject: complex activity understanding
dc.subject: temporal action segmentation
dc.subject: action anticipation
dc.subject: video analysis
dc.subject: action recognition
dc.subject.ddc: 004 Informatik
dc.title: Modelling Complex Activities from Visual and Textual Data
dc.type: Dissertation or habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-63266
ulbbn.pubtype: First publication (Erstveröffentlichung)
ulbbn.birthname: Sener
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 6326
ulbbnediss.date.accepted: 06.07.2021
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Gall, Jürgen
ulbbnediss.contributor.orcid: https://orcid.org/0000-0001-5004-6005
ulbbnediss.contributor.gnd: 1244199729

