
Modelling Complex Activities from Visual and Textual Data

dc.contributor.advisor: Yao, Angela
dc.contributor.author: Sener Merzbach, Fadime
dc.date.accessioned: 2021-07-23T11:23:04Z
dc.date.available: 2021-07-23T11:23:04Z
dc.date.issued: 23.07.2021
dc.identifier.uri: https://hdl.handle.net/20.500.11811/9235
dc.description.abstract: Complex activity videos are long-range videos composed of multiple sub-activities that follow a temporal structure and serve a connected purpose. Recognizing human activities in such videos is a long-standing goal with a broad spectrum of applications, such as assistive technologies, human-robot interaction, and security systems. Although extensive efforts have been made to recognize human actions in short, trimmed videos, complex activity videos have received attention only recently. This dissertation provides several models and techniques for understanding human activities in these long-range videos. In particular, we focus on the problems of action anticipation and temporal action segmentation, using both supervised and unsupervised learning approaches.
Motivated by the high annotation costs of learning models on complex activity videos, we present two approaches. Given a collection of videos, all of the same complex activity, our temporal action segmentation method partitions the videos into sub-activities in an unsupervised way, based only on the visual data, following an iterative discriminative-generative approach. Our action anticipation approach generalizes instructional knowledge from large-scale text corpora and transfers this knowledge to the visual domain using a small-scale annotated video dataset. In this work, we develop models that describe complex activities with natural language, enabling translation between elements of the visual and textual domains. We also present a complex activity dataset of videos aligned with textual descriptions. Finally, we present a generic supervised approach for learning representations from long-range videos, which we apply to action anticipation and temporal action segmentation. Here, we investigate the required temporal extent, the representation granularity, and the influence of semantic abstraction with our flexible multi-granular temporal aggregation framework for reasoning over short- and long-range observations.
This dissertation advances the state of the art in complex activity understanding, challenges the community with new problems, presents novel models that learn visual and temporal relations between human actions, and contributes a dataset for studying the intersection of vision and language. We thoroughly evaluate our approaches and compare them with the respective state of the art on a set of benchmarks. We conclude by outlining future research directions and open issues in complex activity understanding research.
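To make the multi-granular temporal aggregation idea concrete, the following is a minimal illustrative sketch: per-frame features are pooled over windows of several lengths (fine-grained recent windows and coarse long-range windows) and the summaries are concatenated into one representation. The function name, window lengths, and the use of simple mean-pooling are assumptions for illustration only, not the dissertation's actual learned aggregation model.

```python
import numpy as np

def multi_granular_aggregate(frame_features, spans=(5, 10, 30, 60)):
    """Illustrative sketch (not the dissertation's model): pool the last
    `span` frames for each granularity in `spans`, then concatenate.

    frame_features: array of shape (T, D) with one D-dim feature per frame.
    Returns: array of shape (len(spans) * D,).
    """
    T, _ = frame_features.shape
    pooled = []
    for span in spans:
        window = frame_features[max(0, T - span):]  # last `span` frames
        pooled.append(window.mean(axis=0))          # average-pool the window
    return np.concatenate(pooled)

# Usage: 100 frames of 8-dim features -> one multi-granular vector
feats = np.random.rand(100, 8)
rep = multi_granular_aggregate(feats)
print(rep.shape)  # (32,)
```

In the framework described above, such multi-scale summaries support reasoning from both short recent observations and long-range context; here that is approximated by fixed pooling windows.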
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: interpretation komplexer Aktivitäten
dc.subject: zeitgleiche Segmentierung von Aktivitäten
dc.subject: Vorhersage von Aktivitäten
dc.subject: Videoanalyse
dc.subject: Aktionserkennung
dc.subject: complex activity understanding
dc.subject: temporal action segmentation
dc.subject: action anticipation
dc.subject: video analysis
dc.subject: action recognition
dc.subject.ddc: 004 Informatik
dc.title: Modelling Complex Activities from Visual and Textual Data
dc.type: Dissertation or habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-63266
ulbbn.pubtype: First publication (Erstveröffentlichung)
ulbbn.birthname: Sener
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 6326
ulbbnediss.date.accepted: 06.07.2021
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Gall, Jürgen
ulbbnediss.contributor.orcid: https://orcid.org/0000-0001-5004-6005
ulbbnediss.contributor.gnd: 1244199729

