Cheema, Muhammad Shahzad: Efficient Human Activity Recognition in Large Image and Video Databases. - Bonn, 2014. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5n-37440
@phdthesis{handle:20.500.11811/6174,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5n-37440},
author = {Cheema, Muhammad Shahzad},
title = {Efficient Human Activity Recognition in Large Image and Video Databases},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2014,
month = sep,
note = {Vision-based human action recognition has attracted considerable interest in recent research for its applications to video surveillance, content-based search, healthcare, and interactive games. Most existing research deals with building informative feature descriptors, designing efficient and robust algorithms, proposing versatile and challenging datasets, and fusing multiple modalities. Often, these approaches build on certain conventions such as the use of motion cues to determine video descriptors, the application of off-the-shelf classifiers, and single-factor classification of videos. In this thesis, we deal with important but overlooked issues such as efficiency, simplicity, and scalability of human activity recognition in different application scenarios: controlled video environments (e.g.~indoor surveillance), unconstrained videos (e.g.~YouTube), depth or skeletal data (e.g.~captured by Kinect), and person images (e.g.~Flickr). In particular, we are interested in answering questions such as: (a) is it possible to efficiently recognize human actions in controlled videos without temporal cues? (b) given that large-scale unconstrained video data are often of a high-dimension, low-sample-size (HDLSS) nature, how can human actions be recognized efficiently in such data? (c) considering the rich 3D motion information available from depth or motion-capture sensors, is it possible to recognize both the actions and the actors using only the motion dynamics of the underlying activities? and (d) can motion information from monocular videos be used to automatically determine saliency regions for recognizing actions in still images?},
url = {https://hdl.handle.net/20.500.11811/6174}
}

The following terms of use are associated with this resource:

InCopyright