Adaptive Methods for Robust Document Image Understanding

Konya, Iuliu

dc.contributor.advisor	Bauckhage, Christian
dc.contributor.author	Konya, Iuliu
dc.date.accessioned	2020-04-18T18:41:41Z
dc.date.available	2020-04-18T18:41:41Z
dc.date.issued	09.04.2013
dc.identifier.uri	https://hdl.handle.net/20.500.11811/5655
dc.description.abstract	A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding are a pressing necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, putting special focus on generality, computational efficiency and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typeset prints, a theoretically optimal solution for the document binarization problem from both the computational complexity and the threshold selection points of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject	document image analysis
dc.subject	document image understanding
dc.subject	image enhancement
dc.subject	character enhancement
dc.subject	document binarization
dc.subject	color reduction
dc.subject	skew detection
dc.subject	orientation detection
dc.subject	page segmentation
dc.subject	geometric layout analysis
dc.subject	article segmentation
dc.subject	logical layout analysis
dc.subject.ddc	004 Computer science
dc.title	Adaptive Methods for Robust Document Image Understanding
dc.type	Dissertation or Habilitation
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5n-31696
ulbbn.pubtype	First publication
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	3169
ulbbnediss.date.accepted	13.03.2013
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Klein, Reinhard

Files in this item

Name: 3169.pdf
Size: 15.5MB
Format: PDF

This item appears in the following Collection(s)

E-Dissertationen (4605)
