Plack, Markus: Spatial Priors and Uncertainty for Enhanced Reconstruction in Computer Graphics and Vision. - Bonn, 2025. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-80796
@phdthesis{handle:20.500.11811/12787,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-80796},
doi = {10.48565/bonndoc-499},
author = {Plack, Markus},
title = {Spatial Priors and Uncertainty for Enhanced Reconstruction in Computer Graphics and Vision},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = feb,

note = {Many tasks in computer vision aim to reconstruct unknown quantities from observations of a scene, such as estimating depth from stereo vision, generating intermediate video frames, or revealing hidden shapes from transient measurements. In recent years, deep learning and differentiable rendering have become the methods of choice in these domains, but both demand extensive computational resources. This thesis introduces strategies that explicitly integrate our understanding of specific problems into reconstruction algorithms via spatial priors and uncertainty representations, improving both their efficiency and output quality by exploiting domain-specific structure.
We begin by detailing how integrating rough shapes as priors into stereo matching can refine depth estimation. In general, for a scene without prior information, a comprehensive search across all possible disparity values is performed to regress the disparity map between the two images. In certain scenarios, however, such as stereo rigs embedded in setups containing additional cameras, extra information can be used to improve the reconstruction. Our approach employs an efficient computation of the visual hull to reduce the search range of stereo matching, which, combined with various optimizations tailored to this use case, enables accurate depth computation at high resolutions.
Furthermore, we explore frame interpolation of rendered sequences where, in contrast to established methods, it is possible to generate and use additional data from the intermediate frame if necessary. By predicting the uncertainty of the interpolation output and incorporating partial renderings as priors, we devise a novel two-step model based on the transformer architecture that enhances the quality of interpolated frames even for challenging content, as demonstrated quantitatively and qualitatively through a user study. This approach makes it possible to replace the computationally costly rendering of a full sequence with an inexpensive interpolation of partial renderings.
Lastly, we tackle Non-Line-of-Sight reconstruction and demonstrate how an efficient implementation of the backward pass of a model-driven approach can yield accurate reconstructions while reducing the runtime from the hours or days needed by the baseline approach to minutes. This enabled us to explore different priors, and we show results using Gaussian blobs with an optional color component as well as total-variation-regularized depth maps. To address scenarios where the model assumptions deviate too far from the conditions of real-world measurements, we introduce a background network inspired by neural representations and showcase its utility in capturing the remaining uncertainties.},

url = {https://hdl.handle.net/20.500.11811/12787}
}

License: InCopyright