Patnala, Ankit: Exploring Self-Supervised Learning Methods for Landcover Applications Using Remote Sensing Data. - Bonn, 2026. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-85550
@phdthesis{handle:20.500.11811/13921,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-85550},
author = {Patnala, Ankit},
title = {Exploring Self-Supervised Learning Methods for Landcover Applications Using Remote Sensing Data},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2026,
month = feb,

note = {Anthropogenic activities such as crop cultivation, forest conservation, urban development, and industrial establishment significantly alter the Earth's surface characteristics. Assessing these alterations is therefore vital for both society and the economy. Monitoring and analyzing the Earth's surface helps to understand and manage limited natural resources, which makes it essential for sustainable land-use planning and informed decision-making. Analyzing land use thus lies at the intersection of geospatial data, remote sensing technologies, and modeling techniques. Although advances in remote sensing have dramatically increased the availability of data, specialized expertise is still needed to handle and interpret such data. The large volume of remote sensing data enables the use of machine learning, but annotating data is time-consuming and expensive, which limits the applicability of supervised training techniques. Recent developments in self-supervised learning, particularly contrastive learning, have shown promising results. A key advantage of these methods is that they are largely independent of manual annotation, since the target of the optimization process is constructed from the available data itself. Self-supervised learning follows a two-step approach: first, the machine learning model is pre-trained on unlabeled data in a fully self-supervised manner; second, transfer learning techniques are applied to adapt the pre-trained model to an annotated dataset. Owing to this pre-training, significantly less annotated data is required than when the model is trained on annotated data alone. Self-supervised learning is thus more efficient and cost-effective, addressing the challenges associated with manual annotation and offering substantial benefits in Earth observation applications.
However, these techniques were designed primarily for natural images in computer vision. Multi-spectral Earth observation imagery poses unique challenges that set it apart from standard computer vision tasks. For instance, Earth observation data often includes spectral bands beyond the visible spectrum, contains temporal information, and requires domain-specific knowledge for interpretation. By addressing these differences, this thesis aims to bridge the gap between conventional self-supervised learning techniques and the specific needs of Earth observation.
The initial phase of this research highlights the importance of spectral bands beyond RGB for land cover analysis. Building on these findings, an atmospheric transformation is proposed for contrastive self-supervised learning on remote sensing images to address the challenge of handling multiple spectral bands meaningfully. In a comparison against a baseline, the proposed method performed better. While effective for land cover classification, these methods are less suitable for time series crop classification. To develop a self-supervised approach for time series data, multi-modal self-supervised methods for crop classification are proposed. When evaluated on nine different time series crop classification tasks, the multi-modal approach outperforms existing uni-modal approaches developed for tabular data such as spectral measurements. To leverage the multi-modal characteristics of remote sensing data and the sequential nature of spectro-temporal data, the approach is further refined by adapting the bi-modal BERT training technique, a prominent self-supervised algorithm from natural language processing. Although these methods effectively handle complex multi-spectral temporal data for crop classification, future research should explore techniques that can exploit dense spatial information to develop more capable models.},

url = {https://hdl.handle.net/20.500.11811/13921}
}

The following license files are associated with this item:

In Copyright