Chen, Xieyuanli: LiDAR-Based Semantic Perception for Autonomous Vehicles. - Bonn, 2022. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-67873
@phdthesis{handle:20.500.11811/10228,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-67873},
author = {Chen, Xieyuanli},
title = {LiDAR-Based Semantic Perception for Autonomous Vehicles},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2022,
month = sep,

note = {Scene understanding is one of the fundamental building blocks that enable mobile systems to achieve autonomy. It is the process of perceiving, analyzing, and interpreting a 3D dynamic scene observed through the onboard sensors of autonomous vehicles. Light detection and ranging (LiDAR) sensors are among the most popular sensors for autonomous vehicles to sense their surroundings, because they are robust to illumination changes and provide highly accurate range measurements. Based on LiDAR sensors, autonomous vehicles can explore environments, understand the locations and types of objects therein, and then make plans and execute actions to fulfill complex tasks. Key capabilities among these are localization within a given map as well as simultaneous localization and mapping (SLAM), which provide the robot's location, a necessary prerequisite for other downstream tasks. Traditional LiDAR-based global localization and SLAM methods can provide accurate pose estimates in indoor environments under the static-world assumption. However, as the demand for autonomous driving in dynamic outdoor environments grows, using only geometric and appearance information is no longer enough to provide reliable localization and mapping results for autonomous systems. A high-level understanding of the world, which includes the estimation of semantic information, is required for the robust and safe deployment of autonomous vehicles in dynamic and complex real-world scenarios.
The main contributions of this thesis are novel approaches that exploit semantic information to improve the performance of LiDAR perception tasks such as SLAM and global localization for autonomous vehicles. This thesis consists of three parts. The first part focuses on how to apply semantic information to SLAM and localization. We present a semantic LiDAR SLAM method, which exploits semantic predictions from an off-the-shelf semantic segmentation network to improve pose estimation accuracy and generate consistent semantic maps of the environment. We furthermore propose a novel neural network that exploits both geometric and semantic information to estimate the similarity between pairs of LiDAR scans. Based on these similarity estimates, our network can better find loop closure candidates for SLAM and achieve global localization in outdoor environments across seasons.
The second part investigates which types of semantics are useful for specific tasks. In this context, we propose a novel moving object segmentation method for SLAM. It aims at separating actually moving objects, such as driving cars, from static or non-moving objects, such as buildings and parked cars. With these more specific moving/non-moving semantics, we achieve better SLAM performance compared to setups using general semantics. For localization, we propose to use pole-like objects such as traffic signs, poles, and lamps, due to their local distinctiveness and long-term stability. As a result, we obtain reliable and accurate localization results over comparably long periods of time.
Deep learning-based approaches can provide accurate point-wise semantic predictions. However, they strongly rely on the diversity and amount of labeled training data, which may be costly to obtain. In the third part, we therefore propose approaches that can automatically generate labels for training neural networks. By specifying and simplifying the semantics for specific tasks, we turn the comparably challenging multi-class semantic segmentation problem into more manageable binary classification tasks, which makes automatic label generation feasible. Using our proposed automatic labeling approach, we alleviate the reliance on expensive human labeling for the supervised training of neural networks and enable our methods to work in a self-supervised way. Therefore, our proposed task-specific semantic-based methods can be easily transferred to different environments with different LiDAR sensors.
All approaches presented in this thesis have been published in peer-reviewed conference papers and journal articles. Our proposed OverlapNet for LiDAR-based loop closing and localization was nominated for the Best System Paper Award at the Robotics: Science and Systems (RSS) conference in 2020. Our proposed moving object segmentation method was selected to be presented at the RSS Pioneers event in 2021. Additionally, we have made the implementations of all methods presented in this thesis open source to facilitate further research.},

url = {https://hdl.handle.net/20.500.11811/10228}
}

The following license is associated with this item:

InCopyright