Li, Shijie: Efficient Perception and Forecasting for Autonomous Vehicles. - Bonn, 2024. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-75922
@phdthesis{handle:20.500.11811/11518,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-75922},
author = {Li, Shijie},
title = {Efficient Perception and Forecasting for Autonomous Vehicles},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2024,
month = apr,

note = {In recent years, autonomous vehicles have attracted considerable attention. In autonomous driving, safety is the highest priority, and scene understanding and motion forecasting play a central role in ensuring it. Because LiDAR sensors capture environment information accurately, the scene understanding task aims to assign a semantic label to each point in the LiDAR point cloud. Unfortunately, such point clouds usually contain a massive number of points, and processing them requires substantial computation resources. However, only limited computation resources are available on autonomous driving platforms, and many modules typically run concurrently, which further limits the computation available. The same holds for motion forecasting. Efficiency is therefore essential for realistic autonomous driving applications.
However, previous methods for semantic segmentation of LiDAR point clouds have largely focused on accuracy, often at the expense of efficiency, which is crucial for practical applications. To bridge this gap, we propose a highly efficient architecture that processes 2D projection maps generated from the input LiDAR sensor. By exploiting the intrinsic characteristics of LiDAR sensors and converting the sparse 3D data into a dense 2D representation, our method achieves a favorable balance between accuracy and efficiency. Building on this architecture, we further determine the motion status of each point in the LiDAR point cloud by adopting a generalized projection-based LiDAR data representation that encapsulates motion information. In addition, we establish a new benchmark for LiDAR-based moving object segmentation based on the SemanticKITTI dataset. The approaches above are projection-based; we also explore improving point-based methods for semantic segmentation of LiDAR point clouds. We present the novel concept of reformulating 3D point-based operations to work in the projection space. This design allows any point-based method to be converted into a projection-based one, markedly improving its accuracy and efficiency for LiDAR point cloud semantic segmentation. A remaining hurdle for projection-based methods is the accuracy gap to voxel-based approaches. We find that this gap arises from the "many-to-one" problem caused by the limited horizontal and vertical angular resolution of the range image. To tackle this, we introduce a temporal fusion layer that extracts relevant information from previous scans and integrates it with the current scan. We also propose a max-voting-based post-processing technique to correct erroneous predictions.
As highlighted above, besides scene understanding, precise and efficient trajectory forecasting is paramount for safety in autonomous driving. Since human safety has the highest priority, we first work on mapless human trajectory prediction, enforcing temporal consistency between the observed history and the predicted future. Although this method works well for low-speed scenarios dominated by pedestrians, autonomous driving scenarios are usually highly dynamic, with fast-moving agents such as vehicles. In this case, the map provides meaningful information, as drivers usually have to follow driving rules. One issue is that processing map information typically costs substantial computation. To address this, we propose a streamlined architecture that adds an aggregation token for each lane and trajectory. This facilitates modeling global dependencies within each lane or trajectory and between lanes or trajectories, leading us to name the network the Aggregation-Interaction Transformer. Nonetheless, efficiently modeling the interactions among road users and drivable lanes remains challenging. Another notable contribution is therefore our approach's ability to learn intention seeds, which serve as queries to generate a diverse set of future trajectories in a highly efficient manner.
For both LiDAR point cloud semantic segmentation and motion forecasting, we evaluate the proposed approaches on several public datasets covering different scenarios. Extensive evaluation demonstrates their effectiveness in both accuracy and efficiency, showing that they are suitable for realistic applications.},

url = {https://hdl.handle.net/20.500.11811/11518}
}

The following terms of use are associated with this resource:

InCopyright