
Efficient Perception and Forecasting for Autonomous Vehicles

dc.contributor.advisor: Gall, Juergen
dc.contributor.author: Li, Shijie
dc.date.accessioned: 2024-04-29T11:05:46Z
dc.date.available: 2024-04-29T11:05:46Z
dc.date.issued: 29.04.2024
dc.identifier.uri: https://hdl.handle.net/20.500.11811/11518
dc.description.abstract: In recent years, autonomous vehicles have attracted considerable attention. In autonomous driving, safety is the highest priority, and scene understanding and motion forecasting play a central role in ensuring it. Because LiDAR sensors capture the environment accurately, the scene understanding task we address is to assign a semantic label to each point. However, a LiDAR point cloud typically contains a massive number of points, and processing it demands substantial computational resources, while only limited resources are available on autonomous driving platforms. Moreover, many modules run concurrently on an autonomous vehicle, which further limits the computation available to any single task; the same holds for motion forecasting. Efficiency is therefore essential for realistic autonomous driving applications.
Previous methods for semantic segmentation of LiDAR point clouds have largely focused on accuracy, often at the expense of the efficiency that practical applications require. To bridge this gap, we propose a highly efficient architecture that processes 2D projection maps generated from the LiDAR input. By exploiting the intrinsic characteristics of LiDAR sensors and converting sparse 3D data into a dense 2D format, our method achieves a favorable balance between accuracy and efficiency. Building on this architecture, we further determine the motion status of each point in the LiDAR point cloud by adopting a generalized projection-based LiDAR data representation that encapsulates motion information, and we establish a new benchmark for LiDAR-based moving object segmentation based on the SemanticKITTI dataset. Beyond projection-based approaches, we also enhance point-based methods for LiDAR semantic segmentation: we reformulate 3D point-based operations so that they operate in the projection space. This design converts any point-based method into a projection-based one, markedly improving its accuracy and efficiency. A remaining hurdle for projection-based methods is the gap between them and voxel-based approaches. We show that this gap arises from the "many-to-one" problem: the limited horizontal and vertical angular resolution of the range image maps several 3D points to the same pixel. To tackle this, we introduce a temporal fusion layer that extracts pertinent information from previous scans and integrates it with the current scan, and we propose a max-voting-based post-processing technique to correct erroneous predictions.
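To make the projection step concrete, below is a minimal sketch of the spherical projection commonly used to turn a LiDAR scan into a dense range image; the function name, image resolution, and field-of-view values are illustrative assumptions, not parameters from the dissertation. The last lines also exhibit the "many-to-one" collisions discussed above.

    import numpy as np

    def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
        """Project an (N, 3) LiDAR point cloud onto an (H, W) range image."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points[:, :3], axis=1)      # range of each point

        yaw = np.arctan2(y, x)                         # horizontal angle in [-pi, pi]
        pitch = np.arcsin(z / np.maximum(r, 1e-8))     # vertical angle

        fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

        # Normalize both angles to [0, 1] and scale to pixel coordinates.
        u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * W), 0, W - 1).astype(int)
        v = np.clip(np.floor((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H),
                    0, H - 1).astype(int)

        # "Many-to-one": several 3D points can land in the same pixel. Writing
        # far points first lets near points overwrite them, a common convention.
        order = np.argsort(-r)
        range_image = np.full((H, W), -1.0, dtype=np.float32)
        range_image[v[order], u[order]] = r[order]
        return range_image, u, v

The points overwritten by such collisions are exactly the ones a single range image cannot represent; the temporal fusion layer and max-voting post-processing described above are meant to compensate for this loss.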
Besides scene understanding, precise and efficient trajectory forecasting is paramount for safety in autonomous driving. Since human safety has the highest priority, we first address mapless human trajectory prediction, enforcing temporal consistency between the observed history and the predicted future. While this method works well for low-speed scenarios dominated by pedestrians, autonomous driving scenes are usually highly dynamic, with fast-moving agents such as vehicles. In this case the map provides meaningful information, since drivers must follow traffic rules, but processing map information is computationally expensive. To address this, we propose a streamlined architecture that adds an aggregation token to each lane and trajectory. This token models the global dependencies within each lane or trajectory as well as between lanes and trajectories, which is why we name the network the Aggregation-Interaction Transformer. Efficiently modeling the interactions among road users and drivable lanes nonetheless remains challenging. A further notable contribution is therefore our approach's ability to learn intention seeds, which serve as queries to generate a diverse array of future trajectories in a highly efficient manner.
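To illustrate the aggregation-token idea, here is a minimal PyTorch sketch in the spirit of the Aggregation-Interaction Transformer; the class name, dimensions, and layer count are assumptions for illustration, not the dissertation's actual architecture.

    import torch
    import torch.nn as nn

    class AggregationEncoder(nn.Module):
        """Prepend one learnable aggregation token to a lane/trajectory sequence."""

        def __init__(self, d_model=128, nhead=8, num_layers=2):
            super().__init__()
            # The extra token attends to every element and summarizes the sequence.
            self.agg_token = nn.Parameter(torch.zeros(1, 1, d_model))
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, tokens):
            # tokens: (B, L, d_model) features of one lane's or trajectory's points.
            agg = self.agg_token.expand(tokens.size(0), -1, -1)
            out = self.encoder(torch.cat([agg, tokens], dim=1))
            # Slot 0 is a global summary; the summaries of all lanes/trajectories
            # can then attend to each other to model their interactions.
            return out[:, 0], out[:, 1:]

    # Example: summarize 20 centerline points of 4 lanes into 4 summary tokens.
    summary, refined = AggregationEncoder()(torch.randn(4, 20, 128))
    print(summary.shape)  # torch.Size([4, 128])

The pattern mirrors the classification token used in BERT and ViT: one token aggregates a variable-length element, so cross-element interaction can run on a much shorter sequence, which is where the efficiency gain comes from.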
For both LiDAR point cloud semantic segmentation and motion forecasting, we evaluate the proposed approaches on several public datasets covering different scenarios. Extensive experiments demonstrate their effectiveness in both accuracy and efficiency, making them suitable for realistic applications.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: LiDAR semantic segmentation
dc.subject: Motion forecasting
dc.subject: Autonomous driving
dc.subject: Scene understanding
dc.subject.ddc: 004 Computer science (Informatik)
dc.title: Efficient Perception and Forecasting for Autonomous Vehicles
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-75922
dc.relation.doi: https://doi.org/10.1109/LRA.2021.3132059
dc.relation.doi: https://doi.org/10.1109/LRA.2021.3093567
dc.relation.doi: https://doi.org/10.1109/TNNLS.2021.3132836
dc.relation.doi: https://doi.org/10.1109/ICCV48922.2021.00195
ulbbn.pubtype: First publication (Erstveröffentlichung)
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 7592
ulbbnediss.date.accepted: 19.04.2024
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Valada, Abhinav
ulbbnediss.contributor.orcid: https://orcid.org/0000-0001-6288-6984

