
Efficient Perception and Forecasting for Autonomous Vehicles

dc.contributor.advisor: Gall, Juergen
dc.contributor.author: Li, Shijie
dc.date.accessioned: 2024-04-29T11:05:46Z
dc.date.available: 2024-04-29T11:05:46Z
dc.date.issued: 29.04.2024
dc.identifier.uri: https://hdl.handle.net/20.500.11811/11518
dc.description.abstract: In recent years, autonomous vehicles have attracted considerable attention. In autonomous driving, safety is the highest priority, and scene understanding and motion forecasting play a central role in ensuring it. Because LiDAR sensors capture the environment accurately, the scene understanding task we address is to assign a semantic label to each point. However, a LiDAR point cloud typically contains a massive number of points, and processing it demands substantial computational resources, while only limited resources are available on autonomous driving platforms. Moreover, many modules run concurrently on an autonomous vehicle, which further limits the computation available to any single task; the same holds for motion forecasting. Efficiency is therefore essential for realistic autonomous driving applications.
Previous methods for semantic segmentation of LiDAR point clouds have largely focused on accuracy, often at the expense of the efficiency that practical applications require. To bridge this gap, we propose a highly efficient architecture that processes 2D projection maps generated from the LiDAR input. By exploiting the intrinsic characteristics of LiDAR sensors and converting sparse 3D data into a dense 2D format, our method achieves a favorable balance between accuracy and efficiency. Building on this architecture, we further determine the motion status of each point in the LiDAR point cloud by adopting a generalized projection-based LiDAR data representation that encapsulates motion information, and we establish a new benchmark for LiDAR-based moving object segmentation based on the SemanticKITTI dataset. Beyond projection-based approaches, we also enhance point-based methods for LiDAR semantic segmentation: we reformulate 3D point-based operations so that they operate in the projection space. This design converts any point-based method into a projection-based one, markedly improving its accuracy and efficiency. A remaining hurdle for projection-based methods is the gap between them and voxel-based approaches. We show that this gap arises from the "many-to-one" problem: the limited horizontal and vertical angular resolution of the range image maps several 3D points to the same pixel. To tackle this, we introduce a temporal fusion layer that extracts pertinent information from previous scans and integrates it with the current scan, and we propose a max-voting-based post-processing technique to correct erroneous predictions.
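To make the projection step concrete, below is a minimal sketch of the spherical projection commonly used to turn a LiDAR scan into a dense range image; the function name, image resolution, and field-of-view values are illustrative assumptions, not parameters from the dissertation. The last lines also exhibit the "many-to-one" collisions discussed above.

    import numpy as np

    def spherical_projection(points, H=64, W=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
        """Project an (N, 3) LiDAR point cloud onto an (H, W) range image."""
        x, y, z = points[:, 0], points[:, 1], points[:, 2]
        r = np.linalg.norm(points[:, :3], axis=1)      # range of each point

        yaw = np.arctan2(y, x)                         # horizontal angle in [-pi, pi]
        pitch = np.arcsin(z / np.maximum(r, 1e-8))     # vertical angle

        fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)

        # Normalize both angles to [0, 1] and scale to pixel coordinates.
        u = np.clip(np.floor(0.5 * (1.0 - yaw / np.pi) * W), 0, W - 1).astype(int)
        v = np.clip(np.floor((1.0 - (pitch - fov_down) / (fov_up - fov_down)) * H),
                    0, H - 1).astype(int)

        # "Many-to-one": several 3D points can land in the same pixel. Writing
        # far points first lets near points overwrite them, a common convention.
        order = np.argsort(-r)
        range_image = np.full((H, W), -1.0, dtype=np.float32)
        range_image[v[order], u[order]] = r[order]
        return range_image, u, v

The points overwritten by such collisions are exactly the ones a single range image cannot represent; the temporal fusion layer and max-voting post-processing described above are meant to compensate for this loss.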
Besides scene understanding, precise and efficient trajectory forecasting is paramount for safety in autonomous driving. Since human safety has the highest priority, we first address mapless human trajectory prediction, enforcing temporal consistency between the observed history and the predicted future. While this method works well for low-speed scenarios dominated by pedestrians, autonomous driving scenes are usually highly dynamic, with fast-moving agents such as vehicles. In this case the map provides meaningful information, since drivers must follow traffic rules, but processing map information is computationally expensive. To address this, we propose a streamlined architecture that adds an aggregation token to each lane and trajectory. This token models the global dependencies within each lane or trajectory as well as between lanes and trajectories, which is why we name the network the Aggregation-Interaction Transformer. Efficiently modeling the interactions among road users and drivable lanes nonetheless remains challenging. A further notable contribution is therefore our approach's ability to learn intention seeds, which serve as queries to generate a diverse array of future trajectories in a highly efficient manner.
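To illustrate the aggregation-token idea, here is a minimal PyTorch sketch in the spirit of the Aggregation-Interaction Transformer; the class name, dimensions, and layer count are assumptions for illustration, not the dissertation's actual architecture.

    import torch
    import torch.nn as nn

    class AggregationEncoder(nn.Module):
        """Prepend one learnable aggregation token to a lane/trajectory sequence."""

        def __init__(self, d_model=128, nhead=8, num_layers=2):
            super().__init__()
            # The extra token attends to every element and summarizes the sequence.
            self.agg_token = nn.Parameter(torch.zeros(1, 1, d_model))
            layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

        def forward(self, tokens):
            # tokens: (B, L, d_model) features of one lane's or trajectory's points.
            agg = self.agg_token.expand(tokens.size(0), -1, -1)
            out = self.encoder(torch.cat([agg, tokens], dim=1))
            # Slot 0 is a global summary; the summaries of all lanes/trajectories
            # can then attend to each other to model their interactions.
            return out[:, 0], out[:, 1:]

    # Example: summarize 20 centerline points of 4 lanes into 4 summary tokens.
    summary, refined = AggregationEncoder()(torch.randn(4, 20, 128))
    print(summary.shape)  # torch.Size([4, 128])

The pattern mirrors the classification token used in BERT and ViT: one token aggregates a variable-length element, so cross-element interaction can run on a much shorter sequence, which is where the efficiency gain comes from.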
For both LiDAR point cloud semantic segmentation and motion forecasting, we evaluate the proposed approaches on several public datasets covering different scenarios. Extensive experiments demonstrate their effectiveness in both accuracy and efficiency, making them suitable for realistic applications.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: LiDAR semantic segmentation
dc.subject: Motion forecasting
dc.subject: Autonomous driving
dc.subject: Scene understanding
dc.subject.ddc: 004 Computer science (Informatik)
dc.title: Efficient Perception and Forecasting for Autonomous Vehicles
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-75922
dc.relation.doi: https://doi.org/10.1109/LRA.2021.3132059
dc.relation.doi: https://doi.org/10.1109/LRA.2021.3093567
dc.relation.doi: https://doi.org/10.1109/TNNLS.2021.3132836
dc.relation.doi: https://doi.org/10.1109/ICCV48922.2021.00195
ulbbn.pubtype: First publication (Erstveröffentlichung)
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 7592
ulbbnediss.date.accepted: 19.04.2024
ulbbnediss.institute: Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Valada, Abhinav
ulbbnediss.contributor.orcid: https://orcid.org/0000-0001-6288-6984

