Conradi, Jacobus: Clustering Trajectories: Detecting Patterns in Spatio-Temporal Data. - Bonn, 2025. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045
@phdthesis{handle:20.500.11811/13768,
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045,
doi: https://doi.org/10.48565/bonndoc-750,
author = {{Jacobus Conradi}},
title = {Clustering Trajectories: Detecting Patterns in Spatio-Temporal Data},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = dec,
note = {In this thesis, we study the detection of patterns in spatio-temporal data sets in the form of trajectory data. Many application areas collect unstructured trajectory data. Applications for the analysis of such data range from gait analysis of human movement to traffic-flow analysis, epidemiological hotspot detection, and Lagrangian analysis of particle simulations. Trajectory data is typically modeled as a sequence of points that defines a polygonal curve via linear interpolation between consecutive points. One of the most fundamental ways of detecting patterns in any kind of data is clustering, which aims to partition the data into groups where points from the same group are 'close' to one another and those from different groups are 'far apart'. The clustering of polygonal curves is the main focus of this thesis.
The first clustering problem we study is the (k, l)-median problem for polygonal curves, which asks for k center curves, each with at most l points, minimizing the sum of dynamic time warping (DTW) distances from each input curve to its closest center. Numerous classical results for clustering problems have been extended to many different settings. A common assumption made is that the distance function is a metric, which DTW is not. Despite this, we provide a polynomial-time approximation algorithm with an approximation factor of O(m l), where m is the maximum number of points in any input curve. This is achieved by extending known algorithms for coreset construction—i.e., problem-specific input condensations—from metric spaces to the non-metric space of polygonal curves under DTW.
The second clustering problem we consider is Subtrajectory Covering, motivated by the need to detect patterns within a single curve by segmenting it into its constituent semantic parts. This is formulated as a set cover problem based on the continuous Fréchet distance. Given a polygonal curve with n points, the goal is to compute the smallest set of center curves, each with at most l points, such that every point on the input curve is covered by some center. A point is considered covered if it lies on a subtrajectory of the input curve whose (continuous) Fréchet distance to some center curve is at most Δ. We provide bicriteria approximation algorithms for arbitrary curves as well as for a subclass of realistic curves. Both algorithms approximate the required number kΔ of centers by a factor of O(log n) and the distance threshold Δ by a factor of O(1). For arbitrary curves, the algorithm runs in roughly O(√kΔn5/2) time. For the subclass of realistic curves (or more formally, c-packed curves whose arc length inside any ball of radius r is bounded by cr), the algorithm runs in roughly O(kΔcn2) time. We demonstrate the viability of both the problem formulation and the algorithm on real-world data.
Additionally, we investigate two related questions. First, given a clustering of trajectory data, a natural question is how to classify a new trajectory which is not part of the input. To this end, we design a data structure that answers approximate nearest neighbor queries: Given n input curves of complexity m and a query curve q, compute the closest input curve to q w.r.t. the continuous Fréchet distance. Our data structure answers such queries in roughly O(F(m, Φ) log n) time, where F(m, Φ) depends only on m and the spread Φ of vertices and edges. Second, we analyze the detection of erroneous patterns within a curve: Given two curves, how can we optimally modify one curve by replacing pieces of the curve with line segments so as to minimize the continuous Fréchet distance to the other curve? We show that this problem is not fixed-parameter tractable in the number of modifications and give a polynomial-time O(1)-approximation algorithm for its decision variant.},
url = {https://hdl.handle.net/20.500.11811/13768}
}
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045,
doi: https://doi.org/10.48565/bonndoc-750,
author = {{Jacobus Conradi}},
title = {Clustering Trajectories: Detecting Patterns in Spatio-Temporal Data},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = dec,
note = {In this thesis, we study the detection of patterns in spatio-temporal data sets in the form of trajectory data. Many application areas collect unstructured trajectory data. Applications for the analysis of such data range from gait analysis of human movement to traffic-flow analysis, epidemiological hotspot detection, and Lagrangian analysis of particle simulations. Trajectory data is typically modeled as a sequence of points that defines a polygonal curve via linear interpolation between consecutive points. One of the most fundamental ways of detecting patterns in any kind of data is clustering, which aims to partition the data into groups where points from the same group are 'close' to one another and those from different groups are 'far apart'. The clustering of polygonal curves is the main focus of this thesis.
The first clustering problem we study is the (k, l)-median problem for polygonal curves, which asks for k center curves, each with at most l points, minimizing the sum of dynamic time warping (DTW) distances from each input curve to its closest center. Numerous classical results for clustering problems have been extended to many different settings. A common assumption made is that the distance function is a metric, which DTW is not. Despite this, we provide a polynomial-time approximation algorithm with an approximation factor of O(m l), where m is the maximum number of points in any input curve. This is achieved by extending known algorithms for coreset construction—i.e., problem-specific input condensations—from metric spaces to the non-metric space of polygonal curves under DTW.
The second clustering problem we consider is Subtrajectory Covering, motivated by the need to detect patterns within a single curve by segmenting it into its constituent semantic parts. This is formulated as a set cover problem based on the continuous Fréchet distance. Given a polygonal curve with n points, the goal is to compute the smallest set of center curves, each with at most l points, such that every point on the input curve is covered by some center. A point is considered covered if it lies on a subtrajectory of the input curve whose (continuous) Fréchet distance to some center curve is at most Δ. We provide bicriteria approximation algorithms for arbitrary curves as well as for a subclass of realistic curves. Both algorithms approximate the required number kΔ of centers by a factor of O(log n) and the distance threshold Δ by a factor of O(1). For arbitrary curves, the algorithm runs in roughly O(√kΔn5/2) time. For the subclass of realistic curves (or more formally, c-packed curves whose arc length inside any ball of radius r is bounded by cr), the algorithm runs in roughly O(kΔcn2) time. We demonstrate the viability of both the problem formulation and the algorithm on real-world data.
Additionally, we investigate two related questions. First, given a clustering of trajectory data, a natural question is how to classify a new trajectory which is not part of the input. To this end, we design a data structure that answers approximate nearest neighbor queries: Given n input curves of complexity m and a query curve q, compute the closest input curve to q w.r.t. the continuous Fréchet distance. Our data structure answers such queries in roughly O(F(m, Φ) log n) time, where F(m, Φ) depends only on m and the spread Φ of vertices and edges. Second, we analyze the detection of erroneous patterns within a curve: Given two curves, how can we optimally modify one curve by replacing pieces of the curve with line segments so as to minimize the continuous Fréchet distance to the other curve? We show that this problem is not fixed-parameter tractable in the number of modifications and give a polynomial-time O(1)-approximation algorithm for its decision variant.},
url = {https://hdl.handle.net/20.500.11811/13768}
}





