Clustering Trajectories: Detecting Patterns in Spatio-Temporal Data

Conradi, Jacobus

Volltext

Dokument öffnen (6MB)

Autor

Conradi, Jacobus

ORCID

https://orcid.org/0000-0002-8259-1187

Art der Hochschulschrift

Dissertation

Prüfungsdatum

28.11.2025

Datum der Veröffentlichung

18.12.2025

Erstgutachter

Driemel, Anne

Zweitgutachter

Röglin, Heiko

Grad-verleihende Institutionen

Rheinische Friedrich-Wilhelms-Universität Bonn

Metadaten

Zur Langanzeige

Zitierbare Links

Handle: https://hdl.handle.net/20.500.11811/13768
URN: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045
DOI: https://doi.org/10.48565/bonndoc-750

Inhalt

In this thesis, we study the detection of patterns in spatio-temporal data sets in the form of trajectory data. Many application areas collect unstructured trajectory data. Applications for the analysis of such data range from gait analysis of human movement to traffic-flow analysis, epidemiological hotspot detection, and Lagrangian analysis of particle simulations. Trajectory data is typically modeled as a sequence of points that defines a polygonal curve via linear interpolation between consecutive points. One of the most fundamental ways of detecting patterns in any kind of data is clustering, which aims to partition the data into groups where points from the same group are 'close' to one another and those from different groups are 'far apart'. The clustering of polygonal curves is the main focus of this thesis.
The first clustering problem we study is the (k, l)-median problem for polygonal curves, which asks for k center curves, each with at most l points, minimizing the sum of dynamic time warping (DTW) distances from each input curve to its closest center. Numerous classical results for clustering problems have been extended to many different settings. A common assumption made is that the distance function is a metric, which DTW is not. Despite this, we provide a polynomial-time approximation algorithm with an approximation factor of O(m l), where m is the maximum number of points in any input curve. This is achieved by extending known algorithms for coreset construction—i.e., problem-specific input condensations—from metric spaces to the non-metric space of polygonal curves under DTW.
The second clustering problem we consider is Subtrajectory Covering, motivated by the need to detect patterns within a single curve by segmenting it into its constituent semantic parts. This is formulated as a set cover problem based on the continuous Fréchet distance. Given a polygonal curve with n points, the goal is to compute the smallest set of center curves, each with at most l points, such that every point on the input curve is covered by some center. A point is considered covered if it lies on a subtrajectory of the input curve whose (continuous) Fréchet distance to some center curve is at most Δ. We provide bicriteria approximation algorithms for arbitrary curves as well as for a subclass of realistic curves. Both algorithms approximate the required number k_Δ of centers by a factor of O(log n) and the distance threshold Δ by a factor of O(1). For arbitrary curves, the algorithm runs in roughly O(√k_Δn^5/2) time. For the subclass of realistic curves (or more formally, c-packed curves whose arc length inside any ball of radius r is bounded by cr), the algorithm runs in roughly O(k_Δcn²) time. We demonstrate the viability of both the problem formulation and the algorithm on real-world data.
Additionally, we investigate two related questions. First, given a clustering of trajectory data, a natural question is how to classify a new trajectory which is not part of the input. To this end, we design a data structure that answers approximate nearest neighbor queries: Given n input curves of complexity m and a query curve q, compute the closest input curve to q w.r.t. the continuous Fréchet distance. Our data structure answers such queries in roughly O(F(m, Φ) log n) time, where F(m, Φ) depends only on m and the spread Φ of vertices and edges. Second, we analyze the detection of erroneous patterns within a curve: Given two curves, how can we optimally modify one curve by replacing pieces of the curve with line segments so as to minimize the continuous Fréchet distance to the other curve? We show that this problem is not fixed-parameter tractable in the number of modifications and give a polynomial-time O(1)-approximation algorithm for its decision variant.

Schlagwörter

Clustering, Coreset, Center Clustering, Subtrajecory Clustering, Approximate Nearest Neighbour, Shortcut Fréchet Distance, Fréchet Distance

Klassifikation (DDC)

004 Informatik

Zugehörige Publikation(en)

https://doi.org/10.4230/LIPICS.SOCG.2024.42
https://doi.org/10.4230/LIPICS.ESA.2022.28
https://doi.org/10.48550/ARXIV.2308.14865
https://doi.org/10.4230/LIPICS.ESA.2025.12
https://doi.org/10.57717/CGT.V3I2.45
https://doi.org/10.1145/3663762

Zitiervorschlag
BibTeX

Conradi, Jacobus: Clustering Trajectories: Detecting Patterns in Spatio-Temporal Data. - Bonn, 2025. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045

@phdthesis{handle:20.500.11811/13768,
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-87045,
doi: https://doi.org/10.48565/bonndoc-750,
author = {{Jacobus Conradi}},
title = {Clustering Trajectories: Detecting Patterns in Spatio-Temporal Data},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2025,
month = dec,
note = {In this thesis, we study the detection of patterns in spatio-temporal data sets in the form of trajectory data. Many application areas collect unstructured trajectory data. Applications for the analysis of such data range from gait analysis of human movement to traffic-flow analysis, epidemiological hotspot detection, and Lagrangian analysis of particle simulations. Trajectory data is typically modeled as a sequence of points that defines a polygonal curve via linear interpolation between consecutive points. One of the most fundamental ways of detecting patterns in any kind of data is clustering, which aims to partition the data into groups where points from the same group are 'close' to one another and those from different groups are 'far apart'. The clustering of polygonal curves is the main focus of this thesis.
The first clustering problem we study is the (k, l)-median problem for polygonal curves, which asks for k center curves, each with at most l points, minimizing the sum of dynamic time warping (DTW) distances from each input curve to its closest center. Numerous classical results for clustering problems have been extended to many different settings. A common assumption made is that the distance function is a metric, which DTW is not. Despite this, we provide a polynomial-time approximation algorithm with an approximation factor of O(m l), where m is the maximum number of points in any input curve. This is achieved by extending known algorithms for coreset construction—i.e., problem-specific input condensations—from metric spaces to the non-metric space of polygonal curves under DTW.
The second clustering problem we consider is Subtrajectory Covering, motivated by the need to detect patterns within a single curve by segmenting it into its constituent semantic parts. This is formulated as a set cover problem based on the continuous Fréchet distance. Given a polygonal curve with n points, the goal is to compute the smallest set of center curves, each with at most l points, such that every point on the input curve is covered by some center. A point is considered covered if it lies on a subtrajectory of the input curve whose (continuous) Fréchet distance to some center curve is at most Δ. We provide bicriteria approximation algorithms for arbitrary curves as well as for a subclass of realistic curves. Both algorithms approximate the required number k_Δ of centers by a factor of O(log n) and the distance threshold Δ by a factor of O(1). For arbitrary curves, the algorithm runs in roughly O(√k_Δn^5/2) time. For the subclass of realistic curves (or more formally, c-packed curves whose arc length inside any ball of radius r is bounded by cr), the algorithm runs in roughly O(k_Δcn²) time. We demonstrate the viability of both the problem formulation and the algorithm on real-world data.
Additionally, we investigate two related questions. First, given a clustering of trajectory data, a natural question is how to classify a new trajectory which is not part of the input. To this end, we design a data structure that answers approximate nearest neighbor queries: Given n input curves of complexity m and a query curve q, compute the closest input curve to q w.r.t. the continuous Fréchet distance. Our data structure answers such queries in roughly O(F(m, Φ) log n) time, where F(m, Φ) depends only on m and the spread Φ of vertices and edges. Second, we analyze the detection of erroneous patterns within a curve: Given two curves, how can we optimally modify one curve by replacing pieces of the curve with line segments so as to minimize the continuous Fréchet distance to the other curve? We show that this problem is not fixed-parameter tractable in the number of modifications and give a polynomial-time O(1)-approximation algorithm for its decision variant.},
url = {https://hdl.handle.net/20.500.11811/13768}
}

Die folgenden Nutzungsbestimmungen sind mit dieser Ressource verbunden: