Arutyunova, Anna: Approximations for Hierarchical and Lower-Bounded Clustering and the Complexity of Minimum-Error Triangulation. - Bonn, 2024. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-75257
@phdthesis{handle:20.500.11811/11449,
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-75257,
author = {{Anna Arutyunova}},
title = {Approximations for Hierarchical and Lower-Bounded Clustering and the Complexity of Minimum-Error Triangulation},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2024,
month = mar,

note = {Clustering deals with the problem of finding structures in data. Given a number k we search for a good partition of a set of data points into at most k sets, where the sets of the partition are called clusters. In the hierarchical clustering problem we have to find a sequence of clusterings, one for every possible number k, such that the clusterings in the sequence are hierarchically compatible with each other. We say that such a hierarchical clustering achieves an α-approximation with respect to a given cost function if for every k the cost of its k-clustering can be bounded by α times the cost of an optimal clustering of size k. We construct a clustering instance where there does not exist a hierarchical clustering with approximation factor smaller 4 for the discrete radius and smaller ≈5.828 for radius and diameter. Furthermore we show that for any clustering instance there always exists a hierarchical clustering with approximation factor ≈5.828 for radius and diameter.
In practice agglomerative clustering methods are often used to compute hierarchical clusterings. One such algorithm is the complete linkage algorithm. We know that this algorithm does compute an O(1)-approximation in the case where the metric space is Euclidean and of constant dimension. For general metric spaces we only know that the approximation factor of the algorithm is in Ω(log(k)). We show that the approximation factor is in fact in Ω(k) for discrete radius and diameter and that complete linkage computes an O(k)-approximation for discrete radius and an O(k^1.59)-approximation for diameter.
We further consider k-median/k-means with lower bounds, where we need to partition the set of data points into at most k clusters of size at least B for a given bound B while minimizing the k-median or k-means objective. We present a general approach that shows how to transform any approximation algorithm for facility location with lower bounds into an approximation algorithm for k-median with lower bounds. Since the resulting approximation factors are high, we consider a relaxed variant of lower bounds, which we call weak lower bounds. Here we allow that points can be contained in multiple clusters. We discuss how to compute a (6.5 + ϵ)-approximation for k-median with weak lower bounds and an O(1)-approximation for k-means with weak lower bounds. Furthermore we show that we can transform every solution with weak lower bounds into a solution with 2-weak lower bounds, where every point is contained in at most two clusters, while the cost only increases by a constant factor. We generalize this result and show that in fact we can achieve (1 + ϵ)-weak lower bounds, where every point is assigned to one center and possibly a fraction of ϵ to another center. Finally we present a bi-criteria approximation algorithm for k-means with lower bounds.
In the last part of this thesis we study the minimum-error triangulation problem. Given a set of triangulation points S in the plane and a function f assigning every point in S a real value and a set of reference points R in the convex hull of S with function h assigning every point in R a real value, we want to find a triangulation of S such that the linear interpolation of f on the triangulation represents the values at R as well as possible, i.e., the sum of squared differences between the values at R and their interpolation is minimized. We show that the problem to decide whether there exists a triangulation of zero error is already NP-hard, which implies that there does not exist an approximation algorithm for minimum-error triangulation, which runs in polynomial time unless P = NP.},

url = {https://hdl.handle.net/20.500.11811/11449}
}

Die folgenden Nutzungsbestimmungen sind mit dieser Ressource verbunden:

InCopyright