Pilz, Anja: Entity Linking to Wikipedia : Grounding entity mentions in natural language text using thematic context distance and collective search. - Bonn, 2016. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5n-42406
urn: https://nbn-resolving.org/urn:nbn:de:hbz:5n-42406,
author = {{Anja Pilz}},
title = {Entity Linking to Wikipedia : Grounding entity mentions in natural language text using thematic context distance and collective search},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2016,
month = feb,

note = {This thesis proposes new methods for entity linking, the task of assigning entity mentions in unstructured natural language text to entries in the semi-structured encyclopedia Wikipedia. In doing so, entity linking grounds a mention in an encyclopedic entry in Wikipedia and embeds it into this Linked Open Data hub. This enables a higher-level view of single documents, provides hints for further reading and may be used to add details from other sources. Furthermore, enriching text documents with such links simultaneously resolves the ambiguity of entity names. This ambiguity is an unsolved challenge for many text mining applications: one entity may be designated by a multitude of names and every mention may denote a multitude of entities. Resolving the ambiguity of entity names is thus a crucial step for entity-based retrieval, an open problem for most information retrieval and extraction tasks. For instance, search engines relying on heuristic string matches often retrieve irrelevant results as they cannot satisfactorily resolve ambiguity.
Moreover, a huge number of entity mentions cannot be linked to Wikipedia since, despite its size, Wikipedia has restricted coverage. Earlier and current work has often ignored this and consequently all mentions of uncovered entities. Other approaches handle only entity mentions of specific types or focus on English as the target language. Apart from such restrictions, no method achieves perfect linking performance.
These are the challenges addressed in this thesis. We introduce new methods for candidate entity retrieval and candidate entity consolidation, the key components for recall and precision, exploiting the vast amount of both structured and unstructured information stored in Wikipedia.
First, we propose a new contextual similarity measure based on latent topic distributions inferred from unstructured natural language text. We show that this thematic distance between mention and candidate entity contexts yields a lower linking error rate than purely word-based distances. Being language independent, this method enables high-performance entity linking in previously neglected languages such as German and French. This approach is especially suitable for, albeit not restricted to, linking person names, the class of mentions with the highest ambiguity.
We next propose a new candidate retrieval method to enable successful entity linking also for other entities that are not referenced canonically or do not exhibit the thematic coherence of persons. We introduce collective search, which uses the structured information encoded in Wikipedia’s hyperlink graph to arrive at sets of strongly related candidate entities. This enables us to better handle synonymy, one of the hardest problems in entity linking and one not thoroughly treated in previous work. We emphasize general applicability and evaluate this method on a broad collection of benchmark corpora in both a supervised and an unsupervised setting. We show that candidate enhancement through collective search increases linking performance on nearly all of these corpora and that our method is the most stable compared to other state-of-the-art approaches. Presenting the first unification of diverse performance measures, we also take a step towards the comparability of entity linking methods.
In conclusion, we provide state-of-the-art entity linking methods for nearly all of the current use cases. When it comes to fine-tuning, we note that entity linking has subjective aspects and adaptations may be necessary depending on the task at hand.},

url = {https://hdl.handle.net/20.500.11811/6697}
