Nascimento Esteves da Silva, Diego: Automating the Fact-Checking Task: Challenges and Directions. - Bonn, 2019. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5n-55001
@phdthesis{handle:20.500.11811/8030,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5n-55001},
author = {Nascimento Esteves da Silva, Diego},
title = {Automating the Fact-Checking Task: Challenges and Directions},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2019,
month = jul,

note = {In recent years, misinformation has caused widespread alarm and has become a global concern, given its negative impact on society, democratic institutions and even computing systems whose primary objective is to serve as a reliable information channel, e.g., Knowledge Bases (KBs). Fake news has a wide range of characteristics and different motivations. For instance, it can be produced unintentionally (e.g., the creation process of KBs, which is mostly based on automated information extraction methods and thus naturally error-prone) or intentionally (e.g., the spread of misinformation through social media to persuade). Such content thus differs considerably in complexity, structure, and number of arguments and propositions. To further exacerbate this problem, an ever-increasing amount of fake news on the Web has created another challenge for extracting correct information. This huge sea of data makes it very difficult for human fact checkers and journalists to assess all the information manually. Therefore, addressing this problem is of utmost importance to minimize real-world circumstances which may have a negative impact on society in general. Presently, Fact-Checking has emerged as a branch of natural language processing devoted to this task. Under this umbrella, Automated Fact-Checking frameworks have been proposed to perform claim verification. However, given the nature of the problem, different tasks need to be performed, from natural language understanding to source trustworthiness analysis and credibility scoring. In this thesis, we tackle the problem of fake news and the underlying challenges related to the process of estimating the veracity of a given claim, discussing challenges and proposing novel models to improve the current state of the art on different sub-tasks. 
Thus, besides the principal task (i.e., performing automated fact-checking), we also investigate the recognition of entities in noisy data and the computation of website credibility. Ultimately, due to the challenging nature of the automated fact-checking task, which requires a complex analysis from several perspectives, we also contribute towards the reproducibility of scientific experiments. First, we tackle the named entity recognition problem. We propose a novel multi-level approach named HORUS which, given an input token, generates heuristics based on computer vision and text mining techniques. These heuristics are then used to detect and classify named entities in noisy data (e.g., the Web). Second, we propose WebCred, a novel model to compute the credibility score of a given website without depending on search engine results, a dependency that is a limiting factor when dealing with real scenarios. WebCred does not require any third-party service and is 100\% open-source. Third, we conduct several empirical evaluations and extend DeFacto, a fact-checking framework initially designed to verify English claims in RDF format. DeFacto supports both structured claims (e.g., triple-like) and complex claims (i.e., natural language sentences). Last but not least, we consistently contributed towards better tools, methods, and methodologies for reproducible research. We proposed ontologies (MEX, ML-Schema) and tools (LOG4MEX, MEX-Interfaces, WEB4MEX, WASOTA) which became the state of the art for better reproducibility of machine learning experiments, becoming part of a global W3C community.},
url = {https://hdl.handle.net/20.500.11811/8030}
}

The following license files are associated with this item:

InCopyright