Mousavinezhad, Najmehsadat: Knowledge Extraction Methods for the Analysis of Contractual Agreements. - Bonn, 2021. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online-Ausgabe in bonndoc:
author = {{Najmehsadat Mousavinezhad}},
title = {Knowledge Extraction Methods for the Analysis of Contractual Agreements},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2021,
month = nov,

note = {The ubiquitous availability of the Internet results in a massive number of apps, software, and online services with accompanying contractual agreements in the form of ‘end-user license agreement’ and ‘privacy policy’. Often the textual documents describing rights, policies, and conditions comprise many pages and can not be reasonably assumed to be read and understood by humans. Although everyone is exposed to such consent forms, the majority tend to ignore them due to their length and complexity. However, the cost of ignoring terms and conditions is not always negligible, and occasionally people have to pay (money or other means) as a result of their oversight.
In this thesis, we focus on the interpretation of contractual agreements for the benefit of end-users. Contractual agreements encompass both the privacy policies and the general terms and conditions related to software and services. The main characteristics of such agreements are their use of legal terminologies and limited vocabulary. This feature has pros and cons. On one hand, the clear structure and legal language facilitate the mapping between the human-readable agreements and machine-processable concepts. On the other hand, the legal terminologies make the contractual agreement complex, subjective, and, therefore, open to interpretation. This thesis addresses the problem of contractual agreement analysis from both perspectives.
In order to provide a structured presentation of contractual agreements, we apply text mining and semantic technologies to develop approaches that extract important information from the agreements and retrieve helpful links and resources for better comprehension. Our approaches are based on ontology-based information extraction, machine learning, and semantic similarity and aim to deliver tedious consent forms in a user friendly and visualized format. The ontology-based information extraction approach processes the human-readable license agreement guided by a domain ontology to extract deontic modalities and presents a summarized output to the end-user. In the extraction phase, we focus on three key rights and conditions: permission, prohibition, duty, and cluster the extracted excerpts according to their similarities. The clustering is based on semantic similarity employing a distributional semantics approach on large word embeddings database. The machine learning method employs deep neural networks to classify a privacy policy’s paragraphs into pre-defined categories. Since the prediction results of the trained model are promising, we further use the predicted classes to assign five risk colors (Green, Yellow, Red) to five privacy icons (Expected Use, Expected Collection, Precise Location, Data Retention and Children Privacy). Furthermore, given that any contractual agreement must comply with the relevant legislation, we utilize text semantic similarity to map an agreement’s content to regulatory documents. The semantic similarity-based approach finds candidate sentences in an agreement that are potentially related to specific articles in the regulation. Then, for each candidate sentence, the relevant article and provision is found according to their semantic similarity. The achieved results from our proposed approaches allow us to conclude that although semi-automatic approaches lead to information loss, they save time and effort by producing instant results and facilitate the end-users understanding of legal texts.},

url = {}

The following license files are associated with this item: