Knowledge Extraction Methods for the Analysis of Contractual Agreements

Mousavinezhad, Najmehsadat

dc.contributor.advisor	Auer, Sören
dc.contributor.author	Mousavinezhad, Najmehsadat
dc.date.accessioned	2021-11-18T15:58:48Z
dc.date.available	2021-11-18T15:58:48Z
dc.date.issued	18.11.2021
dc.identifier.uri	https://hdl.handle.net/20.500.11811/9414
dc.description.abstract	The ubiquitous availability of the Internet results in a massive number of apps, software, and online services with accompanying contractual agreements in the form of ‘end-user license agreement’ and ‘privacy policy’. The textual documents describing rights, policies, and conditions often comprise many pages and cannot reasonably be expected to be read and understood by humans. Although everyone is exposed to such consent forms, the majority tend to ignore them due to their length and complexity. However, the cost of ignoring terms and conditions is not always negligible, and occasionally people have to pay (in money or by other means) as a result of their oversight. In this thesis, we focus on the interpretation of contractual agreements for the benefit of end-users. Contractual agreements encompass both the privacy policies and the general terms and conditions related to software and services. The main characteristics of such agreements are their use of legal terminology and a limited vocabulary. This feature has pros and cons. On the one hand, the clear structure and legal language facilitate the mapping between the human-readable agreements and machine-processable concepts. On the other hand, the legal terminology makes the contractual agreement complex, subjective, and, therefore, open to interpretation. This thesis addresses the problem of contractual agreement analysis from both perspectives. In order to provide a structured presentation of contractual agreements, we apply text mining and semantic technologies to develop approaches that extract important information from the agreements and retrieve helpful links and resources for better comprehension. Our approaches are based on ontology-based information extraction, machine learning, and semantic similarity, and aim to deliver tedious consent forms in a user-friendly, visualized format. 
The ontology-based information extraction approach processes the human-readable license agreement guided by a domain ontology to extract deontic modalities and presents a summarized output to the end-user. In the extraction phase, we focus on three key rights and conditions (permission, prohibition, and duty) and cluster the extracted excerpts according to their similarity. The clustering is based on semantic similarity, employing a distributional semantics approach on a large word-embedding database. The machine learning method employs deep neural networks to classify a privacy policy’s paragraphs into pre-defined categories. Since the prediction results of the trained model are promising, we further use the predicted classes to assign three risk colors (Green, Yellow, Red) to five privacy icons (Expected Use, Expected Collection, Precise Location, Data Retention and Children Privacy). Furthermore, given that any contractual agreement must comply with the relevant legislation, we utilize text semantic similarity to map an agreement’s content to regulatory documents. The semantic similarity-based approach finds candidate sentences in an agreement that are potentially related to specific articles in the regulation. Then, for each candidate sentence, the relevant article and provision are identified according to their semantic similarity. The results achieved with our proposed approaches allow us to conclude that although semi-automatic approaches lead to information loss, they save time and effort by producing instant results and facilitate end-users’ understanding of legal texts.	en
dc.language.iso	eng
dc.rights	In Copyright
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/
dc.subject.ddc	004 Informatik
dc.title	Knowledge Extraction Methods for the Analysis of Contractual Agreements
dc.type	Dissertation oder Habilitation
dc.publisher.name	Universitäts- und Landesbibliothek Bonn
dc.publisher.location	Bonn
dc.rights.accessRights	openAccess
dc.identifier.urn	https://nbn-resolving.org/urn:nbn:de:hbz:5-64537
ulbbn.pubtype	Erstveröffentlichung
ulbbnediss.affiliation.name	Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location	Bonn
ulbbnediss.thesis.level	Dissertation
ulbbnediss.dissID	6453
ulbbnediss.date.accepted	31.05.2021
ulbbnediss.institute	Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Informatik / Institut für Informatik
ulbbnediss.fakultaet	Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee	Lehmann, Jens
ulbbnediss.contributor.gnd	1250045509

Files in this item

Name: 6453.pdf
Size: 5.3MB
Format: PDF

This item appears in the following Collection(s)

E-Dissertationen (4605)
