Predicting Rules for Cancer Subtype Classification using Grammar-Based Genetic Programming on various Genomic Data Types
dc.contributor.advisor | Perner, Sven | |
dc.contributor.author | Deng, Mario | |
dc.date.accessioned | 2020-04-24T23:31:28Z | |
dc.date.available | 2020-04-24T23:31:28Z | |
dc.date.issued | 12.03.2018 | |
dc.identifier.uri | https://hdl.handle.net/20.500.11811/7501 | |
dc.description.abstract | With the advent of high-throughput methods, more genomic data than ever has been generated during the past decade. As these technologies remain cost-intensive and are not worthwhile for every research group, databases such as the TCGA and Firebrowse have emerged. While these databases enable fast and free access to massive amounts of genomic data, they also pose new challenges to the research community. This study investigates methods to obtain, normalize and process genomic data for computer-aided decision making in the field of cancer subtype discovery. A new software package, termed FirebrowseR, is introduced, allowing the direct download of genomic data sets into the R programming environment. To pre-process the obtained data, a set of methods is introduced, enabling data-type-specific normalization. As a proof of principle, the Web-TCGA software is created, enabling fast data analysis. To explore cancer subtypes, a statistical model, the EDL, is introduced. The newly developed method is designed to provide highly precise, yet interpretable models. The EDL is tested on well-established data sets, and its performance is compared to state-of-the-art machine learning algorithms. As a proof of principle, the EDL was run on a cohort of 1,000 breast cancer patients, where it reliably re-identified the known subtypes and automatically selected the corresponding marker genes by which the subtypes are defined. In addition, novel patterns of alterations in well-known marker genes could be identified to distinguish primary and mCRPC samples. The findings suggest that mCRPC is characterized by a unique amplification of the Androgen Receptor, while a significant fraction of primary samples is described by a loss of heterozygosity of TP53 and NCOR1. | |
dc.language.iso | eng | |
dc.rights | In Copyright | |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | |
dc.subject | Krebs | |
dc.subject | Statistik | |
dc.subject | maschinelles Lernen | |
dc.subject | API | |
dc.subject | Optimierung | |
dc.subject | cancer | |
dc.subject | statistics | |
dc.subject | machine learning | |
dc.subject | optimization | |
dc.subject.ddc | 004 Computer science | |
dc.subject.ddc | 310 General statistics | |
dc.subject.ddc | 570 Life sciences, biology | |
dc.subject.ddc | 610 Medicine, health | |
dc.title | Predicting Rules for Cancer Subtype Classification using Grammar-Based Genetic Programming on various Genomic Data Types | |
dc.type | Dissertation or habilitation | |
dc.publisher.name | Universitäts- und Landesbibliothek Bonn | |
dc.publisher.location | Bonn | |
dc.rights.accessRights | openAccess | |
dc.identifier.urn | https://nbn-resolving.org/urn:nbn:de:hbz:5n-49769 | |
ulbbn.pubtype | First publication | |
ulbbnediss.affiliation.name | Rheinische Friedrich-Wilhelms-Universität Bonn | |
ulbbnediss.affiliation.location | Bonn | |
ulbbnediss.thesis.level | Dissertation | |
ulbbnediss.dissID | 4976 | |
ulbbnediss.date.accepted | 29.01.2018 | |
ulbbnediss.institute | Mathematisch-Naturwissenschaftliche Fakultät : Fachgruppe Molekulare Biomedizin / Life & Medical Sciences-Institut (LIMES) | |
ulbbnediss.fakultaet | Mathematisch-Naturwissenschaftliche Fakultät | |
dc.contributor.coReferee | Schultze, Joachim L. |