The Faculty of Mathematics and Natural Sciences: Search
Multivariate Correlation Analysis for Supervised Feature Selection in High-Dimensional Data
(2020-03-12)
This dissertation focuses on multivariate correlation analysis on different data types, and we identify and define various research gaps in this area. For the defined research gaps we develop novel techniques ...
Theoretical Analysis of Hierarchical Clustering and the Shadow Vertex Algorithm
(2020-05-05)
Agglomerative clustering (AC) is a very popular greedy method for computing hierarchical clusterings in practice, yet its theoretical properties have been studied relatively little. We consider AC with respect to the most ...
Mining Frequent Itemsets from Transactional Data Streams with Probabilistic Error Bounds
(2020-05-27)
Frequent itemset mining is a classical data mining task with a broad range of applications, including fraud discovery and product recommendation. The enumeration of frequent itemsets has two main benefits for such applications: First, frequent itemsets provide a human-understandable representation of knowledge. This is crucial as human experts are involved in designing systems for these applications. Second, many efficient algorithms are known for mining frequent itemsets. This is essential as many of today’s real-world applications produce ever-growing data streams. Examples of these are online shopping, electronic payment or phone call transactions. With limited physical main memory, the analysis of data streams can, in general, be only approximate. State-of-the-art algorithms for frequent itemset mining from such streams bound their error by processing the transactions in blocks of fixed size, either each transaction individually or in mini-batches. In theory, single transaction-based updates provide the most up-to-date result after each transaction, but this enumeration is inefficient in practice as the number of frequent itemsets for a single transaction can be exponential in its cardinality. Mini-batch-based algorithms are faster but can only produce a new result at the end of each batch. In this thesis, the binary choice between up-to-date results and speed is eliminated. To provide more flexibility, we develop new algorithms with a probabilistic error bound that can process an arbitrary number of transactions in each batch.
State-of-the-art algorithms mining frequent itemsets from data streams with mini-batches derive the size of the mini-batch from a user-defined error parameter and hence couple their error bound to the size of the update. By introducing a dynamic error bound that adapts to the length of the data stream, the error is decoupled from the size of the update. The benefits of this approach are twofold: First, the dynamic error bound is independent of the size of the update. Hence, an arbitrary number of transactions can be processed without losing the error bound. Second, the bound becomes tighter as more transactions arrive and thus the tolerated error decreases, in contrast to algorithms with static thresholds. Our approach is extensively compared to the state of the art in an empirical evaluation. The results confirm that the dynamic approach is not only more flexible but also outperforms the state of the art in terms of F-score for a large number of data streams.
As it is easier for experts to extract knowledge from a smaller collection, we consider mining a compact pattern set. Especially useful are parameterized pattern classes for which the expert can regulate the size of the output. One example of such a parameterized pattern class is the class of strongly closed itemsets, which are additionally stable against small changes in the data stream. We present an algorithm mining strongly closed itemsets from data streams. It builds on reservoir sampling and is thus capable of producing a result after any number of transactions, once the initial sample is complete. The high approximation quality of the algorithm is empirically demonstrated and the potential of strongly closed patterns for two stream mining tasks is shown: concept drift detection and product configuration recommendation....
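The abstract above names reservoir sampling as the building block of the strongly closed itemset miner. As a rough illustration of that primitive only (not the thesis's algorithm), the following minimal Python sketch shows the textbook reservoir sampling scheme (Algorithm R) over a stream of transactions. The function names `reservoir_sample` and `estimate_support`, the sample size `k`, and the representation of transactions as frozensets are assumptions made for this example.

```python
import random
from typing import FrozenSet, Iterable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream (Algorithm R).

    After t >= k items have been seen, every item seen so far is in the
    reservoir with probability k / t, so a result is available at any time
    once the first k items have filled the reservoir.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for t, item in enumerate(stream, start=1):
        if t <= k:
            # Fill phase: the first k items are stored unconditionally.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / t.
            j = rng.randrange(t)  # uniform integer in [0, t)
            if j < k:
                reservoir[j] = item
    return reservoir


def estimate_support(sample: Sequence[FrozenSet[str]],
                     itemset: FrozenSet[str]) -> float:
    """Hypothetical helper: relative frequency of an itemset within the sample."""
    if not sample:
        return 0.0
    return sum(1 for transaction in sample if itemset <= transaction) / len(sample)
```

Because the reservoir has a fixed size, memory use does not grow with the stream, and frequencies estimated on the sample approximate those of the full stream only up to sampling error.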
Revealing the Invisible: On the Extraction of Latent Information from Generalized Image Data
(2020-01-08)
The desire to reveal the invisible in order to explain the world around us has been a source of impetus for technological and scientific progress throughout human history. Many of the phenomena that directly affect us ...
Planning Hybrid Driving-Stepping Locomotion for Ground Robots in Challenging Environments
(2020-02-10)
Ground robots capable of navigating a wide range of terrains are needed in several domains such as disaster response or planetary exploration. Hybrid driving-stepping locomotion is promising since it combines the complementary ...
Machine Learning Methodologies for Interpretable Compound Activity Predictions
(2020-02-26)
Machine learning (ML) models have gained attention for mining the pharmaceutical data that are currently generated at unprecedented rates and potentially accelerate the discovery of new drugs. The advent of deep learning ...
Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake
(2020-05-05)
Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential of improving the quality of life for citizens. Big Data plays an important role in promoting both manufacturing ...
Linked Research on the Decentralised Web
(2020-05-05)
This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are ...
In Silico Facets of Biochemical Research: Accounts from Protein Folding and Protein-Ligand Interaction Studies
(2020-09-04)
Exponential advancements in computer technology over the last five decades have ubiquitously benefited science and humanity as a whole. Consequent beneficiaries of this surge of computational power include all subfields of science that fall under the umbrella of “biochemical research”. Specifically, proteins, possibly the most versatile of all biological macromolecules, have always been the subject of extensive experimental investigation, and more so from a pharmaceutical perspective, since the majority of pharmaceutical drugs target proteins. In silico methods assist experimental research on proteins in multiple ways, ranging from relatively simple tasks such as organizing sequences and structures in biological databases to providing atomistic-level insights into the structure and dynamics that form the basis of the biological function of the protein. Given the unquestionable certitude that the three-dimensional structure of the protein determines its function, understanding the formation of structure from sequence, i.e. protein folding, is a central theme of investigation. Despite massive improvements in the understanding of protein folding over the last 50 years, it remains an unsolved problem.
Herein, computational approaches involving a combination of molecular modeling and biomolecular simulations are pursued to study important biochemical phenomena, namely, protein folding and protein-ligand interactions. In terms of protein folding, a specific problem, oxidative self-folding, is investigated. Oxidative folding refers to folding that involves the covalent linkage of cysteine residues in proteins to form disulfide bonds that stabilize the folded structure. Disulfide bonds play multifaceted roles in the peptides and proteins in which they occur, from providing structural integrity to acting as allosteric switches that regulate function. Current knowledge on oxidative folding has been restricted to consensuses derived from observing the folding of various model peptides and proteins. Most notably, the oxidative folding pathways of the proteins bovine pancreatic trypsin inhibitor (BPTI) and hirudin have been used as extreme models based on the manner in which the disulfide bond formations occur. However, the folding of a large group of such disulfide-rich peptides and proteins has been vaguely described to fall in between these extreme models. In this study, conotoxins, a class of venom peptides derived from marine cone snails of the genus Conus, are used as candidates to study oxidative folding and to determine their place between the aforementioned extremes defined by BPTI and hirudin. With their ability to potently and selectively block voltage-gated sodium channels, conotoxins attract a broader pharmaceutical interest than being mere model peptides to study oxidative folding. Furthermore, disulfide isomers of tridegin, a 66mer peptide produced by the giant Amazon leech Haementeria ghilianii, are investigated for the role of disulfide bonds concerning folding, stability and function. The pharmaceutical significance of tridegin is that it is the only known peptide inhibitor of the blood coagulation factor XIIIA and shows great promise as a lead substance in anti-coagulation therapy.
Heme, being an effector molecule, exerts a regulatory effect on the proteins it binds to, affecting their physiological functions. Protein-ligand interactions in the form of the transient binding of heme to proteins are investigated herein using molecular docking and molecular dynamics simulations. Finally, a decade’s worth of experimental knowledge obtained on transient heme-protein interactions is presented as an algorithmic implementation to predict transient heme-binding motifs in protein sequences, enabling the identification of novel heme-regulated proteins. Overall, this work serves as a testament to the growing significance of in silico methods in aiding experimental biochemical research....
Crafting digital doubles: Enhancing shape acquisition and material representation
(2020-08-12)
The demand for digital applications follows the unhindered growth of the digitization of tasks across many fields, including the entertainment industry, education and science but also work environments. ...