Wang, Yuan: Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies. - Bonn, 2009. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5N-19490
@phdthesis{handle:20.500.11811/4158,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5N-19490},
author = {Wang, Yuan},
title = {Molecular Complexity Effects and Fingerprint-Based Similarity Search Strategies},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2009,
month = nov,

note = {Molecular fingerprints are bit string representations of molecular structure and properties. They are among the most popular descriptors and tools in molecular similarity searching because of their conceptual simplicity and computational efficiency. In order to calculate molecular similarity, fingerprints are computed for reference and screening database compounds and their bit settings are quantitatively compared using similarity metrics. One caveat of this approach is the bias caused by complexity effects: complex molecules have higher fingerprint bit density and produce artificially high similarity values.
The asymmetric behavior of the Tversky similarity measure has been reported: comparing A to B does not give the same value as comparing B to A. This phenomenon can be directly attributed to complexity effects. Hence, the preferred parameter settings for the Tversky coefficient depend on the relative difference in molecular complexity. One approach to avoid such effects is to use fingerprint representations with constant bit density. Alternatively, emphasizing the absence of features at bit positions, which conventional fingerprint similarity search methods do not take into account, provides another way to address complexity effects. However, for optimizing search performance, eliminating complexity effects in this way is less effective than modulating them. To evaluate the outcome of virtual screening, search performance is monitored for combinations of different parameters. In general, when similarity searching uses highly complex reference compounds, it is difficult to recover potential hits that are less complex.
To further investigate complexity effects, the random reduction of fingerprint bit density is also explored. The ensuing loss of chemical information can be compensated for by balancing complexity effects when the fingerprints of reference compounds are modified to reduce their bit density.
When this random process is replaced with iterative bit silencing, the significance of each bit position in similarity searching can be analyzed and a different weight can be assigned to each position. Such a weighting scheme emphasizes critical bit positions specific to the reference activity class. Class-specific similarity metrics can be derived by using these weights in the similarity calculation. With these metrics, similarity search performance can be improved, especially when conventional methods fail to retrieve potentially active compounds.
Information from reference sets can also be used directly, in the form of Shannon entropy, as a measure of similarity. This simple and efficient similarity search strategy assesses the fingerprint entropy penalty induced by introducing external molecules into the reference set. It performs comparably to or better than nearest neighbor approaches at lower computational cost.},

url = {https://hdl.handle.net/20.500.11811/4158}
}
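
The Tversky asymmetry mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard Tversky coefficient on sets of "on" bit positions; the function name `tversky`, the toy fingerprints, and the parameter values are assumptions chosen for illustration and are not taken from the thesis.

```python
def tversky(a: set, b: set, alpha: float, beta: float) -> float:
    """Tversky coefficient for two fingerprints given as sets of set-bit positions."""
    common = len(a & b)   # bits set in both fingerprints
    only_a = len(a - b)   # bits set only in a
    only_b = len(b - a)   # bits set only in b
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


# Hypothetical toy fingerprints: a low-complexity molecule (4 bits set)
# whose bits are fully contained in a high-complexity one (12 bits set).
simple_fp = {1, 2, 3, 4}
complex_fp = simple_fp | {5, 6, 7, 8, 9, 10, 11, 12}

# With alpha != beta the result depends on the direction of the comparison.
simple_to_complex = tversky(simple_fp, complex_fp, alpha=1.0, beta=0.0)
complex_to_simple = tversky(complex_fp, simple_fp, alpha=1.0, beta=0.0)

# With alpha == beta == 1 the coefficient reduces to the symmetric Tanimoto coefficient.
tanimoto = tversky(simple_fp, complex_fp, alpha=1.0, beta=1.0)

print(simple_to_complex, complex_to_simple, tanimoto)
```

In this toy case the simple fingerprint as reference scores 1.0 against the complex one, while the reverse comparison scores 1/3, which is the kind of direction dependence that the abstract attributes to differences in bit density.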

The following license files are associated with this item:

InCopyright