
AHRD: Automatically Annotate Proteins with Human Readable Descriptions and Gene Ontology Terms

dc.contributor.advisor: Schoof, Heiko
dc.contributor.author: Boecker, Florian
dc.date.accessioned: 2021-10-05T14:04:35Z
dc.date.available: 2021-10-05T14:04:35Z
dc.date.issued: 05.10.2021
dc.identifier.uri: https://hdl.handle.net/20.500.11811/9344
dc.description.abstract: In the postgenomic era, the majority of new proteins can only be annotated with computational methods. Our tool AHRD automatically annotates proteins with human-readable descriptions and Gene Ontology (GO) terms on a genomic scale. It does so by performing a lexical analysis, modeled on the decision process of a human curator, of the descriptions of homologous proteins found by sequence similarity.
The central questions of this thesis are how GO annotations can be accurately evaluated and how the annotation performance of AHRD can be increased.
To this end, we first generated an unbiased ground truth set of high-quality protein annotations with minimal redundancy. It contains many proteins that are difficult to annotate and thus helps to contrast annotation methods. Second, we implemented and tested three evaluation metrics for the congruence of GO term annotations. The third metric, which uses the structure of the Gene Ontology and the commonness of GO terms to determine the semantic similarity of GO annotations, provides the most nuanced and consistent evaluation.
In addition to a preexisting simulated annealing-based approach, a genetic algorithm-based machine learning method was implemented that uses these evaluation metrics to optimize AHRD's input parameters. Although the genetic algorithm provided only small improvements, these were statistically significant, and parameter optimization proved necessary to achieve optimal annotation performance.
In the style of the lexical analysis of candidate descriptions, a new GO term-based analysis of candidate annotations was created. It improved AHRD's GO annotation performance and also enabled the incorporation of new quality indicators, such as GO term information content and annotation evidence codes, which improved the performance further. It also made it possible to annotate with newly combined sets of GO terms instead of only fixed sets obtained from reference proteins. However, this approach proved not to be viable, as it resulted in a significant regression of annotation performance. Using our evaluation method, we were able to show that AHRD predicts descriptions and GO annotations better and at a greater coverage than most of its competitors.
Although AHRD is tailored for application to whole proteomes from all branches of life and for ease of use, it showed satisfactory results in most categories of the CAFA3 challenge, a community-driven evaluation of GO annotation methods that often lack these benefits.
In conclusion, we demonstrated a reliable GO annotation evaluation method and used it to develop AHRD's GO annotation from an afterthought into a mature feature. We showed that AHRD is successful not only at annotating descriptions but also at annotating GO terms, while remaining applicable in whole-genome projects.
dc.language.iso: eng
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Protein
dc.subject: Funktionsvorhersage
dc.subject: Genomik
dc.subject: Proteomik
dc.subject: Bioinformatik
dc.subject: Function Prediction
dc.subject: Genomics
dc.subject: Proteomics
dc.subject: Bioinformatics
dc.subject.ddc: 004 Computer science
dc.subject.ddc: 500 Natural sciences
dc.subject.ddc: 570 Life sciences, biology
dc.title: AHRD: Automatically Annotate Proteins with Human Readable Descriptions and Gene Ontology Terms
dc.type: Dissertation or Habilitation
dc.publisher.name: Universitäts- und Landesbibliothek Bonn
dc.publisher.location: Bonn
dc.rights.accessRights: openAccess
dc.identifier.urn: https://nbn-resolving.org/urn:nbn:de:hbz:5-63141
ulbbn.pubtype: First publication
ulbbnediss.affiliation.name: Rheinische Friedrich-Wilhelms-Universität Bonn
ulbbnediss.affiliation.location: Bonn
ulbbnediss.thesis.level: Dissertation
ulbbnediss.dissID: 6314
ulbbnediss.date.accepted: 14.07.2021
ulbbnediss.institute: Landwirtschaftliche Fakultät : Institut für Nutzpflanzenwissenschaften und Ressourcenschutz (INRES)
ulbbnediss.fakultaet: Mathematisch-Naturwissenschaftliche Fakultät
dc.contributor.coReferee: Hofmann-Apitius, Martin
ulbbnediss.contributor.orcid: https://orcid.org/0000-0002-0732-6914
ulbbnediss.contributor.gnd: 1246050447
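
Illustrative note on the abstract: the third evaluation metric is described as using the structure of the Gene Ontology and the commonness of GO terms to score the semantic similarity of GO annotations. The record does not give the formula, so the following is only a minimal sketch of one common way to combine these two ingredients, an information-content-weighted Jaccard index (SimGIC-style). It should not be read as the metric implemented in AHRD. The toy ontology, the term names, the reference annotations, and all function names are hypothetical.

```python
import math

# Hypothetical toy ontology: term -> set of direct parents (not real GO identifiers).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalysis": {"root"},
    "dna_binding": {"binding"},
    "rna_binding": {"binding"},
    "kinase": {"catalysis"},
}

# Hypothetical reference annotations: protein -> directly annotated terms.
REFERENCE = {
    "p1": {"dna_binding"},
    "p2": {"rna_binding"},
    "p3": {"kinase"},
    "p4": {"binding"},
}


def with_ancestors(terms):
    """Return the given terms plus all of their ancestors (uses the ontology structure)."""
    closed, stack = set(), list(terms)
    while stack:
        term = stack.pop()
        if term not in closed:
            closed.add(term)
            stack.extend(PARENTS[term])
    return closed


def information_content(reference):
    """IC(t) = -log(share of proteins annotated with t or one of its descendants).

    Common terms get a low IC, rare terms a high IC. Terms that never occur
    in the reference are left out and count as weight 0 in this sketch.
    """
    counts = {term: 0 for term in PARENTS}
    for terms in reference.values():
        for term in with_ancestors(terms):
            counts[term] += 1
    total = len(reference)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def simgic(predicted, truth, ic):
    """IC-weighted Jaccard index over the ancestor-closed term sets."""
    a, b = with_ancestors(predicted), with_ancestors(truth)
    union_weight = sum(ic.get(t, 0.0) for t in a | b)
    if union_weight == 0.0:
        # Only zero-information terms (e.g. the root) are involved.
        return 1.0 if a == b else 0.0
    return sum(ic.get(t, 0.0) for t in a & b) / union_weight


ic = information_content(REFERENCE)
same = simgic({"dna_binding"}, {"dna_binding"}, ic)
sibling = simgic({"dna_binding"}, {"rna_binding"}, ic)
unrelated = simgic({"dna_binding"}, {"kinase"}, ic)
```

In this toy setup a prediction that matches the ground truth exactly scores highest, a sibling term earns partial credit through the shared ancestor `binding`, and a term from an unrelated branch earns none because the only shared ancestor is the uninformative root. An exact-match comparison of term sets would score the last two cases identically, which is the kind of nuance the abstract attributes to a semantic similarity metric.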


