Petersen, Malte: Comparative analysis of the insect mobile genetic element repertoire and its influence on genome size dynamics. - Bonn, 2020. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-56226
@phdthesis{handle:20.500.11811/8404,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-56226},
author = {{Malte Petersen}},
title = {Comparative analysis of the insect mobile genetic element repertoire and its influence on genome size dynamics},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2020,
month = jun,

note = {This thesis presents comparative genomics studies in insects as well as bioinformatics software development. Its empirical research part is focused mainly on mobile genetic elements, also termed transposable elements. The data basis contains datasets from public repositories, a rich and often underexplored source of information on genomic biodiversity. Transposable elements in particular are often neglected when the results of a genome sequencing study are published, although they make up a major part of virtually every eukaryotic genome.
After a general introduction in Chapter 1, I characterize and compare the transposable element repertoire of 73 arthropod species in Chapter 2 and find that it correlates with genome size in both abundance and diversity. In Chapter 3, I study the effect of transposable elements on the evolution of genome size in more detail and on an expanded dataset of 96 species. In Chapter 4, I present a software pipeline for delineating orthology among coding nucleotide sequences, an essential tool for many comparative and phylogenetic studies. Finally, Chapter 5 is a general conclusion.
Chapter 2: Transposable elements (TEs) are a major component of metazoan genomes and are associated with a variety of mechanisms that shape genome architecture and evolution. Despite the ever-growing number of insect genomes sequenced to date, our understanding of the diversity and evolution of insect TEs remains poor. Here, we present a standardized characterization and an order-level comparison of arthropod TE repertoires, encompassing 62 insect and 11 outgroup species. The insect TE repertoire contains TEs of almost every class previously described, and in some cases even TEs previously reported only from vertebrates and plants. Additionally, we identified a large fraction of unclassifiable TEs. We found high variation in TE content, ranging from less than 6 % in the Antarctic midge (Diptera), the honey bee and the turnip sawfly (Hymenoptera) to more than 58 % in the malaria mosquito (Diptera) and the migratory locust (Orthoptera), and a possible relationship between the content and diversity of TEs and the genome size. While most insect orders exhibit a characteristic TE composition, we also observed intraordinal differences, e.g., in Diptera, Hymenoptera, and Hemiptera. Our findings shed light on common patterns and reveal lineage-specific differences in content and evolution of TEs in insects. We anticipate our study to provide the basis for future comparative research on the insect TE repertoire.
Chapter 3: Genome size in insects displays inter-specific variation in excess of 130-fold, a range only paralleled in the metazoan phylum by amphibians. In general, these inter-specific differences seem to be best explained by differential rates of transposable element (TE) accumulation. In fact, we observe that TE accumulation rates are lineage-specific and that major insect clades have distinct TE age distributions. Given this observation, we hypothesize that evolutionarily younger insect lineages should have more TEs that are older than the insect lineage itself. To test this hypothesis, we infer ancient and lineage-specific TE insertions and quantify genome size increase and decrease in 96 arthropod species from 18 major insect orders, spanning a geological age range of around 400 million years. Our analysis reveals that most insect lineages appear to have a specific rate of TE accumulation that is correlated with genome size, along with a distinct, clade-specific and TE-class-dependent TE age distribution. Additionally, lineage-specific rates of genome size reduction appear to counteract genome expansion through TE activity. Our results are inconsistent with a general "accordion" model of genome size dynamics in eukaryotes; we therefore suggest that TE management in insects is fundamentally different from that in vertebrates. We propose that in the face of burst-like TE proliferation events, clade-specific rates of genome size reduction strongly influence the large variation in extant insect genome sizes.
Chapter 4: Orthology characterizes genes of different organisms that arose from a single ancestral gene via speciation, in contrast to paralogy, which is assigned to genes that arose via gene duplication. An accurate orthology assignment is a crucial step for comparative genomic studies. Orthologous genes in two organisms can be identified by applying a so-called reciprocal search strategy, given that complete information of the organisms' gene repertoire is available. In many investigations, however, only a fraction of the gene content of the organisms under study is examined (e.g., RNA sequencing). Here, identification of orthologous nucleotide or amino acid sequences can be achieved using a graph-based approach that maps nucleotide sequences to genes of known orthology. Existing implementations of this approach, however, suffer from algorithmic issues that may cause problems in downstream analyses.
We present a new software pipeline, Orthograph, that addresses and solves the above problems and implements useful features for a wide range of comparative genomic and transcriptomic analyses. Orthograph applies a best reciprocal hit search strategy using profile hidden Markov models and maps nucleotide sequences to the globally best matching cluster of orthologous genes, thus enabling researchers to conveniently and reliably delineate orthologs and paralogs from transcriptomic and genomic sequence data. We demonstrate the performance of our approach on de novo-sequenced and assembled transcript libraries of 24 species of apoid wasps (Hymenoptera: Aculeata) as well as on published genomic datasets.
With Orthograph, we implemented a best reciprocal hit approach to reference-based orthology prediction for coding nucleotide sequences such as RNAseq data. Orthograph is flexible, easy to use, open source and freely available at https://mptrsen.github.io/Orthograph. Additionally, we release 24 de novo-sequenced and assembled transcript libraries of apoid wasp species.},

url = {http://hdl.handle.net/20.500.11811/8404}
}
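
The reciprocal search strategy mentioned in the Chapter 4 abstract can be illustrated with a minimal sketch. This is not Orthograph's implementation, which according to the abstract uses profile hidden Markov models and maps sequences to clusters of orthologous genes. The function names, gene identifiers and scores below are hypothetical and serve only to show the basic idea: two genes are paired when each is the other's best-scoring hit.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: (query, target) -> similarity score.
Scores = Dict[Tuple[str, str], float]


def best_hit(scores: Scores) -> Dict[str, str]:
    """Map each query to its single highest-scoring target."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(forward: Scores, reverse: Scores) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where b is a's best hit and a is b's best hit."""
    fwd = best_hit(forward)   # species A genes searched against species B
    rev = best_hit(reverse)   # species B genes searched against species A
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


if __name__ == "__main__":
    # Made-up scores for illustration only.
    forward = {("a1", "b1"): 90.0, ("a1", "b2"): 40.0,
               ("a2", "b2"): 70.0, ("a3", "b2"): 80.0}
    reverse = {("b1", "a1"): 88.0, ("b2", "a3"): 75.0, ("b2", "a2"): 60.0}
    print(reciprocal_best_hits(forward, reverse))
```

In this toy example, `a2` finds `b2` as its best hit, but `b2` prefers `a3`, so `a2` is left unpaired. Such non-reciprocal cases are where paralogs would otherwise be mistaken for orthologs.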

The following license files are associated with this item:

Attribution - NonCommercial - ShareAlike 4.0 International