Plohmann, Daniel Johannes: Classification, Characterization, and Contextualization of Windows Malware using Static Behavior and Similarity Analysis. - Bonn, 2022. - Dissertation, Rheinische Friedrich-Wilhelms-Universität Bonn.
Online edition in bonndoc: https://nbn-resolving.org/urn:nbn:de:hbz:5-67161
@phdthesis{handle:20.500.11811/9992,
urn = {https://nbn-resolving.org/urn:nbn:de:hbz:5-67161},
author = {Plohmann, Daniel Johannes},
title = {Classification, Characterization, and Contextualization of Windows Malware using Static Behavior and Similarity Analysis},
school = {Rheinische Friedrich-Wilhelms-Universität Bonn},
year = 2022,
month = jul,
note = {In this dissertation, we provide a comprehensive malware ground truth data set called Malpedia that is tailored to the demands of both academic research and practical malware analysis. Using Malpedia as a foundation, we demonstrate its value by improving existing and proposing new methods for static analysis of Windows malware that focus on the tasks of classification, characterization, and contextualization. The contributions of this thesis are organized in three parts as follows.
First, we define a set of key requirements structured around the aspects of representativeness, accessibility, and practicality that malware data sets should respect. We follow up with a detailed presentation of our experiences with designing and creating Malpedia, our proposal for a reference implementation of such a data set. Using Malpedia, we perform a comparative structural analysis of file integrity and metadata properties for unpacked samples of 839 Windows malware families.
In the second part of the thesis, we concentrate on malware behavior and study interactions of malware with the Windows API (WinAPI) in detail. We begin with the introduction of ApiScout as a method for reliable extraction of WinAPI usage information from memory dumps. The application of ApiScout to Malpedia shows that dynamic API imports are widely used in malware. We continue with a frequency analysis of individual WinAPI usage and observe that usage profiles seem characteristic for malware families. We propose ApiVectors as a method for representation and comparison of WinAPI usage profiles and show that they can be successfully applied for malware classification.
The final part of the thesis focuses on code disassembly and code similarity. We first show that the accuracy of established disassemblers suffers significantly when facing headerless or already mapped images containing code and propose SMDA, a method that produces high-quality disassembly without relying on structural information. Addressing code similarity, we introduce MCRIT as a MinHash-based method using token- and metrics-based features for efficient one-to-many fuzzy code comparisons. We extensively demonstrate the effectiveness of the individual features and of MCRIT and then apply it to Malpedia. As a result, we show that third-party library code on average constitutes 15--20\% of binary content in malware and that sharing of intrinsic code across malware families appears to be uncommon except for cases of source code leaks or otherwise established family relationships.},
url = {https://hdl.handle.net/20.500.11811/9992}
}