Vulnerability Detection in Assembly Code Using Deep Learning

dc.contributor.author: Thangavelu, Karthiga
dc.contributor.supervisor: Sima, Mihai
dc.date.accessioned: 2023-03-02T00:49:06Z
dc.date.available: 2023-03-02T00:49:06Z
dc.date.copyright: 2022 [en_US]
dc.date.issued: 2023-03-01
dc.degree.department: Department of Electrical and Computer Engineering [en_US]
dc.degree.level: Master of Engineering M.Eng. [en_US]
dc.description.abstract: Language modelling for source code is a state-of-the-art technique that has developed significantly in recent years. Its applications include code completion, translation between programming languages, generation of code from text documents, and detection of vulnerabilities in source code. Unlike modelling source code in languages such as C, C++, or Python, modelling assembly language is a tedious process, and most feature engineering approaches for assembly code are manual. In this project, patterns in assembly code are recognized, and malicious code is distinguished from non-malicious code. Strings of jumps are introduced into the assembly code to make it malicious. The pattern recognition and classification process consists of three main tasks. First, the strings of jumps are introduced into the assembly code, and the code is tokenized. Second, instructions are converted to vectors using an assembly language model for instruction embedding based on the BERT transformer, which minimizes manual dataset pre-processing. The final task is a downstream task in which the instruction embeddings are fed into an LSTM network that classifies malicious versus non-malicious code using an assembly code dataset. The performance of the model is evaluated using accuracy, the confusion matrix, recall, precision, and F1 score. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/14805
dc.language.iso: en [en_US]
dc.rights: Available to the World Wide Web [en_US]
dc.subject: Vulnerability detection [en_US]
dc.subject: Assembly code [en_US]
dc.subject: Transformer-based model [en_US]
dc.subject: Instruction embedding [en_US]
dc.title: Vulnerability Detection in Assembly Code Using Deep Learning [en_US]
dc.type: project [en_US]

Files

Original bundle
Name: Thangavelu_Karthiga_MEng_2023.pdf
Size: 932.57 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission