Vulnerability Detection in Assembly Code Using Deep Learning

Date

2023-03-01

Authors

Thangavelu, Karthiga

Abstract

Language modelling for source code is a state-of-the-art approach that has advanced significantly in recent years. Its applications include code completion, translation from one programming language to another, translation of text documents to code, and detection of vulnerabilities in source code. Unlike modelling source code in languages such as C, C++ or Python, modelling assembly language is a tedious process, because most feature-engineering approaches for assembly code are manual. In this project, patterns in assembly code are recognized, and malicious code is distinguished from non-malicious code. Strings of jumps are introduced into the assembly code to make it non-malicious. The pattern recognition and classification process consists of three main tasks. First, the strings of jumps are introduced into the assembly code and the code is tokenized. Second, instructions are converted to vectors using an assembly language model for instruction embedding based on the BERT transformer, which minimizes the manual pre-processing of the dataset. Third, in the downstream task, the instruction embeddings are fed into an LSTM network that classifies malicious code from non-malicious code using an assembly code dataset. The performance of the model is evaluated using accuracy, the confusion matrix, recall, precision, and F1 score.
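The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes a binary labelling in which 1 marks malicious code and 0 marks non-malicious code; the function names and the sample labels are hypothetical and are not taken from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (tp, fp, fn, tn) for binary labels; `positive` marks the malicious class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # malicious, predicted malicious
        elif pred == positive:
            fp += 1  # non-malicious, predicted malicious
        elif truth == positive:
            fn += 1  # malicious, predicted non-malicious
        else:
            tn += 1  # non-malicious, predicted non-malicious
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Derive accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels for eight assembly samples (1 = malicious, 0 = non-malicious).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision and recall alongside accuracy matters when the two classes are imbalanced, since accuracy alone can look high while most malicious samples are missed.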

Keywords

Vulnerability detection, Assembly code, Transformer-based model, Instruction embedding
