Enhancing cybersecurity text classification via AMR based augmentation and drift simulation with reinforcement learning

Date

2025

Authors

Ahmed, Hadeer

Abstract

Natural language processing (NLP) is increasingly applied to cybersecurity text classification, but two challenges limit its effectiveness. First, high-quality labeled cybersecurity text is scarce because organizations rarely share sensitive incident reports or vulnerability descriptions. Second, cyber threats evolve rapidly, so model accuracy declines as novel attack types and new information emerge. Current text augmentation methods make only small surface-level edits and often fail to preserve important domain-specific terminology. Existing drift-handling techniques also fall short: they rely on generic strategies that do not capture how cybersecurity text evolves over time, and they lack transparency, which makes them difficult to trust in security-sensitive domains. This dissertation introduces two frameworks to address these limitations. The first, AMR-CLONALG, combines Abstract Meaning Representation (AMR) graphs with a clonal selection algorithm to generate text samples that preserve semantic meaning while introducing controlled variation in syntax and vocabulary. This allows small datasets to be expanded without compromising accuracy. The second framework, Drift-RL, uses reinforcement learning to simulate four patterns of data drift: sudden, gradual, incremental, and recurring. This supports systematic evaluation of model robustness under changing data distributions and provides a benchmark for studying the effects of drift. Together, these frameworks strengthen cybersecurity text classification by improving performance in low-resource settings and enabling rigorous testing of resilience against changing data. Both emphasize transparency, so that their outputs remain interpretable and accountable in security-critical applications.
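
The four drift patterns named in the abstract can be illustrated with a small sketch. This is not the Drift-RL framework and involves no reinforcement learning; it only encodes the patterns as fixed schedules, using the definitions common in the concept-drift literature. The function names (`new_concept_weight`, `simulate`) and the `period` parameter are hypothetical and chosen for illustration. Each step returns a weight in [0, 1] describing how much of the data at that step comes from the new concept rather than the old one.

```python
import random

PATTERNS = ("sudden", "gradual", "incremental", "recurring")


def new_concept_weight(pattern, t, n_steps, rng, period=10):
    """Return the share of the new concept at step t (0.0 = old, 1.0 = new).

    Illustrative schedules only; the thesis's own simulator may differ.
    """
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    progress = t / (n_steps - 1)  # runs from 0.0 at the start to 1.0 at the end

    if pattern == "sudden":
        # Abrupt switch from the old concept to the new one at the midpoint.
        return 1.0 if t >= n_steps // 2 else 0.0
    if pattern == "gradual":
        # Each step is drawn wholly from one concept; the chance of drawing
        # the new concept grows over time, so the two concepts interleave.
        return 1.0 if rng.random() < progress else 0.0
    if pattern == "incremental":
        # The concept itself shifts slowly through intermediate blends.
        return progress
    if pattern == "recurring":
        # Old and new concepts alternate in blocks of `period` steps.
        return 1.0 if (t // period) % 2 == 1 else 0.0
    raise ValueError(f"unknown pattern: {pattern!r}")


def simulate(pattern, n_steps=100, seed=0, period=10):
    """Generate the full weight schedule for one drift pattern."""
    rng = random.Random(seed)
    return [new_concept_weight(pattern, t, n_steps, rng, period)
            for t in range(n_steps)]


if __name__ == "__main__":
    for name in PATTERNS:
        schedule = simulate(name, n_steps=20, period=5)
        print(f"{name:12s}", " ".join(f"{w:.1f}" for w in schedule))
```

In a text-classification setting, such a weight could decide whether a sample at step t is drawn from an older or a newer pool of documents, which gives a classifier a controlled stream on which to measure accuracy decay.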

Keywords

Cybersecurity, Drift, Augmentation, Text data
