New method for learning decision trees from rules and its illustration for online identity application fraud detection

Date

2010-11-10T19:52:12Z

Authors

Abdelhalim, Amany

Abstract

A decision tree is a graph or model that represents all the alternatives in a decision-making process. Most methods that generate decision trees for a specific problem build the tree from examples of data instances. We propose a new method called RBDT-1 (rule-based decision tree) for learning a decision tree from a set of decision rules that cover the data instances. RBDT-1 takes a set of declarative rules as input and generates a decision tree from them. The method's goal is to create, on demand, a short and accurate decision tree from a stable or dynamically changing set of rules. The rules used by RBDT-1 can be supplied by an expert, induced directly by a rule induction method, or obtained indirectly by extracting them from a decision tree. We conduct a comparative study of RBDT-1 with four existing decision tree methods on different problems. The study shows that, in terms of tree complexity (the number of nodes and leaves in the decision tree), RBDT-1 compares favorably to AQDT-1 and AQDT-2, which are methods that create decision trees from rules. RBDT-1 also compares favorably to ID3 and is as effective as C4.5; both ID3 and C4.5 are well-known methods that generate decision trees from data examples. Experiments show that the decision trees produced by the methods under comparison achieve equal classification accuracy. To illustrate how RBDT-1 can be applied to an existing real-life problem that could benefit from the method, we choose identity application fraud detection. We design a new unsupervised framework that detects fraudulent applications for identity certificates by extracting identity patterns from the web and cross-checking these patterns against the information contained in the application forms in order to detect inconsistencies or anomalies.
The outcome of this process is submitted to a decision tree classifier that RBDT-1 generates on the fly from a rule base, which is derived from heuristics and expert knowledge and updated as more information is obtained on fraudulent behavior. We evaluate the proposed framework by collecting real identity information online and generating synthetic fraud cases, achieving encouraging performance results.
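
The general idea of building a decision tree from rules rather than from data examples can be illustrated with a minimal Python sketch. This is not RBDT-1 itself: the abstract does not state the method's attribute selection criteria, so the sketch assumes a simple stand-in heuristic (split on the attribute mentioned by the most rules). The function names `build_tree` and `classify`, the attribute names `name_match` and `address_match`, and the example rules are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def build_tree(rules):
    """Build a decision tree from rules of the form (conditions, label).

    `conditions` maps attribute names to required values. A rule that
    does not mention an attribute is treated as "don't care" for it.
    Assumption: the split attribute is the one used by the most rules,
    which is a stand-in heuristic, not the RBDT-1 selection criteria.
    """
    if not rules:
        raise ValueError("at least one rule is required")
    labels = [label for _, label in rules]
    if len(set(labels)) == 1:
        return labels[0]  # all remaining rules agree: make a leaf
    counts = Counter(attr for cond, _ in rules for attr in cond)
    if not counts:
        # No attributes left to test: fall back to the majority label.
        return Counter(labels).most_common(1)[0][0]
    # Sort first so that ties are broken alphabetically.
    attr = max(sorted(counts), key=lambda a: counts[a])
    values = sorted({cond[attr] for cond, _ in rules if attr in cond})
    branches = {}
    for value in values:
        # Keep rules that match this value or do not test the attribute,
        # and drop the attribute from their remaining conditions.
        subset = [
            ({a: v for a, v in cond.items() if a != attr}, label)
            for cond, label in rules
            if cond.get(attr, value) == value
        ]
        branches[value] = build_tree(subset)
    return (attr, branches)


def classify(tree, instance, default=None):
    """Walk the tree; return `default` if a value has no branch."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches.get(instance.get(attr), default)
    return tree


# Hypothetical rule base for illustration only.
RULES = [
    ({"name_match": "no"}, "suspicious"),
    ({"name_match": "yes", "address_match": "yes"}, "legitimate"),
    ({"name_match": "yes", "address_match": "no"}, "suspicious"),
]

tree = build_tree(RULES)
print(classify(tree, {"name_match": "yes", "address_match": "no"}))
```

Because the tree is derived from the rule base alone, it can be regenerated whenever rules are added or changed, which mirrors the on-demand use described in the abstract.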

Keywords

identity theft prevention, computer crimes prevention, machine learning
