Scalable APRIORI-based frequent pattern discovery

Chester, Sean

Files

SeanThesisFinal.pdf (379.04 KB)

Date

2009-04-28T17:44:31Z

Authors

Chester, Sean

Abstract

Frequent itemset mining, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for the past sixteen years. In that time, the size of datasets has grown much faster than has the ability of existing algorithms to handle those datasets. Consequently, improvements are needed. In this thesis, we take the classic algorithm for the problem, A Priori, and improve it quite significantly by introducing what we call a vertical sort. We then use the benchmark large dataset, webdocs, from the FIMI 2004 conference to contrast our performance against several state-of-the-art implementations and demonstrate not only equal efficiency with lower memory usage at all support thresholds, but also the ability to mine support thresholds as yet unattempted in the literature. We also indicate how we believe this work can be extended to achieve yet more impressive results.

Keywords

data mining, apriori, frequent itemset mining, machine learning

URI

http://hdl.handle.net/1828/1370

Collections

Electronic Theses and Dissertations (ETD)
