Scalable APRIORI-based frequent pattern discovery

Chester, Sean

dc.contributor.author	Chester, Sean
dc.contributor.supervisor	Thomo, Alex
dc.date.accessioned	2009-04-28T17:44:31Z
dc.date.available	2009-04-28T17:44:31Z
dc.date.copyright	2009	en
dc.date.issued	2009-04-28T17:44:31Z
dc.degree.department	Department of Computer Science
dc.degree.level	Master of Science M.Sc.	en
dc.description.abstract	Frequent itemset mining, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for the past sixteen years. In that time, the size of datasets has grown much faster than has the ability of existing algorithms to handle those datasets. Consequently, improvements are needed. In this thesis, we take the classic algorithm for the problem, A Priori, and improve it quite significantly by introducing what we call a vertical sort. We then use the benchmark large dataset, webdocs, from the FIMI 2004 conference to contrast our performance against several state-of-the-art implementations and demonstrate not only equal efficiency with lower memory usage at all support thresholds, but also the ability to mine support thresholds as yet unattempted in the literature. We also indicate how we believe this work can be extended to achieve yet more impressive results.	en
dc.identifier.uri	http://hdl.handle.net/1828/1370
dc.language	English	eng
dc.language.iso	en	en
dc.rights	Available to the World Wide Web	en
dc.subject	data mining	en
dc.subject	apriori	en
dc.subject	frequent itemset mining	en
dc.subject	machine learning	en
dc.subject.lcsh	UVic Subject Index::Sciences and Engineering::Applied Sciences::Computer science	en
dc.title	Scalable APRIORI-based frequent pattern discovery	en
dc.type	Thesis	en

Files

Original bundle

Name: SeanThesisFinal.pdf
Size: 379.04 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.83 KB
Format: Item-specific license agreed upon to submission

Collections

Electronic Theses and Dissertations (ETD)