Scalable APRIORI-based frequent pattern discovery

dc.contributor.author: Chester, Sean
dc.contributor.supervisor: Thomo, Alex
dc.date.accessioned: 2009-04-28T17:44:31Z
dc.date.available: 2009-04-28T17:44:31Z
dc.date.copyright: 2009
dc.date.issued: 2009-04-28T17:44:31Z
dc.degree.department: Department of Computer Science
dc.degree.level: Master of Science M.Sc.
dc.description.abstract: Frequent itemset mining, the task of finding sets of items that frequently occur together in a dataset, has been at the core of the field of data mining for the past sixteen years. In that time, the size of datasets has grown much faster than the ability of existing algorithms to handle them. Consequently, more scalable algorithms are needed. In this thesis, we take the classic algorithm for the problem, Apriori, and improve it significantly by introducing what we call a vertical sort. We then use the large benchmark dataset webdocs, from the FIMI 2004 conference, to contrast our performance against several state-of-the-art implementations. We demonstrate not only equal efficiency with lower memory usage at all support thresholds, but also the ability to mine at support thresholds not yet attempted in the literature. We also outline how we believe this work can be extended to achieve further improvements.
dc.identifier.uri: http://hdl.handle.net/1828/1370
dc.language: English
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: data mining
dc.subject: apriori
dc.subject: frequent itemset mining
dc.subject: machine learning
dc.subject.lcsh: UVic Subject Index::Sciences and Engineering::Applied Sciences::Computer science
dc.title: Scalable APRIORI-based frequent pattern discovery
dc.type: Thesis
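
The abstract takes the classic Apriori algorithm as its starting point. For orientation, the following is a minimal sketch of textbook Apriori (level-wise candidate generation, subset pruning, and one support-counting pass per level). It assumes transactions are given as Python sets of hashable items and that min_support is an absolute transaction count. The function name `apriori` and the toy basket data are hypothetical, and the sketch does not implement the vertical sort introduced in the thesis.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset contained
    in at least min_support transactions. Textbook level-wise Apriori only;
    this is NOT the thesis's vertical-sort variant (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: union pairs of frequent (k-1)-itemsets
        # whose union has exactly k items.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Apriori property: every (k-1)-subset must be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Support counting: one full pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy data: five market baskets, absolute support threshold 3.
baskets = [
    {"bread", "milk"},
    {"bread", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "beer"},
    {"bread", "milk", "cola"},
]
freq = apriori(baskets, min_support=3)
print(sorted((sorted(s), c) for s, c in freq.items()))
# prints [(['beer'], 3), (['bread'], 4), (['bread', 'milk'], 3), (['milk'], 4)]
```

For simplicity this sketch joins all pairs of frequent itemsets rather than using the usual sorted-prefix join, and it rescans every transaction at each level; that repeated candidate counting is the part of classic Apriori that typically dominates cost on large datasets.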

Files

Original bundle
Name: SeanThesisFinal.pdf
Size: 379.04 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.83 KB
Format: Item-specific license agreed to upon submission