A distributed approach to Frequent Itemset Mining at low support levels

dc.contributor.author: Clark, Neal
dc.contributor.supervisor: Coady, Yvonne
dc.date.accessioned: 2014-12-22T23:07:04Z
dc.date.available: 2014-12-22T23:07:04Z
dc.date.copyright: 2014 [en_US]
dc.date.issued: 2014-12-22
dc.degree.department: Department of Computer Science
dc.degree.level: Master of Science M.Sc. [en_US]
dc.description.abstract: Frequent Itemset Mining, the process of finding frequently co-occurring sets of items in a dataset, has been at the core of the field of data mining for the past 25 years. During this time, datasets have grown much faster than the algorithms' capacity to process them. Great progress was made at optimizing this task on a single computer; however, despite years of research, very little progress has been made on parallelizing it. FPGrowth-based algorithms have proven notoriously difficult to parallelize, and Apriori has largely fallen out of favor with the research community. In this thesis we introduce a parallel, Apriori-based Frequent Itemset Mining algorithm capable of distributing computation across large commodity clusters. Our case study demonstrates that our algorithm can efficiently scale to hundreds of cores on a standard Hadoop MapReduce cluster and can improve execution times by at least an order of magnitude at the lowest support levels. [en_US]
dc.description.proquestcode: 0984 [en_US]
dc.description.proquestcode: 0800 [en_US]
dc.description.proquestemail: nclark@uvic.ca [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/5803
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights.temp: Available to the World Wide Web [en_US]
dc.subject: Apriori [en_US]
dc.subject: MapReduce [en_US]
dc.subject: Frequent Itemset Mining [en_US]
dc.subject: FPGrowth [en_US]
dc.subject: Distributed [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Hadoop [en_US]
dc.title: A distributed approach to Frequent Itemset Mining at low support levels [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Clark_Neal_MSc_2014.pdf
Size: 458.37 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission