Privacy Preserving Data Mining using Unrealized Data Sets: Scope Expansion and Data Compression

dc.contributor.author Fong, Pui Kuen
dc.date.accessioned 2013-05-16T18:31:39Z
dc.date.available 2013-07-14T11:22:02Z
dc.date.copyright 2013 en_US
dc.date.issued 2013-05-16
dc.identifier.uri http://hdl.handle.net/1828/4623
dc.description.abstract In previous research, the author developed a novel privacy-preserving data mining (PPDM) method, Data Unrealization, that preserves both the privacy and the utility of discrete-valued training samples. That method transforms the original samples into unrealized ones and guarantees 100% accurate decision tree mining results. This dissertation extends the scope of that research with three contributions: (1) it expands the application of Data Unrealization to other data mining algorithms, (2) it introduces data compression methods that reduce the storage requirements of unrealized training samples and improve data mining performance, and (3) it adds a second level of privacy protection that works together with Data Unrealization. From an application perspective, this dissertation proves that statistical information (i.e., counts, probabilities and information entropy) can be retrieved precisely from unrealized training samples, so that Data Unrealization is applicable to all counting-based, probability-based and entropy-based data mining models with 100% accuracy. For data compression, this dissertation introduces a new number sequence, the J-Sequence, as a means to compress training samples through the J-Sampling process. J-Sampling converts the samples into a list of numbers with many repetitions. Applying run-length encoding to the resulting list further compresses the samples into constant storage space regardless of the sample size. In this way, the storage requirement of the sample database becomes O(1) and the time complexity of a statistical database query becomes O(1). J-Sampling also serves as an encryption layer over the unrealized samples already protected by Data Unrealization; data mining can still be performed on these samples without decryption. To retain privacy preservation and to handle data compression internally, a column-oriented database management system is recommended for storing the encrypted samples. en_US
dc.language English eng
dc.language.iso en en_US
dc.subject Data Mining en_US
dc.subject Data Privacy en_US
dc.subject PPDM en_US
dc.subject Data Compression en_US
dc.subject Database en_US
dc.subject Set Theory en_US
dc.subject Number Sequence en_US
dc.subject Query Optimization en_US
dc.title Privacy Preserving Data Mining using Unrealized Data Sets: Scope Expansion and Data Compression en_US
dc.type Thesis en_US
dc.contributor.supervisor Jahnke, Jens H.
dc.contributor.supervisor Thomo, Alex
dc.degree.department Dept. of Computer Science en_US
dc.degree.level Doctor of Philosophy Ph.D. en_US
dc.rights.temp Available to the World Wide Web en_US
dc.description.scholarlevel Graduate en_US
dc.description.proquestcode 0984 en_US
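
The abstract mentions run-length encoding of a list with many repeated values, and retrieving counts and information entropy without restoring the original samples. The following is a minimal sketch of that general idea only. It uses plain run-length encoding on a toy column and does not implement J-Sampling, the J-Sequence, or Data Unrealization, none of which are specified in this record. The function names and the sample column are hypothetical.

```python
from itertools import groupby
from math import log2


def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def counts_from_runs(runs):
    """Tally how often each value occurs, reading only the runs (no decoding)."""
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts


def entropy_from_counts(counts):
    """Shannon entropy in bits of the distribution given by the counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c)


# Hypothetical toy column with long runs of repeated values.
column = ["yes"] * 4 + ["no"] * 4
runs = rle_encode(column)
counts = counts_from_runs(runs)
entropy = entropy_from_counts(counts)
```

Here the counts and the entropy are computed from the encoded runs alone, so the cost of the query depends on the number of runs and not on the number of samples. How the dissertation reaches constant storage and constant query time is not described in this record and is not reproduced by this sketch.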
