Privacy Preserving Data Mining using Unrealized Data Sets: Scope Expansion and Data Compression

dc.contributor.author: Fong, Pui Kuen
dc.contributor.supervisor: Jahnke, Jens H.
dc.contributor.supervisor: Thomo, Alex
dc.date.accessioned: 2013-05-16T18:31:39Z
dc.date.available: 2013-07-14T11:22:02Z
dc.date.copyright: 2013 [en_US]
dc.date.issued: 2013-05-16
dc.degree.department: Department of Computer Science
dc.degree.level: Doctor of Philosophy Ph.D. [en_US]
dc.description.abstract: In previous research, the author developed a novel privacy preserving data mining (PPDM) method, Data Unrealization, that preserves both the privacy and the utility of discrete-value training samples. That method transforms original samples into unrealized ones and guarantees 100% accurate decision tree mining results. This dissertation extends the scope of that research and achieves the following: (1) it expands the application of Data Unrealization to other data mining algorithms, (2) it introduces data compression methods that reduce storage requirements for unrealized training samples and increase data mining performance, and (3) it adds a second level of privacy protection that works perfectly with Data Unrealization. From an application perspective, this dissertation proves that statistical information (i.e., counts, probability and information entropy) can be retrieved precisely from unrealized training samples, so that Data Unrealization is applicable to all counting-based, probability-based and entropy-based data mining models with 100% accuracy. For data compression, this dissertation introduces a new number sequence, the J-Sequence, as a means to compress training samples through the J-Sampling process. J-Sampling converts the samples into a list of numbers with many replications. Applying run-length encoding to the resulting list can further compress the samples into a constant storage space regardless of the sample size. In this way, the storage requirement of the sample database becomes O(1) and the time complexity of a statistical database query becomes O(1). J-Sampling also serves as an encryption layer for the unrealized samples already protected by Data Unrealization; meanwhile, data mining can be performed on these samples without decryption. To retain privacy preservation and to handle data compression internally, a column-oriented database management system is recommended for storing the encrypted samples. [en_US]
dc.description.proquestcode: 0984 [en_US]
dc.description.proquestemail: fong_bee@hotmail.com [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/4623
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights.temp: Available to the World Wide Web [en_US]
dc.subject: Data Mining [en_US]
dc.subject: Data Privacy [en_US]
dc.subject: PPDM [en_US]
dc.subject: Data Compression [en_US]
dc.subject: Database [en_US]
dc.subject: Set Theory [en_US]
dc.subject: Number Sequence [en_US]
dc.subject: Query Optimization [en_US]
dc.title: Privacy Preserving Data Mining using Unrealized Data Sets: Scope Expansion and Data Compression [en_US]
dc.type: Thesis [en_US]
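
Illustrative note on the abstract: the abstract mentions two generic building blocks, run-length encoding of a list with many replications and statistics (counts, information entropy) computed from counts. The Python sketch below shows only those generic ideas. It is not the dissertation's J-Sequence or J-Sampling, which this record does not define; the sample list and the function names are hypothetical, and the sketch does not demonstrate the O(1) storage result claimed in the abstract.

```python
import math
from collections import Counter
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


def entropy_from_counts(counts):
    """Shannon entropy in bits, computed from a collection of counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Hypothetical list with many replications (not real J-Sampling output).
numbers = [7, 7, 7, 7, 3, 3, 9, 9, 9]
encoded = run_length_encode(numbers)

# Counts can be read off the encoded runs without expanding the list.
counts = Counter()
for value, run in encoded:
    counts[value] += run

entropy_bits = entropy_from_counts(counts.values())
```

Here the counts are accumulated directly from the encoded runs, which is the sense in which a counting-based or entropy-based query can work on compressed data in this toy setting.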

Files

Original bundle
Name: Fong_PuiKuen_PhD_2013.pdf
Size: 1.52 MB
Format: Adobe Portable Document Format
Description: Dissertation

License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission