Representative Subsets for Preference Queries

dc.contributor.author: Chester, Sean
dc.contributor.supervisor: Thomo, Alex
dc.contributor.supervisor: Srinivasan, Venkatesh
dc.contributor.supervisor: Whitesides, Sue H.
dc.date.accessioned: 2013-08-26T17:50:39Z
dc.date.available: 2013-08-26T17:50:39Z
dc.date.copyright: 2013 [en_US]
dc.date.issued: 2013-08-26
dc.degree.department: Department of Computer Science
dc.degree.level: Doctor of Philosophy Ph.D. [en_US]
dc.description.abstract: We focus on the two overlapping areas of preference queries and dataset summarization. A (linear) preference query specifies the relative importance of the attributes in a dataset and asks for the tuples that best match those preferences. Dataset summarization is the task of representing an entire dataset by a small, representative subset. Within these areas, we focus on three important sub-problems and significantly advance the state of the art in each. We begin with an investigation into a new formulation of preference queries, identifying a neglected and important subclass that we call threshold projection queries. The literature typically constrains the attribute preferences (which are real-valued weights) so that they sum to one; we show that this constraint introduces bias when querying by threshold rather than by cardinality. Using projection, rather than the inner product used in that literature, removes the bias. We then give algorithms for building and querying indices for this class of query, based on geometric duality and halfspace range searching in the general case and on stereographic projection in an important special case. In the second part of the dissertation, we investigate the monochromatic reverse top-k (mRTOP) query in two dimensions. Given a tuple and a dataset, an mRTOP query asks for the linear preference queries on the dataset whose top-k results include that tuple. Toward this goal, we consider the novel scenario of building an index to support mRTOP queries, using geometric duality and plane sweep. We show theoretically and empirically that the index is quick to build, small on disk, and very efficient at answering mRTOP queries. As a corollary to these efforts, we define the top-k rank contour, which encodes the k-ranked tuple for every possible linear preference query.
The top-k rank contour is tremendously useful for answering mRTOP queries and, we posit, of significant independent interest because of its relation to myriad other linear preference query problems. Intuitively, the top-k rank contour is the minimum representation of knowledge needed to identify the k-ranked tuple for any query without a priori knowledge of that query. We also introduce k-regret minimizing sets, a very succinct approximation of a numeric dataset. The purpose of this approximation is to represent the entire dataset by a small subset that nonetheless contains a tuple within, or near, the top-k for any linear preference query. We show that finding k-regret minimizing sets is NP-hard, as is the problem in the literature that it generalizes. Still, for the special case of two dimensions, we provide a fast, exact algorithm based on the top-k rank contour. For arbitrary dimension, we introduce a novel greedy algorithm based on linear programming and randomization that performs very well in our empirical evaluation. [en_US]
dc.description.proquestcode: 0984 [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/4833
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights.temp: Available to the World Wide Web [en_US]
dc.subject: databases [en_US]
dc.subject: computational geometry [en_US]
dc.subject: top-k queries [en_US]
dc.subject: preference queries [en_US]
dc.subject: k-regret minimizing sets [en_US]
dc.subject: depth contours [en_US]
dc.subject: indexing [en_US]
dc.subject: reverse data management [en_US]
dc.subject: stereographic projection [en_US]
dc.subject: plane sweep [en_US]
dc.subject: linear programming [en_US]
dc.subject: computational complexity [en_US]
dc.subject: algorithms [en_US]
dc.subject: NP-hardness [en_US]
dc.subject: randomization [en_US]
dc.subject: summarization [en_US]
dc.subject: duality [en_US]
dc.title: Representative Subsets for Preference Queries [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Chester_Sean_PhD_2013.pdf
Size: 1.31 MB
Format: Adobe Portable Document Format
Description: Full dissertation

License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission
Description: