Optimizing Index Structures to Support Semantic Queries in Relational Databases

dc.contributor.author: Kamel, Victor
dc.date.accessioned: 2023-03-20T22:19:27Z
dc.date.available: 2023-03-20T22:19:27Z
dc.date.copyright: 2023 [en_US]
dc.date.issued: 2023-03-20
dc.description.abstract: Computers are not able to natively understand text. Thus, when text data is stored in a database, it is represented as “strings” of characters encoded using a standard such as ASCII or UTF-8. In this research, we explore the current methods used to manage string-based keys in relational databases, as well as to generate vector representations of strings that encode semantic meaning based on the entropy in a collection of training text, in order to enable semantic queries with string-based keys with little additional overhead cost. We consider the top-k query, where the k highest-ranking results are retrieved. Several candidate algorithms and their associated spatial index data structures are proposed in order to accelerate top-k queries that compare dimensionally reduced word embeddings based on cosine similarity. We introduce two spatial partitioning-based algorithms that improve on naive and optimized scan-based methods. Further, we implement and test these algorithms in order to evaluate their relative performance. [en_US]
dc.description.reviewstatus: Reviewed [en_US]
dc.description.scholarlevel: Undergraduate [en_US]
dc.description.sponsorship: Jamie Cassels Undergraduate Research Awards (JCURA) [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/14914
dc.language.iso: en [en_US]
dc.subject: Software Prototypes [en_US]
dc.subject: Asymptotic Analysis [en_US]
dc.subject: Word Embeddings [en_US]
dc.subject: Index Structures [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Databases [en_US]
dc.title: Optimizing Index Structures to Support Semantic Queries in Relational Databases [en_US]
dc.type: Poster [en_US]
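
The abstract describes top-k queries that rank keys by the cosine similarity of their embeddings, with a naive scan as the baseline that the proposed index structures improve on. The record does not include the poster's algorithms, so the following is only a minimal sketch of what such a naive scan might look like. The function names (cosine_similarity, top_k_scan) and the tiny 2-dimensional embeddings are hypothetical and made up for illustration; they are not taken from the poster.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention assumed here for zero vectors
    return dot / (norm_u * norm_v)


def top_k_scan(query, embeddings, k):
    """Naive scan: score every key against the query and keep the k best.

    Returns (similarity, key) pairs sorted from most to least similar.
    """
    scored = ((cosine_similarity(query, vec), key)
              for key, vec in embeddings.items())
    return heapq.nlargest(k, scored)


# Hypothetical embeddings, invented for this example only.
embeddings = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.1, 1.0],
}
result = top_k_scan([1.0, 0.0], embeddings, k=2)
for similarity, key in result:
    print(f"{key}: {similarity:.3f}")
```

This scan computes one similarity per stored key for every query, which is the per-query cost that a spatial index would aim to reduce by skipping most of the keys.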

Files

Original bundle
Name: Victor Kamel-JCURAposter-2023.pdf
Size: 1.24 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2 KB
Format: Item-specific license agreed upon to submission