Optimizing Index Structures to Support Semantic Queries in Relational Databases
dc.contributor.author | Kamel, Victor | |
dc.date.accessioned | 2023-03-20T22:19:27Z | |
dc.date.available | 2023-03-20T22:19:27Z | |
dc.date.copyright | 2023 | en_US |
dc.date.issued | 2023-03-20 | |
dc.description.abstract | Computers are not able to natively understand text. Thus, when text data is stored in a database, it is represented as “strings” of characters encoded using a standard such as ASCII or UTF-8. In this research, we explore current methods for managing string-based keys in relational databases and for generating vector representations of strings that encode semantic meaning based on the entropy in a collection of training text, with the goal of enabling semantic queries on string-based keys at little additional overhead cost. We consider the top-k query, where the k highest-ranking results are retrieved. Several candidate algorithms and their associated spatial index data structures are proposed to accelerate top-k queries that compare dimensionally reduced word embeddings based on cosine similarity. We introduce two spatial partitioning-based algorithms that improve on naive and optimized scan-based methods. Further, we implement and test these algorithms to evaluate their relative performance. | en_US |
dc.description.reviewstatus | Reviewed | en_US |
dc.description.scholarlevel | Undergraduate | en_US |
dc.description.sponsorship | Jamie Cassels Undergraduate Research Awards (JCURA) | en_US |
dc.identifier.uri | http://hdl.handle.net/1828/14914 | |
dc.language.iso | en | en_US |
dc.subject | Software Prototypes | en_US |
dc.subject | Asymptotic Analysis | en_US |
dc.subject | Word Embeddings | en_US |
dc.subject | Index Structures | en_US |
dc.subject | Natural Language Processing | en_US |
dc.subject | Databases | en_US |
dc.title | Optimizing Index Structures to Support Semantic Queries in Relational Databases | en_US |
dc.type | Poster | en_US |
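
The abstract refers to a naive scan-based method for top-k queries over word embeddings ranked by cosine similarity. As a rough illustration only, a minimal sketch of such a baseline might look like the following. The function names, the dictionary layout, and the toy two-dimensional vectors are all assumptions made for this example. They are not taken from the poster, and the spatial partitioning algorithms it proposes are not shown.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_scan(query, embeddings, k):
    """Naive baseline: score every stored key, then keep the k best.

    `embeddings` is a hypothetical mapping from string key to vector.
    """
    scored = (
        (cosine_similarity(query, vec), key) for key, vec in embeddings.items()
    )
    # heapq.nlargest returns the k highest-scoring pairs, best first.
    return [(key, score) for score, key in heapq.nlargest(k, scored)]


# Toy data, invented for illustration (real embeddings have many dimensions).
toy_embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.3],
    "car": [0.1, 0.9],
}
result = top_k_scan([1.0, 0.0], toy_embeddings, k=2)
```

A scan like this computes the similarity for every stored key on each query, which is the cost an index structure would aim to avoid.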