Optimizing Index Structures to Support Semantic Queries in Relational Databases

Kamel, Victor

Files

Victor Kamel-JCURAposter-2023.pdf (1.24 MB)

Date

2023-03-20

Authors

Kamel, Victor

Abstract

Computers cannot natively understand text. When text is stored in a database, it is therefore represented as “strings” of characters encoded using a standard such as ASCII or UTF-8. In this research, we explore current methods for managing string-based keys in relational databases, and for generating vector representations of strings that encode semantic meaning based on the entropy in a collection of training text. The goal is to enable semantic queries over string-based keys with little additional overhead. We consider the top-k query, in which the k highest-ranking results are retrieved. We propose several candidate algorithms, with their associated spatial index data structures, to accelerate top-k queries that compare dimensionally reduced word embeddings by cosine similarity. We introduce two spatial partitioning-based algorithms that improve on naive and optimized scan-based methods. Finally, we implement and test these algorithms to evaluate their relative performance.
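
For orientation, the naive scan-based baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration and not the poster's implementation: the function names (cosine_similarity, top_k_scan), the toy two-dimensional embeddings, and the convention of scoring zero vectors as 0.0 are all assumptions made for the example. It computes the cosine similarity between the query vector and every stored vector, then keeps the k best.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def top_k_scan(query, embeddings, k):
    """Naive top-k: score every stored vector, return the k best as (key, score)."""
    scored = (
        (cosine_similarity(query, vec), key) for key, vec in embeddings.items()
    )
    # nlargest keeps only k candidates in memory while scanning all n entries.
    best = heapq.nlargest(k, scored)
    return [(key, score) for score, key in best]


# Hypothetical, hand-made 2-D "embeddings" keyed by string.
embeddings = {
    "cat": [0.9, 0.1],
    "dog": [0.8, 0.3],
    "car": [0.1, 0.9],
    "truck": [0.2, 0.8],
}

result = top_k_scan([1.0, 0.0], embeddings, k=2)
print(result)
```

Every query in this sketch touches all n stored vectors, which is the cost that a spatial index would aim to reduce by pruning regions of the embedding space that cannot contain a top-k result.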

Keywords

Software Prototypes, Asymptotic Analysis, Word Embeddings, Index Structures, Natural Language Processing, Databases

URI

http://hdl.handle.net/1828/14914

Collections

Jamie Cassels Undergraduate Research Awards (JCURA)
