Optimizing Index Structures to Support Semantic Queries in Relational Databases




Kamel, Victor




Computers cannot natively understand text. When text is stored in a database, it is therefore represented as “strings” of characters encoded using a standard such as ASCII or UTF-8. In this research, we explore current methods for managing string-based keys in relational databases, as well as methods for generating vector representations of strings that encode semantic meaning based on the entropy in a collection of training text. The goal is to enable semantic queries over string-based keys with little additional overhead. We consider the top-k query, in which the k highest-ranking results are retrieved. We propose several candidate algorithms and their associated spatial index data structures to accelerate top-k queries that compare dimensionally reduced word embeddings by cosine similarity. We introduce two spatial partitioning-based algorithms that improve on naive and optimized scan-based methods. Finally, we implement and test these algorithms to evaluate their relative performance.
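
To make the baseline concrete, a naive scan-based top-k query over word embeddings ranked by cosine similarity might look like the minimal Python sketch below. The function names, the dictionary layout mapping string keys to vectors, and the toy two-dimensional vectors are assumptions made for illustration; they are not taken from the thesis's implementation.

```python
import heapq
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def top_k_scan(query, embeddings, k):
    """Naive scan: score every stored key against the query, keep the k best.

    `embeddings` is a hypothetical mapping from string key to vector.
    Returns a list of (similarity, key) pairs, most similar first.
    """
    scored = (
        (cosine_similarity(query, vec), key) for key, vec in embeddings.items()
    )
    return heapq.nlargest(k, scored)


# Toy data, invented for illustration only.
embeddings = {
    "cat": [1.0, 0.1],
    "dog": [0.9, 0.2],
    "car": [0.0, 1.0],
}
results = top_k_scan([1.0, 0.0], embeddings, k=2)
```

This scan computes one similarity per stored key on every query, so its cost grows linearly with the number of keys; that per-query cost is what a spatial index aims to reduce.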



Software Prototypes, Asymptotic Analysis, Word Embeddings, Index Structures, Natural Language Processing, Databases