Enhancing relational databases with semantic search using word embeddings
Date
2026
Authors
Peng, Gerry
Publisher
University of Victoria
Abstract
Relational databases can store and query structured data, but text search is largely limited to exact keyword matching. As a result, it can be difficult to retrieve conceptually related entries when they are worded differently. This project explores how word embeddings represent semantic meaning in text and how they can be integrated into a relational database to support meaning-based search.
The project begins by analyzing word embeddings to examine how semantic structure is captured in embedding spaces. Pretrained vector embeddings are compared with custom embeddings trained on a dataset of tweets related to the 2016 American election. Nearest neighbour analysis and vector arithmetic are used to observe how training data size and bias affect the resulting embeddings.
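As an illustration of the nearest neighbour analysis and vector arithmetic mentioned above, the following minimal Python sketch uses hand-made three-dimensional toy vectors. The vectors, the `TOY_EMBEDDINGS` table, and the function names are hypothetical and chosen only for illustration; they are not the project's pretrained or tweet-trained embeddings, and the project's own code may differ.

```python
import math

# Hypothetical toy vectors (assumption): the dimensions loosely stand for
# "royalty", "male", and "female". Real embeddings have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word, k=3):
    """Return the k words most similar to `word`, as (word, similarity) pairs."""
    target = TOY_EMBEDDINGS[word]
    scored = [
        (other, cosine(target, vec))
        for other, vec in TOY_EMBEDDINGS.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def analogy(a, b, c):
    """Solve "a is to b as c is to ?" using vec(b) - vec(a) + vec(c)."""
    va, vb, vc = TOY_EMBEDDINGS[a], TOY_EMBEDDINGS[b], TOY_EMBEDDINGS[c]
    target = [y - x + z for x, y, z in zip(va, vb, vc)]
    # Exclude the three input words, as is common in analogy evaluation.
    candidates = [w for w in TOY_EMBEDDINGS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, TOY_EMBEDDINGS[w]))
```

With these toy values, `analogy("man", "king", "woman")` selects `"queen"`. On embeddings trained from a small or biased corpus, the same two functions can return quite different neighbours, which is the kind of effect the comparison is meant to expose.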
Additionally, word embeddings are integrated into a movie database. Movie titles and plot descriptions are represented using word embeddings, and similarity comparisons retrieve movies based on semantic relevance rather than keyword matches. The results show that embeddings trained on more general text corpora capture more comprehensive semantic relationships, while embeddings trained on niche text perform well within their domain. Overall, this project shows that word embeddings provide a practical way to extend traditional database systems with search that takes word meaning into account.
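One way such an integration could look is sketched below in Python with the standard `sqlite3` module. Everything specific here is an assumption made for illustration: the `movies` schema, the two sample rows, the two-dimensional `WORD_VECTORS`, the choice to average word vectors into a plot embedding, and the choice to store embeddings as JSON text. The abstract does not state which database system or pooling method the project uses.

```python
import json
import math
import sqlite3

# Hypothetical 2-D toy word vectors (assumption): [space-related, romance-related].
WORD_VECTORS = {
    "space": [1.0, 0.0], "astronaut": [0.9, 0.1], "galaxy": [0.95, 0.05],
    "rocket": [0.9, 0.0], "love": [0.0, 1.0], "wedding": [0.1, 0.9],
    "romance": [0.05, 0.95],
}


def embed(text):
    """Average the vectors of known words; None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_sim(u_json, v_json):
    """Cosine similarity of two JSON-encoded vectors, callable from SQL."""
    u, v = json.loads(u_json), json.loads(v_json)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_db():
    """Create an in-memory movie table with one stored embedding per plot."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("cosine_sim", 2, cosine_sim)  # expose similarity to SQL
    conn.execute(
        "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, plot TEXT, embedding TEXT)"
    )
    sample_movies = [  # hypothetical rows
        ("Star Voyage", "an astronaut crosses the galaxy in a rocket"),
        ("Summer Vows", "a wedding rekindles an old love"),
    ]
    for title, plot in sample_movies:
        vec = embed(plot)
        conn.execute(
            "INSERT INTO movies (title, plot, embedding) VALUES (?, ?, ?)",
            (title, plot, json.dumps(vec) if vec is not None else None),
        )
    return conn


def search(conn, query, k=1):
    """Return the titles of the k movies whose plots are closest to the query."""
    q = embed(query)
    if q is None:
        return []
    rows = conn.execute(
        "SELECT title FROM movies WHERE embedding IS NOT NULL "
        "ORDER BY cosine_sim(embedding, ?) DESC LIMIT ?",
        (json.dumps(q), k),
    ).fetchall()
    return [title for (title,) in rows]
```

In this sketch, `search(build_db(), "space")` ranks "Star Voyage" first even though the word "space" appears in neither plot, which is the behaviour that keyword matching cannot provide. Scanning every row is only reasonable for small tables; a larger collection would likely need a vector index.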
Keywords
embedding, database, similarity, retrieval, semantic, NLP, Jamie Cassels Undergraduate Research Awards (JCURA)