Enhancing relational databases with semantic search using word embeddings

Date

2026

Authors

Peng, Gerry

Publisher

University of Victoria

Abstract

Relational databases can store and query structured data, but text search in them is mostly limited to exact keyword matching. As a result, it can be difficult to retrieve conceptually related entries when different wording is used. This project explores how word embeddings represent semantic meaning in text and how they can be integrated into a relational database to support meaning-based search. The project first analyzes word embeddings to examine how semantic structure is captured in embedding spaces. Pretrained embeddings are compared with custom embeddings trained on a dataset of tweets related to the 2016 American election. Nearest-neighbour analysis and vector arithmetic are used to observe how training data size and bias affect the resulting embeddings. Word embeddings are then integrated into a movie database. Movie titles and plot descriptions are represented using word embeddings, and similarity comparisons retrieve movies based on semantic relevance rather than keyword matches. The results show that embeddings trained on more general text corpora capture more comprehensive semantic relationships, while embeddings trained on niche text perform well within their domain. Overall, this project shows that word embeddings provide a practical way to extend traditional database systems with search that takes word meaning into account.
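
The similarity-based retrieval described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the three-dimensional toy vectors, the `movies` table schema, the sample rows, the choice to average word vectors into a text vector, and the helper names `embed_text`, `cosine`, and `semantic_search`. A real system would load pretrained or custom-trained embeddings instead.

```python
import math
import sqlite3

# Toy 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "space":   [0.9, 0.1, 0.0],
    "alien":   [0.8, 0.2, 0.1],
    "galaxy":  [0.9, 0.0, 0.1],
    "love":    [0.0, 0.9, 0.1],
    "wedding": [0.1, 0.8, 0.2],
    "romance": [0.0, 0.9, 0.2],
}


def embed_text(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(conn, query, top_k=1):
    """Rank rows of the hypothetical movies table by plot similarity to the query."""
    query_vec = embed_text(query)
    if query_vec is None:
        return []
    scored = []
    for title, plot in conn.execute("SELECT title, plot FROM movies"):
        plot_vec = embed_text(plot)
        if plot_vec is not None:
            scored.append((cosine(query_vec, plot_vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]


# Hypothetical in-memory database with two made-up movies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, plot TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [
        ("Star Voyage", "an alien fleet crosses the galaxy"),
        ("Two Hearts", "a wedding sparks an unexpected romance"),
    ],
)

results = semantic_search(conn, "space")
```

In this toy setup the query word "space" appears in neither plot, yet the first movie ranks highest because its plot words lie close to "space" in the vector space, which is the behaviour a keyword match cannot provide.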

Keywords

embedding, database, similarity, retrieval, semantic, NLP, Jamie Cassels Undergraduate Research Awards (JCURA)
