Enhancing relational databases with semantic search using word embeddings

dc.contributor.author: Peng, Gerry
dc.date.accessioned: 2026-04-22T16:37:13Z
dc.date.available: 2026-04-22T16:37:13Z
dc.date.issued: 2026
dc.description.abstract: Relational databases can store and query structured data, but text search is mostly limited to exact keyword matching. As a result, conceptually related entries are difficult to retrieve when they use different wording. This project explores how word embeddings represent semantic meaning in text and how they can be integrated into a relational database to support meaning-based search. The project begins by analyzing word embeddings to examine how semantic structure is captured in embedding spaces. Pretrained vector embeddings are compared with custom embeddings trained on a dataset of tweets related to the 2016 American election. Nearest-neighbour analysis and vector arithmetic are used to observe how training data size and bias affect the resulting embeddings. Word embeddings are then integrated into a movie database: movie titles and plot descriptions are represented using word embeddings, and similarity comparisons retrieve movies based on semantic relevance rather than keyword matches. The results show that embeddings trained on more general text corpora capture a broader range of semantic relationships, while embeddings trained on niche text perform well within their domain. Overall, this project shows that word embeddings provide a practical way to extend traditional database systems with search that accounts for word meaning.
dc.description.reviewstatus: Reviewed
dc.description.scholarlevel: Undergraduate
dc.description.sponsorship: Jamie Cassels Undergraduate Research Awards (JCURA)
dc.identifier.uri: https://hdl.handle.net/1828/23678
dc.language.iso: en
dc.publisher: University of Victoria
dc.subject: embedding
dc.subject: database
dc.subject: similarity
dc.subject: retrieval
dc.subject: semantic
dc.subject: NLP
dc.subject: Jamie Cassels Undergraduate Research Awards (JCURA)
dc.subject.department: Department of Computer Science
dc.title: Enhancing relational databases with semantic search using word embeddings
dc.type: Poster
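
The kind of meaning-based search the abstract describes might be sketched as follows. This is only an illustration under stated assumptions, not the poster's implementation: the three-dimensional vectors in TOY_VECTORS, the movie rows, the table schema, and the helper names embed, cosine, build_db, and semantic_search are all hypothetical. A real system would presumably load pretrained or custom-trained embeddings instead of hand-written vectors.

```python
import json
import math
import sqlite3

# Hypothetical toy word vectors standing in for real (pretrained or custom) embeddings.
TOY_VECTORS = {
    "space":   [0.9, 0.1, 0.0],
    "alien":   [0.8, 0.2, 0.1],
    "galaxy":  [0.9, 0.0, 0.1],
    "romance": [0.1, 0.9, 0.0],
    "wedding": [0.0, 0.8, 0.2],
    "heist":   [0.1, 0.0, 0.9],
    "robbery": [0.0, 0.1, 0.9],
    "bank":    [0.1, 0.1, 0.8],
}

def embed(text):
    """Average the vectors of the known words in text; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(component) / len(vecs) for component in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def build_db(movies):
    """Create an in-memory table and store each plot embedding as JSON text."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT, plot TEXT, embedding TEXT)"
    )
    for title, plot in movies:
        conn.execute(
            "INSERT INTO movies (title, plot, embedding) VALUES (?, ?, ?)",
            (title, plot, json.dumps(embed(plot))),
        )
    conn.commit()
    return conn

def semantic_search(conn, query, k=2):
    """Return up to k titles ranked by cosine similarity to the query embedding."""
    query_vec = embed(query)
    if query_vec is None:
        return []
    scored = []
    for title, stored in conn.execute("SELECT title, embedding FROM movies"):
        vec = json.loads(stored)
        if vec is not None:
            scored.append((cosine(query_vec, vec), title))
    scored.sort(reverse=True)
    return [title for _, title in scored[:k]]

# Hypothetical rows; none of the plots contains the word "space".
MOVIES = [
    ("Star Voyage", "an alien fleet crosses the galaxy"),
    ("Two Hearts", "a wedding romance"),
    ("The Vault", "a bank robbery goes wrong"),
]
conn = build_db(MOVIES)
print(semantic_search(conn, "space", k=1))
```

With these toy vectors, the query "space" ranks "Star Voyage" first even though no plot contains that word, which a keyword match would miss. Scoring every row in application code is a simplification; it would not scale to a large table without an index or a vector-aware database extension.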

Files

Original bundle
Name: peng_gerry_jcura_poster_2026.pdf
Size: 5.43 MB
Format: Adobe Portable Document Format