Enhancing relational databases with semantic search using word embeddings
| dc.contributor.author | Peng, Gerry | |
| dc.date.accessioned | 2026-04-22T16:37:13Z | |
| dc.date.available | 2026-04-22T16:37:13Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | Relational databases can store and query structured data, but text search is mostly limited to exact keyword matching. As a result, it can be difficult to retrieve conceptually related entries when different wording is used. This project explores how word embeddings represent semantic meaning in text and how they can be integrated into a relational database to support meaning-based search. The project begins by analyzing word embeddings to examine how semantic structure is captured in embedding spaces. Pretrained vector embeddings are compared with custom embeddings trained on a dataset of tweets related to the 2016 American election. Nearest-neighbour analysis and vector arithmetic are used to observe how training data size and bias affect the resulting embeddings. Word embeddings are then integrated into a movie database: movie titles and plot descriptions are represented using word embeddings, and similarity comparisons retrieve movies based on semantic relevance rather than keyword matches. The results show that embeddings trained on more general text corpora capture more comprehensive semantic relationships, while embeddings trained on niche text perform well within their domain. Overall, this project shows that word embeddings provide a practical way to extend traditional database systems with search that accounts for word meaning. | |
| dc.description.reviewstatus | Reviewed | |
| dc.description.scholarlevel | Undergraduate | |
| dc.description.sponsorship | Jamie Cassels Undergraduate Research Awards (JCURA) | |
| dc.identifier.uri | https://hdl.handle.net/1828/23678 | |
| dc.language.iso | en | |
| dc.publisher | University of Victoria | |
| dc.subject | embedding | |
| dc.subject | database | |
| dc.subject | similarity | |
| dc.subject | retrieval | |
| dc.subject | semantic | |
| dc.subject | NLP | |
| dc.subject | Jamie Cassels Undergraduate Research Awards (JCURA) | |
| dc.subject.department | Department of Computer Science | |
| dc.title | Enhancing relational databases with semantic search using word embeddings | |
| dc.type | Poster |
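
The abstract describes representing movie plot descriptions as word embeddings and ranking movies by similarity instead of keyword matches. The sketch below illustrates that general idea only and is not the poster's implementation. Everything in it is an assumption made for illustration: the three-dimensional `TOY_VECTORS`, the averaging of word vectors in `embed`, the `movies` table layout, the `cos_sim` SQL function name, and the use of SQLite with embeddings stored as JSON text.

```python
import json
import math
import sqlite3

# Hypothetical 3-dimensional word vectors, invented for this example.
# A real system would load pretrained or custom-trained embeddings instead.
TOY_VECTORS = {
    "space":   [0.9, 0.1, 0.0],
    "alien":   [0.8, 0.2, 0.1],
    "galaxy":  [0.9, 0.0, 0.1],
    "love":    [0.0, 0.9, 0.1],
    "wedding": [0.1, 0.8, 0.2],
    "romance": [0.0, 0.9, 0.2],
}


def embed(text):
    """Average the vectors of known words; return None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a_json, b_json):
    """Cosine similarity of two JSON-encoded vectors (None if either is NULL)."""
    if a_json is None or b_json is None:
        return None
    a, b = json.loads(a_json), json.loads(b_json)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_db(movies):
    """Create an in-memory table of (title, plot, embedding) rows."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python similarity function to SQL under an assumed name.
    conn.create_function("cos_sim", 2, cosine)
    conn.execute("CREATE TABLE movies (title TEXT, plot TEXT, embedding TEXT)")
    for title, plot in movies:
        vec = embed(plot)
        conn.execute(
            "INSERT INTO movies VALUES (?, ?, ?)",
            (title, plot, json.dumps(vec) if vec is not None else None),
        )
    return conn


def search(conn, query, limit=3):
    """Return movie titles ordered by similarity to the query text."""
    qvec = embed(query)
    if qvec is None:
        return []
    rows = conn.execute(
        "SELECT title FROM movies WHERE embedding IS NOT NULL "
        "ORDER BY cos_sim(embedding, ?) DESC LIMIT ?",
        (json.dumps(qvec), limit),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_db([
        ("Star Voyage", "an alien ship crosses the galaxy"),
        ("Two Hearts", "a wedding and a love story"),
    ])
    # Neither plot contains the word "space", so a keyword match would fail.
    print(search(db, "space"))
```

In this sketch the query word does not need to appear in any plot; ranking depends only on vector similarity. Scanning every row with a user-defined function is a simplification that suits small tables; larger collections would likely need an index or a dedicated vector extension.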