Word embedding bias in large language models
dc.contributor.author | Chuthamsatid, Poomrapee
dc.date.accessioned | 2025-04-01T18:43:29Z
dc.date.available | 2025-04-01T18:43:29Z
dc.date.issued | 2025
dc.description.abstract | This paper extends prior research on bias in word embeddings by addressing limitations of earlier studies, such as Caliskan et al. (2017, 2022), which focused primarily on older embedding models like GloVe and FastText and examined mainly gender bias. In contrast, our work investigates biases in embeddings from modern large language models (LLMs), including OpenAI and Google embedding models, and expands the scope to both gender- and race-associated biases. We analyze biases across different word-frequency ranges, using SC-WEAT tests, clustering, and t-SNE visualizations to reveal thematic clusters among biased associations. We also explore how these biases relate to real-world sectors such as the tech industry and higher education. By broadening the scope and applying more contemporary models, our research provides a more comprehensive understanding of bias in LLMs than earlier studies.
dc.description.reviewstatus | Unreviewed
dc.description.scholarlevel | Undergraduate
dc.description.sponsorship | Jamie Cassels Undergraduate Research Awards (JCURA)
dc.identifier.uri | https://hdl.handle.net/1828/21729
dc.language.iso | en
dc.publisher | University of Victoria
dc.subject | bias
dc.subject | word embeddings
dc.subject | large language models
dc.subject | ethics in AI
dc.title | Word embedding bias in large language models
dc.type | Poster
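
As context for the methodology named in the abstract, the SC-WEAT (single-category Word Embedding Association Test) effect size measures how much closer a single target word's embedding lies to one attribute set (e.g., female-associated words) than to another (e.g., male-associated words), following the WEAT effect-size formulation of Caliskan et al. (2017) applied to one target word at a time. The sketch below is a minimal, illustrative Python version of that calculation; the word lists, vector dimensionality, and random stand-in vectors are assumptions for demonstration only, not the poster's data or the actual OpenAI/Google embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sc_weat_effect_size(w, A, B):
    """Single-category WEAT effect size for one target vector w.

    d = [mean_a cos(w, a) - mean_b cos(w, b)] / std_{x in A ∪ B} cos(w, x)

    Positive values indicate w sits closer to attribute set A,
    negative values that it sits closer to attribute set B.
    """
    sims_a = np.array([cosine(w, a) for a in A])
    sims_b = np.array([cosine(w, b) for b in B])
    pooled = np.concatenate([sims_a, sims_b])
    # Sample standard deviation of all target-attribute similarities.
    return (sims_a.mean() - sims_b.mean()) / pooled.std(ddof=1)

# Toy illustration: random vectors stand in for real LLM embeddings.
rng = np.random.default_rng(0)
vocab = ["engineer", "she", "her", "woman", "he", "him", "man"]
emb = {word: rng.normal(size=256) for word in vocab}

female_attrs = [emb[w] for w in ["she", "her", "woman"]]
male_attrs = [emb[w] for w in ["he", "him", "man"]]
print(sc_weat_effect_size(emb["engineer"], female_attrs, male_attrs))
```

In a study like the one described in the abstract, this effect size would presumably be computed for large sets of target words drawn from different frequency ranges, with the most strongly associated words then clustered and projected with t-SNE to inspect thematic groupings.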