Word embedding bias in large language models

dc.contributor.author: Chuthamsatid, Poomrapee
dc.date.accessioned: 2025-04-01T18:43:29Z
dc.date.available: 2025-04-01T18:43:29Z
dc.date.issued: 2025
dc.description.abstract: This paper extends prior research on bias in word embeddings by addressing significant limitations of earlier studies, such as Caliskan et al. (2017, 2022), which focused primarily on older models like GloVe and FastText and examined mainly gender bias. In contrast, our work investigates biases in embeddings from modern large language models (LLMs), including OpenAI and Google embedding models, and expands the scope to both gender- and race-associated biases. We analyze biases across word frequency ranges, using SC-WEAT tests, clustering, and t-SNE visualizations to reveal thematic clusters among biased words. We also explore how these biases relate to real-world sectors such as the tech industry and higher education. By broadening the scope and applying contemporary models, our research provides a more comprehensive understanding of bias in LLMs than earlier studies.
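For context on the method named in the abstract: the SC-WEAT (Single-Category Word Embedding Association Test) effect size measures how much more strongly a single target word's embedding associates with one attribute word set than another. Below is a minimal Python sketch of that effect size, assuming target and attribute words have already been mapped to numpy vectors; the function names and the random stand-in vectors are illustrative, not taken from the poster.

import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def sc_weat_effect_size(w, A, B):
    # SC-WEAT effect size (after Caliskan et al., 2017) for a single
    # target embedding w against attribute sets A and B (lists of vectors).
    # Positive values: w associates more with A; negative: more with B.
    sims_a = np.array([cosine(w, a) for a in A])
    sims_b = np.array([cosine(w, b) for b in B])
    pooled = np.concatenate([sims_a, sims_b])
    return (sims_a.mean() - sims_b.mean()) / pooled.std(ddof=1)

# Illustrative usage with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
w = rng.normal(size=300)
A = [rng.normal(size=300) for _ in range(8)]  # e.g. female-associated attribute terms
B = [rng.normal(size=300) for _ in range(8)]  # e.g. male-associated attribute terms
print(sc_weat_effect_size(w, A, B))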
dc.description.reviewstatus: Unreviewed
dc.description.scholarlevel: Undergraduate
dc.description.sponsorship: Jamie Cassels Undergraduate Research Awards (JCURA)
dc.identifier.uri: https://hdl.handle.net/1828/21729
dc.language.iso: en
dc.publisher: University of Victoria
dc.subject: bias
dc.subject: word embeddings
dc.subject: large language models
dc.subject: ethics in AI
dc.title: Word embedding bias in large language models
dc.type: Poster

Files

Original bundle
Name: chuthamsatid_poomrapee_jcura_poster_2025.pdf
Size: 223.72 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed to upon submission