Word embedding bias in large language models

Date

2025

Authors

Chuthamsatid, Poomrapee

Publisher

University of Victoria

Abstract

This paper extends prior research on bias in word embeddings by addressing key limitations of earlier studies, such as Caliskan et al. (2017, 2022), which focused primarily on older static embedding models like GloVe and FastText and examined mainly gender bias. In contrast, our work investigates biases in the embeddings of modern large language models (LLMs), including OpenAI and Google embedding models, and expands the scope to both gender- and race-associated biases. We analyze biases across different word frequency ranges, using SC-WEAT (Single-Category Word Embedding Association Test) effect sizes, clustering, and t-SNE visualizations to surface thematic clusters of biased associations. Additionally, we examine how these biases relate to real-world sectors such as the tech industry and higher education. By broadening the scope and applying contemporary models, our research provides a more comprehensive understanding of bias in LLM embeddings than earlier studies.
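
For context, the SC-WEAT statistic is a single-target variant of the Word Embedding Association Test: for one target word, it compares the word's mean cosine similarity to two attribute word sets and normalizes by the pooled standard deviation of all similarities. Below is a minimal sketch following the WEAT formulation of Caliskan et al. (2017); the vector dimensionality, attribute-set sizes, and random placeholder vectors are illustrative assumptions, not the data or models used in the paper.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def sc_weat_effect_size(w, A, B):
    # SC-WEAT effect size for one target word vector w against two
    # attribute sets A and B (arrays of attribute word vectors):
    #   d = (mean_a cos(w, a) - mean_b cos(w, b)) / std over A ∪ B
    # Positive d: w associates more with A; negative: more with B.
    assoc_a = np.array([cosine(w, a) for a in A])
    assoc_b = np.array([cosine(w, b) for b in B])
    pooled = np.concatenate([assoc_a, assoc_b])
    return (assoc_a.mean() - assoc_b.mean()) / pooled.std(ddof=1)

# Illustrative usage with random placeholder vectors; in practice these
# would be embeddings retrieved from the model under test.
rng = np.random.default_rng(0)
w = rng.normal(size=300)       # hypothetical vector for one target word
A = rng.normal(size=(8, 300))  # hypothetical attribute set (e.g., female-associated words)
B = rng.normal(size=(8, 300))  # hypothetical attribute set (e.g., male-associated words)
print(sc_weat_effect_size(w, A, B))
```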

Keywords

bias, word embeddings, large language models, ethics in AI
