Word embedding bias in large language models

Date

2025

Authors

Chuthamsatid, Poomrapee

Publisher

University of Victoria

Abstract

This paper extends prior research on bias in word embeddings by addressing key limitations of earlier studies, such as Caliskan et al. (2017, 2022), which focused primarily on older static embedding models like GloVe and FastText and examined mainly gender bias. In contrast, our work investigates biases in the embeddings of modern large language models (LLMs), including OpenAI and Google embedding models, and expands the scope to both gender- and race-associated biases. We analyze biases across different word frequency ranges, using SC-WEAT (Single-Category Word Embedding Association Test) effect sizes, clustering, and t-SNE visualizations to surface thematic clusters of biased associations. Additionally, we examine how these biases relate to real-world sectors such as the tech industry and higher education. By broadening the scope and applying contemporary models, our research provides a more comprehensive understanding of bias in LLM embeddings than earlier studies.
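
For context, the SC-WEAT statistic is a single-target variant of the Word Embedding Association Test: for one target word, it compares the word's mean cosine similarity to two attribute word sets and normalizes by the pooled standard deviation of all similarities. Below is a minimal sketch following the WEAT formulation of Caliskan et al. (2017); the vector dimensionality, attribute-set sizes, and random placeholder vectors are illustrative assumptions, not the data or models used in the paper.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def sc_weat_effect_size(w, A, B):
    # SC-WEAT effect size for one target word vector w against two
    # attribute sets A and B (arrays of attribute word vectors):
    #   d = (mean_a cos(w, a) - mean_b cos(w, b)) / std over A ∪ B
    # Positive d: w associates more with A; negative: more with B.
    assoc_a = np.array([cosine(w, a) for a in A])
    assoc_b = np.array([cosine(w, b) for b in B])
    pooled = np.concatenate([assoc_a, assoc_b])
    return (assoc_a.mean() - assoc_b.mean()) / pooled.std(ddof=1)

# Illustrative usage with random placeholder vectors; in practice these
# would be embeddings retrieved from the model under test.
rng = np.random.default_rng(0)
w = rng.normal(size=300)       # hypothetical vector for one target word
A = rng.normal(size=(8, 300))  # hypothetical attribute set (e.g., female-associated words)
B = rng.normal(size=(8, 300))  # hypothetical attribute set (e.g., male-associated words)
print(sc_weat_effect_size(w, A, B))
```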

Keywords

bias, word embeddings, large language models, ethics in AI
