Three ethical dimensions of AI: Fairness in social recommenders, bias detection in LLMs, and privacy in NLP

dc.contributor.author: Potka, Shera
dc.contributor.supervisor: Thomo, Alex
dc.date.accessioned: 2025-05-29T22:31:55Z
dc.date.available: 2025-05-29T22:31:55Z
dc.date.issued: 2025
dc.degree.department: Department of Computer Science
dc.degree.level: Doctor of Philosophy PhD
dc.description.abstract: This thesis investigates three foundational challenges in the development of responsible Artificial Intelligence (AI): fairness in social recommender systems, demographic bias in large language models (LLMs), and privacy-preserving techniques for Natural Language Processing (NLP). Though these problems differ in technical scope and application domain, they share a common thread: vector-based representations—embeddings of users, words, and tokens—fundamentally shape how AI systems behave, make decisions, and affect people. Across these three dimensions, this work introduces new methods for measuring, interpreting, and mitigating risk, offering solutions grounded in both empirical analysis and practical utility.

The first part of the thesis (Chapter 2) examines fairness in algorithmic link recommendation, with a focus on how structural minority communities—groups defined by network topology rather than identity—are represented in evolving social graphs. Standard recommenders tend to amplify popular users, reinforcing visibility gaps over time. We propose MinWalk, a fairness-aware algorithm that improves minority visibility while maintaining network stability. Simulations on real-world networks show that fairness- and diversity-aware algorithms vary widely in long-term impact, and that MinWalk offers a balanced, effective solution. This work underscores the importance of evaluating fairness dynamically and provides tools for designing more inclusive recommendation systems.

The second part (Chapters 3 and 4) turns to demographic bias in LLM behavior. We analyze gender and race associations in contextual embeddings from five leading models developed by OpenAI, Google, Microsoft, Cohere, and BGE. Using the SC-WEAT metric and clustering techniques, we show that stereotypical associations persist and are amplified in modern embeddings. We also examine how these biases appear in real-world applications, focusing on consumer product recommendations. Using prompt engineering and computational linguistics methods—including Marked Words, SVM classification, and distributional divergence—we find that LLMs generate demographically skewed suggestions that reinforce social stereotypes. These findings highlight the risks of bias in LLM outputs and offer concrete tools for auditing fairness in generative systems.

The final part (Chapter 5) addresses privacy in NLP, where the challenge lies in removing sensitive information from text without damaging meaning or fluency. Existing approaches either prioritize privacy but degrade text quality, or preserve fluency at the cost of weaker guarantees. To address this, we propose CluSanT, a flexible framework that uses token clustering and controlled replacement mechanisms to balance privacy and utility. Unlike prior methods, CluSanT retains strong privacy protection while producing more natural, semantically faithful text. We evaluate it using a range of metrics—including coherence, grammar, and semantic similarity—showing that it consistently improves over baselines on a legal benchmark dataset. Our results demonstrate that text sanitization can be both effective and intelligible to human readers.

Taken together, this thesis presents a unified perspective on ethical AI through the lens of embeddings. In social networks, language generation, and privacy-preserving NLP, vector representations are not neutral—they encode power dynamics, preferences, and access. By examining how these embeddings influence visibility, bias, and confidentiality, this work contributes both practical algorithms and conceptual frameworks for designing fair, inclusive, and trustworthy AI systems.
dc.description.scholarlevel: Graduate
dc.identifier.bibliographicCitation: Shera Potka, Isla Li, Jason Kepler, and Alex Thomo. Enhancing Structural Minority Visibility in Link Recommendations. MEDES 2024 (16th International Conference on Management of Digital EcoSystems).
dc.identifier.bibliographicCitation: Poomrapee Chuthamsatid, Shera Potka, and Alex Thomo. Word Embedding Bias in Large Language Models. I-SPAN 2025 (17th International Symposium on Pervasive Systems, Algorithms, and Networks).
dc.identifier.bibliographicCitation: Ke Xu, Shera Potka, and Alex Thomo. Gender and Race Bias in Consumer Product Recommendations by Large Language Models. AINA 2025 (39th International Conference on Advanced Information Networking and Applications).
dc.identifier.bibliographicCitation: Ahmed Musa Awon, Yun Lu, Shera Potka, and Alex Thomo. CluSanT: Differentially Private and Semantically Coherent Text Sanitization. NAACL 2025 (Annual Conference of the North American Chapter of the Association for Computational Linguistics).
dc.identifier.uri: https://hdl.handle.net/1828/22316
dc.language: English
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: Large Language Models
dc.subject: Bias
dc.subject: Privacy
dc.subject: Natural Language Processing
dc.subject: Social Networks
dc.subject: Fairness
dc.title: Three ethical dimensions of AI: Fairness in social recommenders, bias detection in LLMs, and privacy in NLP
dc.type: Thesis

Files

Original bundle
Name: Potka_Shera_PhD_2025.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Description: Item-specific license agreed upon to submission