Implementing voice assistant for visually impaired using LLMs and Vision Language Models
dc.contributor.author | Jiang, Jinke | |
dc.contributor.supervisor | Yang, Hong-Chuan | |
dc.date.accessioned | 2024-10-23T22:11:10Z | |
dc.date.available | 2024-10-23T22:11:10Z | |
dc.date.issued | 2024 | |
dc.degree.department | Department of Electrical and Computer Engineering | |
dc.degree.level | Master of Engineering MEng | |
dc.description.abstract | As a result of population aging, the number of visually impaired people is growing. Unfortunately, there are limited accessibility measures available to help improve their quality of life. Recent technological developments in Artificial Intelligence (AI), especially Large Language Models (LLMs), can offer effective and efficient solutions. Recognizing the limitations of existing products, we design and implement a user-friendly and privacy-preserving voice assistant for visually impaired people. Using LLMs and Vision Language Models, the assistant can recognize and identify objects through low-latency speech-to-speech interactions. The assistant can be deployed on offline edge computing devices equipped with a camera, microphone, and speaker, and its functionality can be easily extended. In this report, we present the design, the adopted technologies, and the adjustments we made to arrive at the final implementation. | |
dc.description.scholarlevel | Graduate | |
dc.identifier.uri | https://hdl.handle.net/1828/20624 | |
dc.language.iso | en | |
dc.subject | LLM | |
dc.subject | voice assistant | |
dc.subject | Vision Language Model | |
dc.title | Implementing voice assistant for visually impaired using LLMs and Vision Language Models | |
dc.type | project |