Implementing voice assistant for visually impaired using LLMs and Vision Language Models
dc.contributor.author | Jiang, Jinke | |
dc.contributor.supervisor | Yang, Hong-Chuan | |
dc.date.accessioned | 2024-10-23T22:11:10Z | |
dc.date.available | 2024-10-23T22:11:10Z | |
dc.date.issued | 2024 | |
dc.degree.department | Department of Electrical and Computer Engineering | |
dc.degree.level | Master of Engineering MEng | |
dc.description.abstract | As a result of population aging, the number of visually impaired people is growing. Unfortunately, there are limited accessibility measures available to help improve their quality of life. Recent technological developments in Artificial Intelligence (AI), especially Large Language Models (LLMs), can offer effective and efficient solutions. Recognizing the limitations of existing products, we design and implement a user-friendly and privacy-preserving voice assistant for visually impaired people. Using LLMs and Vision Language Models, the assistant can recognize and identify objects through low-latency speech-to-speech interactions. The assistant can be deployed on offline edge computing devices equipped with a camera, microphone, and speaker, and its functionality can be easily extended. In this report, we present the design, the adopted technologies, and the adjustments we made to arrive at the final implementation. | |
dc.description.scholarlevel | Graduate | |
dc.identifier.uri | https://hdl.handle.net/1828/20624 | |
dc.language.iso | en | |
dc.subject | LLM | |
dc.subject | voice assistant | |
dc.subject | Vision Language Model | |
dc.title | Implementing voice assistant for visually impaired using LLMs and Vision Language Models | |
dc.type | project |