Implementing voice assistant for visually impaired using LLMs and Vision Language Models

dc.contributor.author: Jiang, Jinke
dc.contributor.supervisor: Yang, Hong-Chuan
dc.date.accessioned: 2024-10-23T22:11:10Z
dc.date.available: 2024-10-23T22:11:10Z
dc.date.issued: 2024
dc.degree.department: Department of Electrical and Computer Engineering
dc.degree.level: Master of Engineering MEng
dc.description.abstract: As a result of population aging, the number of visually impaired people is growing. Unfortunately, there are limited accessibility measures to help improve the quality of life of these people. Recent technological developments in Artificial Intelligence (AI), especially Large Language Models (LLMs), should offer effective and efficient solutions. Recognizing the limitations of existing products, we design and implement a user-friendly and privacy-safe voice assistant for visually impaired people. Using LLMs and Vision Language Models, the assistant can recognize and identify objects through low-latency speech-to-speech interactions. The assistant can be deployed on offline edge computing devices equipped with a camera, microphone, and speaker, and its functionalities are easily extendable. In this report, we present the design, the adopted technologies, and the adjustments we applied to arrive at the final implementation.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/20624
dc.language.iso: en
dc.subject: LLM
dc.subject: voice assistant
dc.subject: Vision Language Model
dc.title: Implementing voice assistant for visually impaired using LLMs and Vision Language Models
dc.type: project
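The abstract describes a speech-to-speech loop: microphone audio is transcribed, the transcript and a camera frame go to a vision-language model, and the answer is spoken back. A minimal sketch of such a turn is below; every function here is a hypothetical stub standing in for real on-device components (offline ASR, a local VLM, TTS) — the names and behavior are illustrative assumptions, not taken from the report.

```python
def transcribe(audio: bytes) -> str:
    """Stub ASR: a real system would run an offline speech-recognition model."""
    return "what is in front of me"

def describe_scene(image: bytes, question: str) -> str:
    """Stub VLM call: a real system would query a local vision-language model."""
    return f"Answering '{question}': a door is about two meters ahead."

def synthesize(text: str) -> bytes:
    """Stub TTS: a real system would return playable audio for the speaker."""
    return text.encode("utf-8")

def assist_once(audio: bytes, image: bytes) -> bytes:
    """One assistant turn: hear the question, look at the scene, speak the answer."""
    question = transcribe(audio)
    answer = describe_scene(image, question)
    return synthesize(answer)

# Example turn with dummy sensor data.
spoken = assist_once(b"<mic audio>", b"<camera frame>")
print(spoken.decode("utf-8"))
```

Keeping each stage behind its own function is what makes the report's claimed extendability plausible: any stage can be swapped for a different model without touching the rest of the loop.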

Files

Original bundle
Name: Jiang_Jinke_MEng_2024.pdf
Size: 4.58 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Description: Item-specific license agreed upon to submission