Reducing Training Time in Text Visual Question Answering

dc.contributor.author Behboud, Ghazale
dc.date.accessioned 2022-07-15T19:35:24Z
dc.date.available 2022-07-15T19:35:24Z
dc.date.copyright 2022 en_US
dc.date.issued 2022-07-15
dc.identifier.uri http://hdl.handle.net/1828/14062
dc.description.abstract Artificial Intelligence (AI) and Computer Vision (CV) have brought the promise of many applications along with many challenges to solve. The majority of current AI research has been dedicated to single-modal data processing, meaning that it uses only one modality such as visual recognition or text recognition. However, real-world challenges often combine different modalities of data such as text, audio and images. This thesis focuses on the Visual Question Answering (VQA) problem, which is a significant multi-modal challenge. A VQA system is a computer vision system that, when given a question about an image, answers based on an understanding of both the question and the image. The goal is to reduce the training time of VQA models. In this thesis, Look, Read, Reason and Answer (LoRRA), which is a state-of-the-art architecture, is used as the base model. Then, Reduce Uni-modal Biases (RUBi) is applied to this model to reduce the importance of uni-modal biases in training. Finally, an early stopping strategy is employed to stop the training process once the model accuracy has converged, which prevents the model from overfitting. Numerical results are presented which show that training LoRRA with RUBi and early stopping can converge in less than 5 hours. The impact of the batch size, learning rate and warm-up hyperparameters is also investigated and experimental results are presented. en_US
dc.language English eng
dc.language.iso en en_US
dc.rights Available to the World Wide Web en_US
dc.subject AI en_US
dc.subject ML en_US
dc.subject Deep Learning en_US
dc.subject Machine Learning en_US
dc.subject Visual Question Answering en_US
dc.subject Convolutional Neural Network en_US
dc.subject Recurrent Neural Network en_US
dc.subject Long Short Term Memory en_US
dc.subject Early Stopping en_US
dc.title Reducing Training Time in Text Visual Question Answering en_US
dc.type Thesis en_US
dc.contributor.supervisor Gulliver, T. Aaron
dc.degree.department Department of Electrical and Computer Engineering en_US
dc.degree.level Master of Applied Science M.A.Sc. en_US
dc.description.scholarlevel Graduate en_US
