Scalable vision transformers for remote sensing semantic segmentation
dc.contributor.author | MacDonald, Ezra | |
dc.contributor.supervisor | Coady, Yvonne | |
dc.date.accessioned | 2024-09-03T22:12:52Z | |
dc.date.available | 2024-09-03T22:12:52Z | |
dc.date.issued | 2024 | |
dc.degree.department | Department of Computer Science | |
dc.degree.level | Master of Science MSc | |
dc.description.abstract | Assessing and monitoring environmental landscapes plays a critical role in preserving the environment and ensuring the well-being of communities around the world. The launch of low Earth orbit observation satellites has dramatically increased the availability and resolution of remote sensing data, enabling more precise and frequent monitoring of environmental change and human impact across diverse ecosystems. Traditional manual methods of analyzing this data to measure environmental properties are increasingly being augmented by deep learning techniques, which can uncover complex patterns within the data. Recently, the Transformer architecture has been extended to computer vision, further enhancing the versatility and scalability of deep learning models. This thesis investigates the application of the Transformer architecture to semantic segmentation of medium-resolution satellite data. It examines the unique properties of remote sensing data and proposes techniques for improving deep learning model architectures and training methodologies in this domain. Two contributions are presented: MineSegSAT and VistaFormer. MineSegSAT is designed to identify and monitor environmentally impacted areas of mineral extraction sites using Sentinel-2 imagery; it incorporates state-of-the-art deep learning models and loss functions to automate the detection of disturbed areas, aiding environmental compliance monitoring. VistaFormer is a lightweight, efficient model for semantic segmentation of satellite image time series (SITS) data. It features an encoder-decoder architecture that pairs gated convolutions and self-attention Transformer blocks in the encoder with a lightweight convolutional decoder, and is designed to handle noise from atmospheric distortion and cloud cover while maintaining high performance and efficiency.
Experimental results demonstrate that VistaFormer outperforms state-of-the-art models on time series crop-type semantic segmentation benchmarks while using fewer floating-point operations and fewer trainable parameters. These findings suggest that Transformer-based architectures can significantly improve the accuracy and efficiency of satellite imagery analysis, providing valuable tools for environmental and agricultural monitoring. | |
dc.description.scholarlevel | Graduate | |
dc.identifier.uri | https://hdl.handle.net/1828/20360 | |
dc.language | English | eng |
dc.language.iso | en | |
dc.rights | Available to the World Wide Web | |
dc.subject | Deep learning | |
dc.subject | Computer vision and pattern recognition | |
dc.subject | Remote sensing | |
dc.title | Scalable vision transformers for remote sensing semantic segmentation | |
dc.type | Thesis |