Scalable vision transformers for remote sensing semantic segmentation
dc.contributor.author | MacDonald, Ezra | |
dc.contributor.supervisor | Coady, Yvonne | |
dc.date.accessioned | 2024-09-03T22:12:52Z | |
dc.date.available | 2024-09-03T22:12:52Z | |
dc.date.issued | 2024 | |
dc.degree.department | Department of Computer Science | |
dc.degree.level | Master of Science MSc | |
dc.description.abstract | Assessing and monitoring environmental landscapes plays a critical role in preserving the environment and ensuring the well-being of communities around the world. The launch of low Earth orbit observation satellites has dramatically increased the availability and resolution of remote sensing data, enabling more precise and frequent monitoring of environmental change and human impact across diverse ecosystems. Traditional manual methods of analyzing this data to measure environmental properties are increasingly being augmented by deep learning techniques, which can uncover complex patterns within the data. Recently, the Transformer architecture has been extended to computer vision, further enhancing the versatility and scalability of deep learning models. This thesis investigates the application of the Transformer architecture to semantic segmentation of medium-resolution satellite data. It examines the unique properties of remote sensing data and proposes techniques for improving deep learning model architectures and training methodologies in this domain. Two contributions are presented: MineSegSAT and VistaFormer. MineSegSAT is designed to identify and monitor environmentally impacted areas of mineral extraction sites using Sentinel-2 imagery; it incorporates state-of-the-art deep learning models and loss functions to automate the detection of disturbed areas, aiding environmental compliance monitoring. VistaFormer is a lightweight, efficient model for semantic segmentation of satellite image time series (SITS) data. It features an encoder-decoder architecture that pairs gated convolutions and self-attention Transformer blocks in the encoder with a lightweight convolutional decoder, and is designed to handle noise from atmospheric distortion and cloud cover while maintaining high performance and efficiency.
Experimental results demonstrate that VistaFormer outperforms state-of-the-art models on time series crop-type semantic segmentation benchmarks while using fewer floating-point operations and fewer trainable parameters. These findings suggest that Transformer-based architectures can significantly improve the accuracy and efficiency of satellite imagery analysis, providing valuable tools for environmental and agricultural monitoring. | |
dc.description.scholarlevel | Graduate | |
dc.identifier.uri | https://hdl.handle.net/1828/20360 | |
dc.language | English | eng |
dc.language.iso | en | |
dc.rights | Available to the World Wide Web | |
dc.subject | Deep learning | |
dc.subject | Computer vision and pattern recognition | |
dc.subject | Remote sensing | |
dc.title | Scalable vision transformers for remote sensing semantic segmentation | |
dc.type | Thesis |