Extending the reach of Gaia with Masked Stellar Autoencoders

McKay, Aydan

dc.contributor.author	McKay, Aydan
dc.contributor.supervisor	Fabbro, Sébastien
dc.contributor.supervisor	Venn, Kimberley Ann
dc.date.accessioned	2025-08-29T19:17:45Z
dc.date.available	2025-08-29T19:17:45Z
dc.date.issued	2025
dc.degree.department	Department of Physics and Astronomy
dc.degree.level	Master of Science (MSc)
dc.description.abstract	I present the Masked Stellar Autoencoder (MSA), a new data-driven, holistic stellar model for Galactic archaeology. The MSA is trained on the complete Gaia DR3 XP spectra catalogue using a self-supervised masking algorithm that forces the model to learn the relationships within the data itself. Photometry from six additional surveys spanning optical and infrared wavelengths is integrated into the dataset, making the model robust to missing spectroscopic and photometric data. This allows the embeddings to retain accuracy beyond the depth of the XP spectra. The model was first pretrained on the ~220 million Gaia DR3 stars with photometry, with the objective of reconstructing the input information. I then demonstrate the informativeness of the embeddings produced by this astronomical foundation model on the predictive task of deriving atmospheric parameters and stellar ages, using labels from high-resolution spectroscopic surveys (APOGEE, GALAH). The model achieved mean absolute errors of 92 K in $T_{\mathrm{eff}}$, 0.08 dex in $\log g$, and 0.09 dex in [Fe/H], making it competitive with XGBoost and transformer-based models trained on APOGEE labels. Furthermore, the model achieved mean absolute errors of 0.05 dex in [$\alpha$/Fe] and 1.3 Gyr in age, with only marginal increases in these errors when XP spectra are missing. The MSA also predicts uncertainties for the stellar parameters, which were shown to be largely representative of the actual errors in the predicted values, with slight underconfidence in the widths of the asymmetric errors. The dependence of prediction accuracy on pretraining dataset size was also examined, and the model was used to predict stellar parameters for a subset of open clusters and for the dwarf galaxies Leo I and Fornax. These estimates indicated a potential improvement in parallax measurements at larger distances and in crowded regions.
This model effectively bridges the gap between spectroscopic and photometric samples within a single, consistent framework, and is poised to improve with the inclusion of additional photometric surveys and upcoming Gaia releases.
dc.description.scholarlevel	Graduate
dc.identifier.uri	https://hdl.handle.net/1828/22691
dc.language	English	eng
dc.language.iso	en
dc.rights	Available to the World Wide Web
dc.subject	Astronomy
dc.subject	Galactic Archaeology
dc.subject	Deep Learning
dc.title	Extending the reach of Gaia with Masked Stellar Autoencoders
dc.type	Thesis
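The abstract describes self-supervised pretraining in which part of each star's inputs is hidden and the model must reconstruct it, while stars with missing XP spectra or photometry are still usable. The Python sketch below illustrates that general masked-autoencoder idea only. Everything in it is an assumption for illustration: the flat feature layout, the names `Star`, `make_mask`, `model_input` and `masked_mse`, the mask fraction, the zero fill value, the visibility flags and the mean-squared-error loss are hypothetical and are not taken from the thesis, whose actual masking algorithm is described only in the full text. No network is included; `pred` stands in for a decoder's output.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical per-star feature vector: XP spectral coefficients followed by
# photometric magnitudes from several surveys. None marks a missing value.
Star = List[Optional[float]]


def make_mask(star: Star, mask_frac: float, rng: random.Random) -> List[bool]:
    """Pick a random subset of the *observed* features to hide (True = hidden).

    Assumes 0 < mask_frac <= 1. Missing entries (None) are never chosen,
    because there is no target value to reconstruct for them.
    """
    observed = [i for i, v in enumerate(star) if v is not None]
    n_hide = max(1, round(mask_frac * len(observed))) if observed else 0
    hidden = set(rng.sample(observed, n_hide))
    return [i in hidden for i in range(len(star))]


def model_input(star: Star, mask: List[bool], fill: float = 0.0) -> Tuple[List[float], List[bool]]:
    """Encoder input: hidden and missing entries are replaced by a fill value,
    and a parallel flag list records which entries the model may see."""
    values: List[float] = []
    visible: List[bool] = []
    for v, hidden in zip(star, mask):
        seen = v is not None and not hidden
        values.append(v if seen else fill)
        visible.append(seen)
    return values, visible


def masked_mse(pred: List[float], star: Star, mask: List[bool]) -> float:
    """Reconstruction loss averaged only over entries that were observed
    but hidden; visible and missing entries contribute nothing."""
    errs = [(p - v) ** 2 for p, v, hidden in zip(pred, star, mask)
            if hidden and v is not None]
    return sum(errs) / len(errs) if errs else 0.0


# Usage with made-up numbers: the third feature (say, one XP coefficient) is missing.
star: Star = [0.5, -1.0, None, 15.0, 14.5]
rng = random.Random(0)
mask = make_mask(star, 0.5, rng)          # hides 2 of the 4 observed entries
x, visible = model_input(star, mask)      # what a (hypothetical) encoder would receive
pred = [0.0] * len(star)                  # stand-in for the decoder's reconstruction
loss = masked_mse(pred, star, mask)
```

Keeping missing entries out of both the mask and the loss is one simple way a single network can accept stars with or without XP spectra; the record does not state whether the MSA uses this exact scheme.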

Files

Original bundle

Name:: McKay_Aydan_MSc_2025.pdf
Size:: 17.15 MB
Format:: Adobe Portable Document Format

License bundle

Name:: license.txt
Size:: 1.62 KB
Format:: Item-specific license agreed to upon submission

Collections

Electronic Theses and Dissertations (ETD)