Extending the reach of Gaia with Masked Stellar Autoencoders

dc.contributor.author: McKay, Aydan
dc.contributor.supervisor: Fabbro, Sébastien
dc.contributor.supervisor: Venn, Kimberley Ann
dc.date.accessioned: 2025-08-29T19:17:45Z
dc.date.available: 2025-08-29T19:17:45Z
dc.date.issued: 2025
dc.degree.department: Department of Physics and Astronomy
dc.degree.level: Master of Science MSc
dc.description.abstract: I present the Masked Stellar Autoencoder (MSA), a new data-driven holistic stellar model for Galactic archaeology. The MSA is trained on the complete Gaia DR3 XP spectra catalogue using a self-supervised masking algorithm that forces the model to learn the relationships within the data itself. Photometry from six additional surveys spanning optical and infrared wavelengths is integrated into the dataset, making the model robust to missing spectroscopic and photometric data. This allows the embeddings to retain accuracy beyond the depth of the XP spectra. The model was first pretrained on the ~220 million stars from Gaia DR3 with photometry, with the objective of reconstructing the input information. I then demonstrate the informative embeddings produced by this astronomical foundation model on the predictive task of deriving atmospheric parameters and stellar ages, using labels from high-resolution spectroscopic surveys (APOGEE, GALAH). The model achieved mean absolute errors of 92 K in $T_{\mathrm{eff}}$, 0.08 dex in $\log g$, and 0.09 dex in [Fe/H], making it competitive with XGBoost and transformer-based models trained with APOGEE labels. Furthermore, the model achieved mean absolute errors of 0.05 dex in [$\alpha$/Fe] and 1.3 Gyr in age, with only marginal increases in these errors when XP spectra are missing. The MSA also predicts errors for the stellar parameters, which were shown to be largely representative of the predicted values, with slight underconfidence in the width of the asymmetric errors. The change in prediction accuracy with pretraining dataset size was examined, and the model was used to predict stellar parameters for a subset of open clusters and for the dwarf galaxies Leo I and Fornax. These estimates displayed a potential improvement in parallax measurements at larger distances and in crowded regions.
This model effectively bridges the gap between spectroscopic and photometric samples within a single, consistent framework, and is poised to improve with the inclusion of additional photometric surveys and upcoming Gaia releases.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/22691
dc.language: English (eng)
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: Astronomy
dc.subject: Galactic Archaeology
dc.subject: Deep Learning
dc.title: Extending the reach of Gaia with Masked Stellar Autoencoders
dc.type: Thesis
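
The self-supervised masking idea described in the abstract can be illustrated with a minimal sketch. This record does not specify the thesis's actual masking scheme, architecture, or loss, so everything below is an assumption made for illustration: the function names `mask_features` and `masked_mae`, the 25% mask fraction, the toy feature values, and the mean-fill stand-in for a trained model are all hypothetical.

```python
import random


def mask_features(features, mask_fraction=0.3, rng=None):
    """Randomly hide a fraction of the input features (at least one).

    Returns (masked, is_masked): hidden entries are replaced by None,
    and is_masked flags which positions were hidden.
    """
    rng = rng or random.Random()
    n_hidden = max(1, round(mask_fraction * len(features)))
    hidden = set(rng.sample(range(len(features)), n_hidden))
    masked = [None if i in hidden else x for i, x in enumerate(features)]
    is_masked = [i in hidden for i in range(len(features))]
    return masked, is_masked


def masked_mae(original, reconstruction, is_masked):
    """Mean absolute error computed over the hidden positions only."""
    errors = [abs(o - r)
              for o, r, m in zip(original, reconstruction, is_masked) if m]
    return sum(errors) / len(errors)


# Toy "star": hypothetical stand-ins for spectral coefficients and magnitudes.
star = [0.12, -0.40, 1.05, 15.2, 14.8, 14.1, 13.9, 13.7]
masked, is_masked = mask_features(star, mask_fraction=0.25, rng=random.Random(0))

# Placeholder "model" (an assumption, not the MSA): fill each hidden entry
# with the mean of the visible entries.
visible = [x for x in masked if x is not None]
fill = sum(visible) / len(visible)
reconstruction = [fill if m else x for x, m in zip(star, is_masked)]

# Reconstruction loss on the hidden entries, the quantity a real model
# would be trained to minimise.
loss = masked_mae(star, reconstruction, is_masked)
```

In a setup of this kind, scoring only the hidden entries is what pushes a model to infer missing measurements from the ones that remain, which is consistent with the robustness to missing spectra and photometry that the abstract reports.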

Files

Original bundle
Name: McKay_Aydan_MSc_2025.pdf
Size: 17.15 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission