Applying automatic speech recognition to Indigenous language documentation: A case study with Hul’q’umi’num’

dc.contributor.author: Jiang, Xin He
dc.contributor.supervisor: Bird, Sonya
dc.contributor.supervisor: Urbanczyk, Suzanne Claire
dc.date.accessioned: 2026-06-01T19:07:32Z
dc.date.available: 2026-06-01T19:07:32Z
dc.date.issued: 2026
dc.degree.department: School of Languages, Linguistics and Cultures
dc.degree.level: Master of Arts (MA)
dc.description.abstract: Documenting Indigenous languages can produce a large number of audio recordings that are difficult to convert into written form. Speeding up transcription with automatic speech recognition could help the Hul’q’umi’num’ Language & Culture Society create pedagogical materials and make their recordings more accessible. In this project, I trained a language model known as XLS-R on Hul’q’umi’num’ audio recordings to determine how accurately it can transcribe Hul’q’umi’num’, whether particular linguistic and orthographic features are more difficult for XLS-R to transcribe, and how much time and what computational resources the training takes. The model reached a character error rate (CER) of 11.1% and a word error rate (WER) of 50% using 26 minutes of continuous speech. Most phonemes could be transcribed with high accuracy, but the model had difficulty segmenting words, differentiating glottalized consonants from plain consonants, determining vowel length, and predicting the placement of glottal stops.
dc.description.scholarlevel: Graduate
dc.identifier.uri: https://hdl.handle.net/1828/23969
dc.language: English (eng)
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: automatic speech recognition
dc.subject: Hul'q'umi'num'
dc.subject: linguistics
dc.subject: transcription
dc.subject: error analysis
dc.subject: Indigenous
dc.title: Applying automatic speech recognition to Indigenous language documentation: A case study with Hul’q’umi’num’
dc.type: Thesis
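
The abstract reports both a character error rate and a word error rate. As a rough illustration of how these two metrics are conventionally computed (edit distance divided by reference length), here is a minimal Python sketch. It is not taken from the thesis: the function names are hypothetical, the example strings are English placeholders rather than Hul’q’umi’num’ data, and counting spaces as characters in the CER is an assumption that the thesis may or may not share.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate; spaces count as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated tokens."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Placeholder example: a single missing word boundary.
reference = "the cat sat"
hypothesis = "thecat sat"
print(f"CER = {cer(reference, hypothesis):.3f}")
print(f"WER = {wer(reference, hypothesis):.3f}")
```

In this placeholder example, one missing space is a single character error but affects two of the three reference words, which shows in general terms how word-boundary errors can push the WER well above the CER.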

Files

Original bundle
Name: Jiang_Xin_He_MA_2026.pdf
Size: 1.52 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.62 KB
Format: Item-specific license agreed upon to submission