Computer vision-based tracking and feature extraction for lingual ultrasound

dc.contributor.author: Al-Hammuri, Khalid
dc.contributor.supervisor: Branzan Albu, Alexandra
dc.contributor.supervisor: So, Poman Pok-Man
dc.date.accessioned: 2019-04-30T22:36:52Z
dc.date.available: 2019-04-30T22:36:52Z
dc.date.copyright: 2019
dc.date.issued: 2019-04-30
dc.degree.department: Department of Electrical and Computer Engineering
dc.degree.level: Master of Applied Science (M.A.Sc.)
dc.description.abstract: Lingual ultrasound is emerging as an important tool for providing visual feedback to second-language learners. In this study, ultrasound videos were recorded in the sagittal plane, which captures the full tongue surface in a single scan, unlike the transverse plane, which captures only a small portion of the tongue per scan. Data were collected from five Arabic speakers as they pronounced fourteen Arabic sounds in three different vowel contexts; each sound was repeated three times, yielding 630 ultrasound videos. The thesis algorithm consists of four steps: first, denoising the ultrasound image using a combined curvelet transform and shock filter; second, automatic selection of the tongue contour area; third, tongue contour approximation and missing-data estimation; and fourth, transformation of the tongue contour from image space into a fully concatenated signal, followed by feature extraction. The automatic tongue tracking results were validated by measuring the mean sum of distances between automatic and manual tongue contour tracking, giving an accuracy of 0.9558 mm. Validation of the feature extraction showed that the average mean squared error between the extracted tongue signatures for different repetitions of a sound was 0.000858 mm, meaning the algorithm could extract a unique signature for each sound, across different vowel contexts, with a high degree of similarity. Unlike related work, the algorithm offers an efficient and robust approach that extracts the tongue contour and salient features of dynamic tongue movement from all video frames, rather than from a single significant static frame as in conventional methods. The algorithm requires no training data, imposes no limit on video size or frame count, never failed during tongue extraction, and needed no manual re-initialization.
Even when the ultrasound recordings were missing some tongue contour information, the thesis approach could estimate the missing data with high accuracy. This approach can help linguistic researchers replace manual tongue tracking with automated tracking to save time, and then extract dynamic features of the full speech behavior, giving a better understanding of tongue movement during speech and supporting the development of language-learning tools for second-language learners.
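The abstract reports two validation metrics: the mean sum of distances between automatic and manual contour tracking, and the mean squared error between extracted signatures of repeated sounds. The thesis does not specify its exact formulations here, so the sketch below shows one common reading of each metric, assuming contours are given as (x, y) point arrays and signatures as 1-D signals of equal length; the function names are illustrative, not taken from the thesis.

```python
import numpy as np

def mean_sum_of_distances(auto_contour, manual_contour):
    """One common form of the MSD metric: for each point on the
    automatically tracked contour, take the Euclidean distance to the
    nearest point on the manually traced contour, then average."""
    auto = np.asarray(auto_contour, dtype=float)
    manual = np.asarray(manual_contour, dtype=float)
    # Pairwise distances between the two point sets, shape (len(auto), len(manual))
    d = np.linalg.norm(auto[:, None, :] - manual[None, :, :], axis=2)
    return d.min(axis=1).mean()

def signature_mse(sig_a, sig_b):
    """Mean squared error between two extracted tongue signatures
    of equal length."""
    a = np.asarray(sig_a, dtype=float)
    b = np.asarray(sig_b, dtype=float)
    return np.mean((a - b) ** 2)
```

Under this reading, two parallel contours 1 mm apart give an MSD of 1.0 mm, and identical signatures give an MSE of zero; a symmetric variant of MSD (averaging both directions) is also common in contour-tracking evaluation.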
dc.description.scholarlevel: Graduate
dc.identifier.uri: http://hdl.handle.net/1828/10812
dc.language: English
dc.language.iso: en
dc.rights: Available to the World Wide Web
dc.subject: computer vision
dc.subject: lingual ultrasound
dc.subject: tracking
dc.subject: feature extraction
dc.subject: tongue
dc.title: Computer vision-based tracking and feature extraction for lingual ultrasound
dc.type: Thesis

Files

Original bundle
Name: Al-hammuri_Khalid_MASc_2019.pdf
Size: 5.67 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission