Audio fingerprinting for speech reconstruction and recognition in noisy environments

Liu, Feng

dc.contributor.author	Liu, Feng
dc.contributor.supervisor	Tzanetakis, George
dc.date.accessioned	2017-04-13T16:13:29Z
dc.date.available	2017-04-13T16:13:29Z
dc.date.copyright	2017	en_US
dc.date.issued	2017-04-13
dc.degree.department	Department of Computer Science
dc.degree.level	Master of Science M.Sc.	en_US
dc.description.abstract	Audio fingerprinting is a highly specific content-based audio retrieval technique. Given a short audio fragment as a query, an audio fingerprinting system can identify the particular file that contains the fragment in a large library potentially consisting of millions of audio files. In this thesis, we investigate the feasibility of applying audio fingerprinting to speech recognition in noisy environments by means of speech reconstruction. To reconstruct noisy speech, the speech is first divided into small segments of equal length. Audio fingerprinting is then used to find, for each noisy segment, the most similar segment in a large dataset of clean speech files. If the similarity is above a threshold, the noisy segment is replaced with the clean segment. The segments, after this conditional replacement, are then concatenated to form the reconstructed speech, which is sent to a traditional speech recognition system. The critical step in this procedure is using audio fingerprinting to find the clean speech segment in the dataset. To test its performance, we build a landmark-based audio fingerprinting system. Experimental results show that this baseline system performs well in traditional applications, but its accuracy in this new application is lower than we expected. We then propose three strategies to improve the system, which yield better accuracy than the baseline. Finally, we integrate the improved audio fingerprinting system into a traditional speech recognition system and evaluate the performance of the whole system.	en_US
dc.description.scholarlevel	Graduate	en_US
dc.identifier.uri	http://hdl.handle.net/1828/7912
dc.language	English	eng
dc.language.iso	en	en_US
dc.rights	Available to the World Wide Web	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/	*
dc.subject	audio fingerprinting	en_US
dc.subject	speech reconstruction	en_US
dc.subject	speech recognition	en_US
dc.title	Audio fingerprinting for speech reconstruction and recognition in noisy environments	en_US
dc.type	Thesis	en_US
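
The reconstruction procedure described in the abstract (segment, look up, conditionally replace, concatenate) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the function names, the segment length, and the threshold are hypothetical, and a plain cosine similarity stands in for the landmark-based fingerprint match, whose details are not given in this record.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Segment = List[float]


def split_segments(signal: Sequence[float], seg_len: int) -> List[Segment]:
    """Divide a signal into consecutive segments of seg_len samples.

    The final segment may be shorter if the length is not a multiple of seg_len.
    """
    return [list(signal[i:i + seg_len]) for i in range(0, len(signal), seg_len)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in similarity score (assumption): NOT a fingerprint match."""
    n = min(len(a), len(b))
    dot = sum(x * y for x, y in zip(a[:n], b[:n]))
    norm_a = sqrt(sum(x * x for x in a[:n]))
    norm_b = sqrt(sum(y * y for y in b[:n]))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(
    query: Segment,
    clean_segments: List[Segment],
    similarity: Callable[[Sequence[float], Sequence[float]], float],
) -> Tuple[Segment, float]:
    """Return the most similar clean segment and its similarity score."""
    best = max(clean_segments, key=lambda c: similarity(query, c))
    return best, similarity(query, best)


def reconstruct(
    noisy: Sequence[float],
    clean_segments: List[Segment],
    seg_len: int,
    threshold: float,
    similarity: Callable[[Sequence[float], Sequence[float]], float] = cosine_similarity,
) -> List[float]:
    """Replace each noisy segment by its best clean match when the score
    exceeds the threshold, then concatenate all segments."""
    out: List[float] = []
    for seg in split_segments(noisy, seg_len):
        match, score = best_match(seg, clean_segments, similarity)
        # Assumption: only replace when lengths agree, so the output keeps
        # the same number of samples as the input.
        if score > threshold and len(match) == len(seg):
            out.extend(match)
        else:
            out.extend(seg)
    return out


clean = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
noisy = [0.9, 0.1, 1.1, 0.0, 0.5, 0.5, -0.5, -0.5]
reconstructed = reconstruct(noisy, clean, seg_len=4, threshold=0.9)
```

In this toy input the first segment closely resembles a clean segment and is replaced, while the second has no good match and is kept as is. In the pipeline the abstract describes, the reconstructed signal would then be passed to a conventional speech recognizer.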

Files

Original bundle

Name: Liu_Feng_MSc_2017.pdf
Size: 2.9 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission

Collections

Electronic Theses and Dissertations (ETD)