Robust Single-Channel Speech Enhancement and Speaker Localization in Adverse Environments




Mosayyebpour, Saeed




In speech communication systems such as voice-controlled systems, hands-free mobile telephones and hearing aids, the received signals are degraded by room reverberation and background noise. This degradation can reduce the perceived quality and intelligibility of the speech, and degrade the performance of speech enhancement and source localization algorithms. These problems are difficult to solve because speech signals are colored and nonstationary, and because the Room Impulse Response (RIR) is long and non-minimum phase. In this dissertation, we focus on two topics: speech enhancement and speaker localization in noisy reverberant environments. A two-stage speech enhancement method is presented to suppress both early and late reverberation in noisy speech using only one microphone. It is shown that this method works well even in highly reverberant rooms. Experiments under different acoustic conditions confirm that the proposed blind method reduces early and late reverberation effects and noise better than other well-known single-microphone techniques in the literature. Time Difference Of Arrival (TDOA)-based methods usually provide the most accurate source localization in adverse conditions. The key issue for these methods is to estimate the TDOA accurately using the smallest number of microphones. Two robust Time Delay Estimation (TDE) methods are proposed, both of which use information from only two microphones. The first method is based on adaptive inverse filtering and provides superior performance even in highly reverberant and moderately noisy conditions. Its rate of failed estimates is negligible, which makes it a reliable method in realistic environments. However, this method has high computational complexity due to the estimation performed in the first stage for the first microphone. As a result, it cannot be applied in time-varying environments or real-time applications. Our second method addresses this problem by introducing two effective preprocessing stages for the conventional Cross Correlation (CC)-based methods. The results obtained in different noisy reverberant conditions, including a real and time-varying environment, demonstrate that the proposed methods outperform the conventional TDE methods.
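For orientation, the conventional CC-based TDE that the abstract uses as its point of comparison can be sketched as follows. This is a minimal illustration of generalized cross-correlation with PHAT weighting for two microphones, not the dissertation's proposed method or its preprocessing stages. The function name `gcc_phat`, its parameters, and the sign convention are assumptions made for this example.

```python
import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the delay of x1 relative to x2, in seconds (hypothetical helper).

    A positive result means x1 lags x2.
    """
    # Zero-pad so the circular correlation covers all linear lags.
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum with PHAT weighting (keep phase, discard magnitude).
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(cross, n=n)

    # Restrict the search range if a physical bound on the delay is known.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)

    # Reorder so that index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

With a known microphone spacing d and speed of sound c, `max_tau` would typically be set to d / c, since no physical delay can exceed that bound.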



skewness, early and late reverberation, noise, single-microphone, spectral subtraction, Time Delay Estimation (TDE), Time Difference of Arrival (TDOA), Adaptive Inverse Filtering (AIF), Generalized Cross-Correlation (GCC), Room Impulse Response (RIR)