TY - GEN
T1 - Supervised Learning-based Sound Source Distance Estimation Using Multivariate Features
AU - Zhagyparova, Kalamkas
AU - Zhagypar, Ruslan
AU - Zollanvari, Amin
AU - Akhtar, Muhammad Tahir
N1 - Funding Information:
This work was partially supported by the Faculty Development Competitive Research Grants Program of Nazarbayev University under Grant Number 110119FD4525.
Publisher Copyright:
© 2021 IEEE.
PY - 2021/8/23
Y1 - 2021/8/23
AB - This paper applies supervised machine learning methods to a combination of several sound source distance-dependent features to tackle the problem of distance-of-arrival (DisOA) estimation. DisOA estimation is treated as a classification problem: a recorded audio signal is assigned to one of four predefined DisOA classes regardless of the orientation angle. The training and test datasets are simulated by convolving appropriate room impulse responses with anechoic speech signals. The performance of three conventional and efficient classifiers is examined with various subsets of four extracted features: 1) diffuseness (DIFF); 2) binaural spectral magnitude difference standard deviation (BSMD-STD); 3) magnitude squared coherence (MSC); and 4) direct-to-reverberant ratio (DRR). The simulations cover different source signals as well as varying directions-of-arrival and room sizes. The empirical results show that a single feature, namely MSC, combined with K-nearest neighbor (KNN) could potentially lead to an accurate DisOA classification rule.
KW - Acoustic Distance Estimation
KW - KNN
KW - LDA
KW - NMC
KW - Sound Source Localization
UR - http://www.scopus.com/inward/record.url?scp=85117446340&partnerID=8YFLogxK
U2 - 10.1109/TENSYMP52854.2021.9551007
DO - 10.1109/TENSYMP52854.2021.9551007
M3 - Conference contribution
AN - SCOPUS:85117446340
T3 - TENSYMP 2021 - 2021 IEEE Region 10 Symposium
BT - TENSYMP 2021 - 2021 IEEE Region 10 Symposium
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE Region 10 Symposium, TENSYMP 2021
Y2 - 23 August 2021 through 25 August 2021
ER -