Audio analysis of customer calls for predicting purchase intentions: A novel approach to e-commerce insights

Date

2024

Authors

Yu, Miao

Abstract

Client audio recordings are a valuable resource for many types of businesses. Using these recordings to identify potential customers can help raise purchase rates and reduce marketing costs, particularly when machine learning methods automatically label customers as positive, neutral, or negative buyers in place of manual analysis. Previous research has predominantly focused on text content analysis for this purpose, but audio features, which capture voice nuances such as tone, pitch, rhythm, and interaction patterns between interviewers and interviewees, may also affect model performance. This project explores an approach that incorporates such features. It first investigates the effectiveness of emotion detection through audio features, using two datasets: the Toronto Emotional Speech Set (TESS) and the Surrey Audio-Visual Expressed Emotion (SAVEE) dataset. Hierarchical clustering is then applied to explore the relationship between emotion-related audio features and customer categories, using audio data provided by VINN Auto, an e-commerce firm. Next, after the same dataset is labeled, Exploratory Data Analysis (EDA) is conducted to examine the correlation between interaction-related audio features and the customer categories (positive, neutral, and negative buyers). Using supervised learning, the results indicate that integrating audio features, including emotion-related and interaction-pattern features, can affect the performance of models such as Support Vector Machines (SVM), Decision Tree, and Extreme Gradient Boosting (XGBoost), particularly when combined with traditional content-related features such as Term Frequency-Inverse Document Frequency (TF-IDF) scores and an adjusted weight configuration for the positive class.
After these explorations, an ensemble method using a soft voting mechanism across these three models is developed to assess whether it can enhance the identification of potential purchasers. Combining emotion-related audio features, interaction-pattern features, and content-based features such as TF-IDF scores with tailored weight configurations highlights the value of incorporating audio features in customer identification tasks, compared with using content-based features alone. This could be a robust strategy for improving classification outcomes in similar analyses in the future.
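As an illustration of the soft voting mechanism mentioned in the abstract, the following is a minimal sketch, not the thesis's implementation. It assumes each of the three models outputs a probability for every customer category; the function name `soft_vote`, the class order, and all probability values are hypothetical and chosen only to show the arithmetic.

```python
def soft_vote(prob_rows, weights=None):
    """Average per-model class probabilities (optionally weighted).

    prob_rows: one list of class probabilities per model, all in the same class order.
    Returns (index of the winning class, averaged probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_rows)  # equal say for every model
    total = sum(weights)
    n_classes = len(prob_rows[0])
    averaged = [
        sum(w * row[c] for w, row in zip(weights, prob_rows)) / total
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: averaged[c])
    return best, averaged


# Hypothetical class order and per-model probabilities for one call recording.
CLASSES = ["negative", "neutral", "positive"]
svm_probs = [0.20, 0.50, 0.30]   # assumed SVM output
tree_probs = [0.10, 0.30, 0.60]  # assumed Decision Tree output
xgb_probs = [0.15, 0.35, 0.50]   # assumed XGBoost output

label_index, averaged = soft_vote([svm_probs, tree_probs, xgb_probs])
print(CLASSES[label_index], [round(p, 3) for p in averaged])
```

Unlike hard (majority) voting, soft voting lets a confident model outweigh a hesitant one, since the averaged probabilities rather than the individual predicted labels decide the final category.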

Keywords

purchase intention, potential purchasers, audio, emotion-related features, interaction-pattern features, text, content-related features
