CNN-based models for pitch estimation, modification, and auto-tuning
Date
2024
Authors
Jiang, Jiazhuo
Abstract
Pitch estimation and pitch modification are fundamental audio processing tasks used in a variety of applications. An important example is the auto-tuning of vocals, in which the pitch is estimated, its deviation from a desired target pitch is calculated, and the pitch of the input vocal signal is modified to match the target. Most existing approaches to auto-tuning rely on traditional digital signal processing (DSP) techniques for both pitch detection and pitch modification. This thesis explores Convolutional Neural Networks (CNNs) as a possible replacement for traditional DSP methods in pitch estimation, pitch modification, and end-to-end auto-tuning. CNNs can model complex input-output relationships and are more efficient than deep learning methods that take time/sequence information into account, such as Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks. The results show the potential of this approach as well as some of the challenges that remain to be overcome. The experimental results indicate that larger data sets can improve accuracy, but they also tend to introduce more noise.
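The auto-tuning steps described in the abstract (estimate the pitch, measure the deviation from a target, derive the correction) can be illustrated with a minimal sketch. It is not the method of the thesis and contains no CNN: it assumes the target is the nearest equal-tempered semitone with A4 = 440 Hz, and the function names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning; not specified in the abstract


def nearest_semitone_hz(f0_hz, ref_hz=A4_HZ):
    """Return the equal-tempered note frequency closest to f0_hz (assumed target)."""
    semitones = round(12 * math.log2(f0_hz / ref_hz))
    return ref_hz * 2 ** (semitones / 12)


def deviation_cents(f0_hz, target_hz):
    """Signed deviation of the estimated pitch from the target, in cents."""
    return 1200 * math.log2(f0_hz / target_hz)


def correction_ratio(f0_hz, target_hz):
    """Frequency scaling factor a pitch shifter would need to reach the target."""
    return target_hz / f0_hz


# Example: a slightly sharp A4 as it might come out of a pitch estimator.
estimated = 450.0
target = nearest_semitone_hz(estimated)
print(target, deviation_cents(estimated, target), correction_ratio(estimated, target))
```

In a full system, the estimated pitch would come from a pitch estimator applied frame by frame, and the correction ratio would drive a pitch-modification stage; the thesis investigates CNNs for both roles.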
Keywords
Auto-Tuning