Major
Data Science
Research Abstract
With recent advances in technology, many tasks in fields such as computer vision, natural language processing, and signal processing have been addressed using deep learning architectures. In the audio domain, these architectures have been used to learn musical features of songs in order to predict moods, genres, and instruments. In genre classification, deep learning models have been applied to popular datasets, which are explicitly chosen to represent their genres, and have achieved state-of-the-art results. However, these results have not been reproduced on less refined datasets. To this end, we introduce an uncurated dataset that contains genre labels and 30-second audio previews for approximately fifteen thousand songs from Spotify. Our work focuses on automatic genre classification using deep learning and crude data. Specifically, we propose deep architectures that learn hierarchical characteristics of music from raw waveform audio, rather than from audio preprocessed into mel-spectrograms, and apply these models to the Spotify dataset. Our experiments show that deep learning architectures using unpolished data can achieve results comparable to those of previous state-of-the-art music classifiers using filtered data.
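To make the distinction between raw waveform input and mel-spectrogram input concrete, the following is a minimal sketch and not the authors' architecture. It assumes a 22,050 Hz sample rate, a hypothetical filter length of 512 samples with a stride of 256, and random filters standing in for learned ones; none of these values come from the abstract. It shows how a strided 1-D convolution can turn a 30-second raw waveform directly into a sequence of feature frames, which is the role a fixed mel filterbank would otherwise play.

```python
import numpy as np

SAMPLE_RATE = 22_050   # assumed; not stated in the abstract
CLIP_SECONDS = 30      # preview length given in the abstract
FRAME_LEN = 512        # hypothetical filter length, in samples
STRIDE = 256           # hypothetical hop between frames, in samples
NUM_FILTERS = 16       # hypothetical number of filters


def strided_conv1d(waveform, filters, stride):
    """Apply a bank of 1-D filters to a raw waveform with the given stride.

    This is cross-correlation without padding, as is conventional for
    convolution layers in deep learning.

    waveform: shape (num_samples,)
    filters:  shape (num_filters, frame_len)
    returns:  shape (num_frames, num_filters)
    """
    frame_len = filters.shape[1]
    # Every window of length frame_len, keeping one window per stride step.
    windows = np.lib.stride_tricks.sliding_window_view(waveform, frame_len)[::stride]
    # Dot product of each window with each filter, followed by a ReLU.
    return np.maximum(windows @ filters.T, 0.0)


rng = np.random.default_rng(0)
# Random noise stands in for a decoded audio preview.
waveform = rng.standard_normal(SAMPLE_RATE * CLIP_SECONDS).astype(np.float32)
# Random filters stand in for weights a network would learn.
filters = rng.standard_normal((NUM_FILTERS, FRAME_LEN)).astype(np.float32)

features = strided_conv1d(waveform, filters, STRIDE)
print(features.shape)
```

In a trained model the filters would be learned jointly with the deeper layers, so the front end can adapt to the data instead of relying on a hand-designed frequency scale.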
Faculty Mentor/Advisor
David Guy Brizan
Included in
Applied Statistics Commons, Artificial Intelligence and Robotics Commons, Statistical Models Commons
Deep Neural Network Architectures For Music Genre Classification