Deep Learning and Music Generation
Using deep learning (specifically, recurrent neural networks) to compose music is a cool idea. This post presents my study of using different recurrent neural networks to generate/simulate a soundtrack from The Godfather. The program is based on supervised learning with a simple note-by-note prediction approach, and is implemented in Python with libraries such as Keras, Python MIDI, NLTK, and Pygame.
Original Music and Generated Music
First things first, here are two tracks: the original soundtrack and the sound generated/simulated by recurrent neural networks.
Some comments about these two songs:
- The second track titled “Godfather Love Theme (LSTM)” is generated/simulated by a special type of recurrent neural network (Long Short-Term Memory a.k.a. LSTM).
- The format of music processed by the program is actually MIDI. Because the website hosting the sound (i.e., SoundCloud) doesn't support MIDI, I had to convert the .mid files to .mp3 files for uploading/streaming.
- The generated/simulated track contains notes in only a single channel, played by a single instrument, so it is not as rich as the original.
- The soundtracks generated/simulated by other recurrent neural networks are also stored on my SoundCloud account.
Processing the MIDI files and building/training the recurrent neural networks are explained in the following sections.
MIDI File Processing
The format of music managed by this project is MIDI. Reading, parsing, and creating MIDI files is handled by the Python MIDI library, while tokenizing MIDI data to extract notes/volumes/channels is handled by the Natural Language Toolkit (NLTK). A simple Pygame program that plays the notes Do-Re-Mi is shown in Fig. 1.
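Fig. 1 itself is not reproduced here, but a minimal sketch of such a Do-Re-Mi program, assuming the default `pygame.midi` output device and a piano instrument, might look like this:

```python
import time
import pygame.midi

pygame.midi.init()
player = pygame.midi.Output(pygame.midi.get_default_output_id())
player.set_instrument(0)  # 0 = acoustic grand piano

# MIDI pitches 60, 62, 64 are C4, D4, E4 (i.e., Do, Re, Mi)
for pitch in (60, 62, 64):
    player.note_on(pitch, 100)   # velocity 100
    time.sleep(0.5)              # hold each note for half a second
    player.note_off(pitch, 100)

del player
pygame.midi.quit()
```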
The following figure shows the 424 notes of the original soundtrack from The Godfather.
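A minimal sketch of how such a note sequence might be extracted with the Python MIDI library (the file name here is an assumption):

```python
import midi  # the Python MIDI library

# Collect the pitch of every note-on event in the file
pattern = midi.read_midifile("godfather_love_theme.mid")
notes = []
for track in pattern:
    for event in track:
        if isinstance(event, midi.NoteOnEvent) and event.velocity > 0:
            notes.append(event.pitch)

print(len(notes))  # 424 for the soundtrack used in this post
```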
Converting a sequence of notes (Fig. 3) to a set of training data (Fig. 4) follows the idea of supervised learning and a simple note-by-note prediction approach: take 10 consecutive notes as the input and the next note as the label, then slide this window through the 424 notes to produce 414 input/label pairs.
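A sketch of this windowing step, assuming the notes are one-hot encoded over the 128 MIDI pitches:

```python
import numpy as np

WINDOW = 10        # number of input notes per training example
NUM_PITCHES = 128  # size of the MIDI pitch range, for one-hot encoding

def make_training_data(notes, window=WINDOW):
    """Slide a window over the note sequence: each run of `window` notes
    is one input, and the note right after the window is its label."""
    X, y = [], []
    for i in range(len(notes) - window):
        X.append(notes[i:i + window])
        y.append(notes[i + window])
    # One-hot encode: X becomes (samples, window, 128), y becomes (samples, 128)
    return np.eye(NUM_PITCHES)[np.array(X)], np.eye(NUM_PITCHES)[np.array(y)]

X, y = make_training_data(notes)  # 424 notes in, 414 input/label pairs out
```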
Recurrent Neural Network
Keras provides three kinds of recurrent layers for building a recurrent neural network: SimpleRNN, Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM).
The code for building and training an LSTM recurrent neural network is shown in Fig. 5. The network has only four layers: an input layer, a recurrent layer, a dense layer, and an activation layer. The loss function is categorical cross-entropy and the activation function is softmax, following the approach in Chapter 6 of Deep Learning with Keras.
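Fig. 5 is not reproduced here; a minimal Keras sketch of such a four-layer network, where the hidden size of 128 and the rmsprop optimizer are my assumptions, could be:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

model = Sequential()
model.add(LSTM(128, input_shape=(WINDOW, NUM_PITCHES)))  # recurrent layer (input_shape defines the input layer)
model.add(Dense(NUM_PITCHES))                            # dense layer
model.add(Activation('softmax'))                         # activation layer
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
```

Trying the other two recurrent layers only requires swapping `LSTM` for `GRU` or `SimpleRNN` in the second line.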
Some tricks in the code (a sketch of all three follows the list):
- Line 23: plot the network (see Fig. 6)
- Line 26: define early stopping
- Lines 30-41: plot the loss history over 500 epochs with early stopping (see Fig. 7)
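The original code is in Fig. 5; a sketch of these three tricks, with the patience value and plot styling as my assumptions, might look like this:

```python
import matplotlib.pyplot as plt
from keras.callbacks import EarlyStopping
from keras.utils import plot_model

# Plot the network architecture to a file (cf. Fig. 6)
plot_model(model, to_file='model.png', show_shapes=True)

# Stop training once the loss has not improved for 10 epochs
early_stopping = EarlyStopping(monitor='loss', patience=10)

history = model.fit(X, y, epochs=500, batch_size=32,
                    callbacks=[early_stopping])

# Plot the loss history over the (at most) 500 epochs (cf. Fig. 7)
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('loss')
plt.savefig('loss_history.png')
```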
Conclusions
Implementing the idea and writing this post was fun. However, the current implementation is not very stable, in the sense that different runs of the same program can produce soundtracks of different quality. Comparing and contrasting the results from different recurrent neural networks with different hyperparameters requires more study. And training recurrent neural networks to play tango, trance, and trip hop still has a long way to go.