7-11

Things to try

  • Baselines
    • HMM
    • RTRBM
    • RNN-RBM
    • N-gram language model: might be hard to sample, but could be used to score outputs
  • Interpolating between two sound bites
    • Run forward and backward LSTMs, combine their hidden states, and emit from the combined states
    • Multiple ways to combine hidden states:
      • Concatenate at each time step and use directly. Might be bad because distant states are neglected.
      • NMT by Jointly Learning to Align and Translate: an attention mechanism takes a weighted combination of all the hidden states being interpolated over.
  • Constraining length of output
    • Motivation: implausibly long phrases are unrealistic
    • Can introduce a countdown to 0 in the training and sampling procedures
  • t-SNE of the learned neural embedding: do similar chords map to nearby points? (sketched after this list)
  • Plot activations of the hidden state over time
    • How does it know a chord has at most 4 notes? (I'm expecting to see a "chord-end" memory cell)
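A minimal sketch of the t-SNE idea, assuming the trained model's embedding weights were exported as a (vocab_size, wordvec_size) NumPy array; the wordvec.npy path is hypothetical:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embedding = np.load('wordvec.npy')  # hypothetical export of the embedding matrix
coords = TSNE(n_components=2, perplexity=30).fit_transform(embedding)

plt.scatter(coords[:, 0], coords[:, 1], s=5)
plt.title('t-SNE of learned note/chord embeddings')
plt.show()
```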

Discussion topics

  • Modulation: how do we get an LSTM to do it?
  • Interpolation: how do we combine hidden states and emit? How do we train?
  • Constraining length: sanity check; how do we carry information forward across phrases?

6-5

Keras notes

  • To do a convolutional/time-distributed operation, TimeDistributed assumes the 1st axis (excluding the sample axis 0) is the time dimension. This means Permute should be used to satisfy that assumption (see the sketch below)

  • For some reason my sharing of embedding matrices is only supported by the tensorflow backend...
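A minimal sketch of the Permute-then-TimeDistributed pattern, assuming Keras; the 64-feature/100-timestep shapes are made up for illustration:

```python
from keras.models import Sequential
from keras.layers import Permute, TimeDistributed, Dense

model = Sequential()
# Incoming tensor is (batch, features=64, time=100); Permute swaps the two
# non-batch axes so TimeDistributed sees time on axis 1 as it expects.
model.add(Permute((2, 1), input_shape=(64, 100)))
model.add(TimeDistributed(Dense(32)))  # the same Dense applied at every timestep
```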

6-4

"Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network", Denil et al 2014

  • Music is also a sequence built up of individual measures, phrases, parts, etc., amenable to time-invariant convolution
  • Try building a convolutional representation for music, then put discriminative classifiers on top (sketched below)
    • e.g. Bach vs XYZ, major vs minor key
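A hypothetical sketch of that idea, assuming a recent Keras and a piano-roll style input with 128 pitch features per timestep (both the encoding and the layer sizes are illustrative):

```python
from keras.models import Sequential
from keras.layers import Conv1D, GlobalMaxPooling1D, Dense

model = Sequential()
model.add(Conv1D(64, 3, activation='relu', input_shape=(None, 128)))  # time-invariant convolution
model.add(GlobalMaxPooling1D())            # pool variable-length music to a fixed-size vector
model.add(Dense(1, activation='sigmoid'))  # discriminative head, e.g. major vs minor
model.compile(loss='binary_crossentropy', optimizer='adam')
```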

Pitch Classes

  • Don't really matter... 0.96 accuracy on pitch classes vs 0.95 without
  • Future experiments should include octaves, since including them significantly improves generated output

6-3

Many papers use pitch classes (i.e. mod 12), removing octave information...
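Concretely, a pitch class is just the MIDI pitch number mod 12:

```python
def pitch_class(midi_pitch):
    # Octave information is discarded: C4 (MIDI 60) and C5 (MIDI 72) both map to 0.
    return midi_pitch % 12

assert pitch_class(60) == pitch_class(72) == 0
```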

6-2

Things to try:

  • Segment phrases based on fermatas (see the sketch after this list)
  • Encode (pitch, duration, chord) like (Lichtenwalter 2009)
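A minimal sketch of the fermata segmentation, assuming music21 (Bach marks phrase endings in the chorales with fermatas); bwv269 is just an example chorale:

```python
from music21 import corpus, expressions

score = corpus.parse('bach/bwv269')
soprano = score.parts[0]
phrases, current = [], []
for note in soprano.flat.notes:
    current.append(note)
    if any(isinstance(e, expressions.Fermata) for e in note.expressions):
        phrases.append(current)  # a fermata closes the current phrase
        current = []
```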

Spent the rest of the day setting up Keras and TensorFlow; they seem easier to use for building new models...

5-30

Hyperparam opt over constant-timestep input monophonic models

Previous experiments used one input token per (pitch|REST, duration) event; these experiments expand that into (pitch|REST) tokens emitted at constant duration intervals (see the sketch below).
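An illustration of the encoding change with a hypothetical helper, using 4 frames per quarter note:

```python
def to_frames(events, frames_per_quarter=4):
    # Expand (pitch, duration) events into frame-level tokens at a constant timestep.
    frames = []
    for pitch, quarter_length in events:  # pitch is a MIDI number or 'REST'
        frames += [pitch] * int(quarter_length * frames_per_quarter)
    return frames

# A quarter-note C4 followed by an eighth-note rest becomes six constant-duration tokens:
assert to_frames([(60, 1.0), ('REST', 0.5)]) == [60, 60, 60, 60, 'REST', 'REST']
```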

Best result:
{u'num_layers': 1.0,
 u'rnn_size': 512.03541493830664,
 u'seq_length': 16.002948219489955,
 u'val_loss': 0.15576986968517303,
 u'wordvec_size': 6.0277702732883034}
seq_length rnn_size val_loss wordvec_size num_layers
16.0029 512.035 0.15577 6.02777 1
16.0029 471.29 0.175634 60.1177 1
16.0029 512.035 0.194531 1 1
11.3981 512.035 0.216492 128.825 1
11.1927 122.295 0.222993 128.825 2.13249
16.0029 512.035 0.241461 128.825 1
6.25468 460.95 0.247955 128.596 2.88079
6.92759 72.8999 0.272705 128.825 8.00018
4.00037 22.6282 0.306297 11.3501 2.82846
14.4828 13.1409 0.522982 1 1
1.00973 391.191 0.527153 59.2535 6.22639
15.3933 1 0.934034 128.825 1
15.9987 1 0.998908 128.825 1
1.8637 1.47925 1.16437 94.7932 5.22018
1 1 1.30671 1 1
16.0029 1 1.37111 1.35772 7.85394
1.85749 312.957 1.43217 27.7185 1.2719
1.15772 512.035 2.05152 128.825 3.09914
3.79838 512.035 3.99939 3.2196 8.00018
4.28328 1.08049 9.4367 128.825 7.73301
1.89024 1.44436 13.1816 12.317 8.00018

Hyperparam optimization over note-based monophonic modeling

  • Used Spearmint to do hyperparameter optimization over major-key soprano monophonic LSTM models
  • Best result: val_loss=1.13967 with seq_length=6.94253, rnn_size=29.5404, wordvec_size=126.366, num_layers=1.00082, all floored before use (see the sketch after this list)
    • Sampling with temp=0.8 yielded believable melody lines
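Making the flooring explicit: Spearmint proposes continuous values, so integer hyperparameters are floored before the model is built.

```python
import math

best = {'seq_length': 6.94253, 'rnn_size': 29.5404,
        'wordvec_size': 126.366, 'num_layers': 1.00082}
floored = {k: int(math.floor(v)) for k, v in best.items()}
assert floored == {'seq_length': 6, 'rnn_size': 29, 'wordvec_size': 126, 'num_layers': 1}
```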

All Spearmint results:

val_loss seq_length rnn_size wordvec_size num_layers
1.81279 1 1 1 1
1.25221 4.00037 22.6282 11.3501 2.82846
1.54454 16.0029 86.9708 128.825 7.15475
1.91175 12.0003 10.0163 11.2472 6.57259
9.7702 1.03734 438.864 114.248 7.57239
3.72573 16.0029 1 128.825 1
10.5524 2.18817 142.192 1 2.83128
1.94992 1.34606 1 17.2647 8.00018
4.68856 16.0029 1 4.0772 1
3.09284 1 54.9462 21.2293 3.10958
1.84045 1 1.54607 12.446 2.11687
1.45445 16.0029 512.035 14.0071 1.09933
20.4365 2.49828 512.035 8.91201 1.17965
1.92739 15.4095 210.947 11.8957 3.7034
3.39995 7.74704 1 13.2716 6.58817
1.44211 16.0029 512.035 16.0497 1
1.83048 1 1 2.04846 1
4.70739 16.0029 1 38.4966 8.00018
3.84151 16.0029 1 128.825 1
1.41617 16.0029 512.035 128.825 8.00018
6.90625 5.17655 512.035 97.6266 8.00018
1.82316 3.89022 1.28814 1 1
1.58859 9.94785 512.035 128.825 1
1.61969 12.3247 512.035 128.825 1
1.33559 10.284 26.3633 128.825 6.37936
1.68185 4.7608 2.06105 128.825 1.00727
1.21108 4.29163 13.2033 128.825 1.62681
1.69945 1.39915 37.5867 2.38933 5.68261
2.26959 1.30102 7.32666 1 3.71789
1.20485 6.03521 14.0379 128.825 1
13.4733 4.98242 27.8648 1 8.00018
1.82551 4.37409 1.1129 128.825 1.0037
1.99838 3.55431 1 1.22266 1
1.57108 1 14.1768 3.16177 8.00018
1.36905 3.86123 4.4941 128.825 1
1.27876 5.44864 50.7731 128.825 1
1.67601 16.0029 483.363 128.825 6.73011
2.17016 1.02489 1 1.51668 6.55008
1.92507 1.52215 1 128.825 5.84352
1.17033 4.04997 48.6965 128.825 1.18371
1.45292 1 47.8396 128.825 7.77974
1.46929 16.0029 91.7265 128.825 5.99395
1.19416 4.06726 27.956 128.825 1.28507
2.51698 1.60321 20.3026 1 7.90567
4.53957 16.0029 512.035 1 7.85759
1.8674 1 1.72167 128.825 7.97761
2.04807 1.34766 3.16714 128.825 7.541
2.05087 10.6888 348.024 128.825 4.01615
2.03063 1.45055 1 3.80915 7.99896
4.48324 1.39391 338.201 18.1749 8.00018
1.8183 1.44086 1 1 1
2.18131 1.37276 1 1 4.3693
2.73249 1.48716 138.966 1.01189 7.2744
2.16369 1.41637 1 128.825 7.77549
2.24593 1.29825 1 1.00003 7.99566
2.49271 1 1.26242 1 7.51788
1.43529 1.17257 19.4415 128.825 8.00016
1.85919 4.48557 51.0752 1 1
1.27045 6.94479 85.2623 128.825 2.02529
1.29271 4.3174 41.9469 128.825 2.71045
1.16407 6.15871 41.2952 128.825 1
1.179 4.48587 53.2404 128.825 1
5.25998 3.96968 42.0234 1.14234 1
1.92197 3.77989 1 128.825 1.23153
1.67657 4.6666 23.0382 3.14412 1
14.7627 5.04117 174.806 1.52616 1
1.1712 5.66671 40.1667 128.825 1
2.09475 4.43712 16.4887 2.62326 1
1.3684 4.60999 4.36251 128.825 1
1.87167 3.95252 1 123.594 8.00018
2.0122 4.28938 1 1 1
1.72728 5.43109 4.92454 120.619 1
1.72963 3.97031 1 128.825 1
3.21394 4.18858 50.1616 123.625 8.00018
2.30705 3.70729 1 1 8.00018
1.75668 3.58032 118.189 93.7408 8.00018
2.83069 1.05197 78.8487 1.12416 8.00018
3.01681 2.96283 53.8838 126.457 8.00018
2.69333 1 12.5461 1.04687 5.61842
6.16394 16.0029 512.034 3.00566 7.79508
2.50471 3.20547 1 1.00527 8.00018
2.74393 2.86051 1 1.58627 1
1.94912 3.55173 321.378 128.825 8.00018
2.50607 3.1362 1 1.2272 7.86776
2.50117 3.40996 1 2.59141 8.00018
2.01436 1.9717 1 1.84658 8.00018
1.74399 2.13574 1 2.74339 1
1.55385 3.77291 19.4764 3.40497 1
1.84827 1 1 1.20563 1.00026
1.48099 16.0029 305.927 116.831 8.00018
3.25906 4.1078 218.305 128.825 8.00018
1.88984 3.52137 1 1.01016 1.00719
1.95496 3.10587 205.784 128.825 8.00018
2.53949 6.43489 24.0808 10.4327 5.12246
1.18219 16.0029 61.1378 128.735 2.79494
1.62688 1.01011 144.572 128.825 8.00018
1.19234 10.6007 30.2878 128.825 1.90145
1.89839 16.0029 501.778 115.486 3.64192
3.02298 6.336 48.863 1 1.01751
2.94935 4.87841 65.0807 1.58859 1.03422
1.48481 1 32.8238 5.21677 6.51185
2.57488 1 196.637 99.4593 8.00018
1.18709 2.31219 29.1906 16.6164 1
3.3022 1.00246 126.95 1 8.00018
1.35802 1.20229 35.7395 128.825 8.00018
1.17939 2.98678 21.6642 111.966 1
2.22157 1.37443 17.3358 4.96489 1.0013
1.45561 16.0029 18.6679 126.261 4.40354
1.33853 16.0029 269.284 128.097 1.01187
1.36142 15.922 42.1897 7.82053 4.28042
1.58634 3.79205 14.2155 7.52816 1.00302
2.07173 4.96631 35.1168 1 1.06558
2.16536 1.02073 60.2927 2.89752 7.69905
1.19761 4.99027 20.8433 128.825 1.00068
1.21183 3.25201 31.8954 121.463 4.74495
1.15955 7.68755 52.693 128.825 1
1.19747 15.9547 40.3288 106.232 4.79994
1.14678 14.3189 105.651 128.825 1
1.31329 4.75247 39.5341 6.5095 1
1.28955 3.83953 12.3442 128.825 6.88565
1.25079 3.35264 31.9617 128.499 7.30232
3.54208 2.79402 24.04 116.388 6.24677
1.23928 3.70084 38.0085 121.191 6.69755
1.18921 2.31037 7.9934 128.825 1
1.16043 3.37842 33.9173 128.253 1
1.55291 3.4025 90.3102 128.825 5.64659
1.19088 3.09136 47.4656 116.573 1
1.31964 3.57266 16.7673 127.438 8.00018
1.13287 2.12181 16.7073 127.608 1
1.29983 3.42929 54.3889 124.45 5.2666
1.89918 3.58951 1 2.99075 4.84539
1.15648 4.54977 24.7907 114.179 1
1.18002 3.16433 26.1107 109.408 1
1.28697 2.23085 11.4699 6.17508 1
1.47203 16.0029 424.294 96.665 8.00018
1.35923 15.7712 180.292 108.152 8.00018
1.22905 3.56232 23.4472 116.798 6.1748
1.55541 1.72011 93.4516 101.304 8.00018
2.04159 16.0029 40.1776 1 1.00011
1.1952 2.49004 20.0365 114.219 1
2.77794 15.5213 512.035 1 1.02812
1.61996 14.5473 512.035 4.56864 1
1.21182 3.58192 30.8165 128.573 7.61866
1.46842 16.0029 212.909 126.77 7.25935
1.23573 3.89539 16.4708 123.302 4.63252
1.68854 16.0029 93.8211 2.23815 1.01339
1.24737 15.7682 51.6474 117.842 1
1.18779 5.80308 23.8137 114.688 1
1.519 3.55924 80.2821 114.28 7.09611
1.26767 15.9138 163.389 128.463 1
1.58908 8.73408 60.4034 3.12543 1
1.13967 6.94253 29.5404 126.366 1.00082
1.47346 1.02767 15.9882 3.0994 6.65049
1.19142 3.78845 111.306 128.825 1.00895
6.06288 15.3437 490.379 1 8.00018
1.87639 4.9916 491.995 128.825 1
1.93319 4.25991 508.601 128.825 1
1.68181 5.70063 509.011 128.825 1
3.56743 5.02613 504.939 128.825 8.00018
1.88407 4.5674 511.515 128.825 1
1.97464 13.1559 510.938 1.60573 1.10128
1.53288 16.0029 454.708 128.478 1
3.16029 4.69018 509.507 1.20061 1
2.17807 16.0029 499.117 1.5671 1
4.56358 5.62898 456.094 120.225 8.00018
1.57462 16.0029 487.641 128.825 1.02787
1.76454 5.19752 17.6369 1.58856 1.01913
4.76278 1 511.678 1 8.00018
1.93808 2.43203 511.584 128.825 1
1.71001 2.74913 509.772 128.825 1
17.6859 3.12348 511.251 1.74208 1.23072
2.75805 4.46449 22.5683 1.67204 1
1.79666 16.0029 379.387 2.68668 1.00074
2.28637 5.92394 1 1.62136 1.46851
1.5856 2.8623 356.665 2.92364 1
1.75186 2.21219 1 2.15511 1.21592
18.9152 2.12912 168.995 1.21215 1
3.93017 4.75664 1 1.2334 6.75532
2.02585 1 1 1.65368 8.00018
1.82365 1 1 1.18676 2.68117
2.45338 16.0029 324.993 1.64182 1.00067
3.94181 2.77051 371.001 1.01043 1
2.58022 5.28293 1 1.00899 8.00018
2.32257 2.42847 488.999 75.7214 1
2.48066 5.19708 1 1.01857 1
1.83206 1 1 2.5297 1
1.98198 16.0029 239.976 1.80369 1
3.59905 2.21228 1 1.00908 8.00018
1.90486 5.60672 1 1.01175 1.68969
4.08377 11.4198 1 1.01033 1.99797
2.19399 3.45414 1 1.01273 6.39969
2.12843 5.91437 1 1.01445 1
5.06573 12.2589 252.052 1.01313 2.42166
1.89684 3.38058 1 1.01567 1.00369
1.82683 1.79312 1 1.61672 1
2.09055 1.99713 7.66155 1.02651 1.0045
3.26499 2.60936 298.33 1.03542 1
38.0935 1 387.262 1.02453 1
1.79081 2.47266 1 1.02802 1.30136
2.70179 4.08058 1 1.00678 5.66322
3.70049 5.01941 125.481 1.01641 8.00018
2.52746 2.26491 1 1.04334 8.00018
2.6144 1.34182 109.675 1.00672 7.98652
1.89099 1.51636 1.02817 1.06104 3.43684
5.01127 3.55865 188.044 1.04538 1
3.43548 1 69.1306 1.07355 8.00018
7.10486 6.85572 512.035 1.17991 2.29812
1.87445 16.0029 507.561 1.13841 1
1.63515 2.55714 340.268 105.369 1.88211
6.87511 1.59641 510.92 65.2621 1
7.40836 3.42253 329.605 1.02056 5.19144
1.44172 3.06819 436.638 124.987 1.04117
1.85288 1.63065 53.3601 1.0173 8.00018
1.97156 4.89706 70.355 1.01513 1
1.54226 16.0029 511.366 73.8461 1.01515
3.87939 5.92443 510.453 1.05408 1.48474
1.51564 3.17769 500.43 128.825 1.04849
2.05485 1 4.23242 1.12558 8.00018
1.97433 15.804 10.7363 1.35953 1.0445
1.42677 3.22435 343.821 128.825 1.10866
2.69979 2.62094 1 1 8.00018
2.80693 16.0029 508.23 1.2509 7.44169
5.56769 1.75418 115.663 82.5968 1.02226
10.0386 2.70432 376.212 90.4778 8.00018
1.75714 5.97219 511.494 128.825 1
3.61418 5.74625 501.833 1.16609 1
1.58739 3.34706 476.397 85.2811 1
1.54583 2.99822 228.601 108.462 1
3.28737 16.0029 54.0864 1.08781 8.00018
1.61078 16.0029 510.841 113.284 8.00018
7.69706 1 512.035 1 7.44544
2.67493 16.0029 441.365 1.77608 8.00018
1.74966 16.0029 50.4464 1.31085 1.07266
3.7585 16.0029 189.338 1.04997 8.00018
1.81978 2.38928 509.248 127.419 1
1.35456 3.49442 92.3896 126.596 2.37375
1.66493 2.31178 1 37.7945 3.80823
1.82544 2.72484 502.978 128.731 1
1.75726 2.87181 479.845 128.825 1
1.68781 15.0497 509.078 128.825 2.46445
1.46165 3.95094 496.274 128.056 1
1.62109 2.77635 325.136 128.444 1
9.84507 2.36965 1 1.22999 3.43455
2.12618 8.06339 37.5034 1.55497 1.03319
19.9971 3.70609 512.035 1 1.22997
2.50406 3.6462 1 1.07039 8.00018
1.74508 2.69366 126.026 1.01994 1.0059
1.84615 1.3313 1 1.006 1
6.35803 3.40397 70.1067 1.006 1
18.1778 2.2708 125.054 1.02172 8.00018
1.8263 1.6561 1 2.18745 2.95551
1.6426 1.85016 7.80418 107.429 3.65741
1.99685 2.53031 1 1 1.00799
1.82727 1.8859 1 1.01417 1.07867
10.267 2.8795 217.669 1.03202 1.09872
1.73753 2.7155 23.3139 1.00842 1
3.24402 3.02396 441.701 1.26887 1
1.34272 3.44249 238.292 128.825 1
1.93469 3.04653 1 1 1
1.36159 2.59927 163.243 124.008 1
1.81301 2.57622 510.034 128.825 1.00085
1.52424 3.09678 488.866 93.7029 1
2.34485 3.53712 492.5 105.515 8.00018
3.94889 1.54106 61.0157 1.00655 1
2.03146 3.30615 38.6263 1.53452 1
2.21316 4.7681 44.6425 1.00104 1
2.57991 1.8712 1 1.04351 8.00018

5-29

  • Will try:
    • Train on all voices
    • Split major/minor pieces apart
    • Model only the duration

Observations

  • Low validation loss doesn't imply good perceptual quality. On the contrary, overfit models tended to yield more realistic samples
  • Subsetting to only major/minor pieces significantly improves sample quality
  • Training on all four parts significantly improves performance over using just the soprano, but introduces obvious non-melodic passages (e.g. periods of rest)

5-28

  • Improved preprocessing using bachbot get_chorales
    • Get corpus with music21
    • Transpose to Cmaj/Amin (is there a standard way to do this? a sketch follows this list)
    • Strip all information except (Note+Octave|Rest, Duration)
    • Write processed data to bachbot/scratch/{bwv_id}-mono.txt
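A minimal sketch of the transposition step, assuming music21: analyze the key, then move the tonic to C for major or A for minor (bwv269 is an example):

```python
from music21 import corpus, interval, pitch

score = corpus.parse('bach/bwv269')
key = score.analyze('key')
target = pitch.Pitch('C') if key.mode == 'major' else pitch.Pitch('A')
transposed = score.transpose(interval.Interval(key.tonic, target))
```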

Results with new preprocessing

seq_length wordvec_size num_layers rnn_size dropout batchnorm lr nepoch final_train_loss final_val_loss
8 64 2 256 0 1 2e-3 30 0.238247 1.5794
8 64 2 128 0 1 2e-3 50 0.349 1.367
4 64 2 128 0 1 2e-3 50 0.288 1.434
4 32 2 128 0 1 2e-3 50 0.2527 1.8538
8 32 2 32 0 1 2e-3 50 1.044 1.191
8 32 2 64 0 1 2e-3 50 0.7539 1.236
8 64 2 32 0 1 2e-3 50 1.027 1.190
2 64 2 32 0 1 2e-3 50 0.783344 1.25899
4 64 2 32 0 1 2e-3 50 1.064 1.197
8 64 1 32 0 1 2e-3 50 1.022 1.188
8 64 1 32 0 1 2e-3 50 1.096 1.186
8 64 3 32 0 1 2e-3 50 0.989 1.186
8 64 3 32 0 1 2e-3 50 0.953 1.183
8 64 4 32 0 1 2e-3 50 1.0104 1.2274
8 64 4 64 0 1 2e-3 50 1.0165 1.2038
8 64 4 64 0.5 1 2e-3 27.51 1.392 1.4355
8 64 4 64 0.5 0 2e-3 25.10 1.807 1.851
6 64 3 32 0 1 2e-3 50 0.9304 1.2137
8 64 3 16 0 1 2e-3 50 1.264 1.2311
12 64 3 32 0 1 2e-3 50 1.030 1.1909

Generative results don't sound too realistic...

Try overfitting a model and sampling

seq_length=8, wordvec=128, num_layers=2, rnn_size=256, dropout=0, batchnorm=1, lr=2e-3

  • Sounds much better with an overfit LSTM and temperature=0.98... (sampling sketched below)
    • Maybe generalizable modeling isn't a good criterion...
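A minimal sketch of temperature sampling, assuming `probs` is the model's softmax output over the vocabulary at one timestep (T < 1 sharpens the distribution; T = 1 recovers it unchanged):

```python
import numpy as np

def sample_with_temperature(probs, temperature=0.98):
    # Rescale log-probabilities by 1/temperature, then renormalize and sample.
    probs = np.asarray(probs, dtype=np.float64)
    logp = np.log(probs) / temperature
    p = np.exp(logp - logp.max())  # subtract max for numerical stability
    p /= p.sum()
    return np.random.choice(len(p), p=p)
```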

5-25

  • Added extract_melody, which extracts the 0th part from a music21.stream.Score and assumes it is the melody

  • Music representation:

    • Since music21 cannot output kern, use musicXML output (see the sketch after this list)
    • We currently include all header and dynamics info; should we strip that?
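A minimal sketch of that export path, assuming music21 (bwv269 and melody.xml are illustrative):

```python
from music21 import corpus

score = corpus.parse('bach/bwv269')
melody = score.parts[0]                    # extract_melody: assume part 0 is the melody
melody.write('musicxml', fp='melody.xml')  # MusicXML output, since music21 cannot emit kern
```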

Results on musicXML monophonic melody

seq_length wordvec_size num_layers rnn_size dropout batchnorm lr nepoch final_train_loss final_val_loss
500 64 2 256 0 1 2e-3 16.19 0.022378 0.029262
50 64 2 256 0 1 2e-3 13.41 0.028490 0.032692
100 64 2 256 0 1 2e-3 13.41 0.028490 0.032692

Results on kern format data

seq_length wordvec_size num_layers rnn_size dropout batchnorm lr nepoch final_train_loss final_val_loss
50 64 2 256 0 0 2e-3 51 0.443295 0.619
500 64 2 256 0 1 2e-3 21.45 0.4094 0.5779
500 64 2 256 0 1 2e-3 31.00 0.440350 0.572764
500 64 2 256 0 1 1e-2 28.73 0.287570 0.6176
50 64 2 256 0 1 1e-2 13.65 0.390861 0.6316

5-23

  • wordvec_size=64 appears to perform best; use these as defaults in the future:
    • rnn_size=256
    • num_layers=2
    • wordvec_size=64

5-22-overnight

  • Training interrupted by cudnn recompilation
  • Results suggest val_loss is lowest with rnn_size=256, num_layers=2

5-5

  • Training on entire corpus
    • BAD: kern format has K voices => each line has K space-delimited notes
    • This suggests the output should be a K-dimensional vector rather than character-by-character

  • Training on just chorales