Plot logits and ground truth like a spectrogram #5

Open
anthonio9 opened this issue Jan 15, 2024 · 14 comments

@anthonio9

No description provided.

@anthonio9

At the moment the plotting function seems to be somewhat broken. The number of xticks matches what is specified in the code, but the total duration shown on the time axis differs greatly from the length of the processed song.

Moreover, either the inference or the loss function seems to be wrong, because the logits are very similar for every string. The current logits output shows all the strings outputting the polyphonic recognition instead of each string outputting its own monophonic pitch.

[Image: current logits output]

@anthonio9

Alright, the x_labels are now fixed and show the right times.
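A common source of this kind of mismatch is converting frame indices to seconds with the wrong hop size or sample rate. A minimal sketch of deriving the tick labels from the frame index, assuming the x axis is indexed by frame and every frame advances by a fixed `hopsize` samples; `time_ticks` is a hypothetical helper, and the 8 kHz / 80-sample values are placeholders to check against the actual config:

```python
import numpy as np

def time_ticks(n_frames, hopsize, sample_rate, n_ticks=8):
    """Return (positions, labels) for an x axis indexed by frame.

    Frame i starts at sample i * hopsize, i.e. at i * hopsize / sample_rate seconds.
    """
    positions = np.linspace(0, n_frames - 1, n_ticks).round().astype(int)
    seconds = positions * hopsize / sample_rate
    labels = [f"{s:.1f}" for s in seconds]
    return positions, labels

# Placeholder values: 8 kHz audio with an 80-sample hop, i.e. 10 ms per frame
positions, labels = time_ticks(n_frames=1000, hopsize=80, sample_rate=8000, n_ticks=5)
```

The result can then be passed to matplotlib with `ax.set_xticks(positions)` and `ax.set_xticklabels(labels)`.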

@anthonio9

Fixed plot, with new xlabels and ylabels.

[Image: fixed plot with new xlabels and ylabels]

anthonio9 added a commit that referenced this issue Jan 16, 2024
* fix yticks and xticks
* fix the fontsize
* change the aspect ratio

related to: #5
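For the y axis, a sketch assuming the pitch bins are log-spaced, i.e. bin `b` sits `b * cents_per_bin` cents above a minimum frequency `fmin`. `bin_to_hz` and `pitch_ticks` are hypothetical helpers, and the 31 Hz / 5 cents / 1440 bins values are placeholders, not values taken from this repository:

```python
import numpy as np

def bin_to_hz(bins, fmin, cents_per_bin):
    """Map pitch-bin indices to Hz, assuming bins are spaced evenly in cents above fmin."""
    return fmin * 2.0 ** (np.asarray(bins) * cents_per_bin / 1200.0)

def pitch_ticks(n_bins, fmin, cents_per_bin, n_ticks=6):
    """Return (positions, labels) for a y axis indexed by pitch bin."""
    positions = np.linspace(0, n_bins - 1, n_ticks).round().astype(int)
    labels = [f"{hz:.0f}" for hz in bin_to_hz(positions, fmin, cents_per_bin)]
    return positions, labels

# Placeholder values, not taken from the project's config
positions, labels = pitch_ticks(n_bins=1440, fmin=31.0, cents_per_bin=5.0)
```

With bin 0 at the bottom, the logits image should be drawn with `imshow(..., origin="lower")` so that the labels increase upwards.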
@anthonio9

anthonio9 commented Jan 16, 2024

The exported figure is now much better suited to the polyphonic logits output. What's still missing is the ground truth plotted on top of the logits, or at least next to them. Plots compatible with wandb are also missing.
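One way to draw the ground truth on top of the logits is to convert the label from Hz back to (fractional) bin indices and plot it as a line over the image. A sketch with matplotlib, assuming a single string, logits of shape `(n_bins, n_frames)`, log-spaced bins described by `fmin` and `cents_per_bin`, and unvoiced frames stored as 0 Hz in the label; the function names are hypothetical:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, e.g. for saving to disk
import matplotlib.pyplot as plt

def hz_to_bin(hz, fmin, cents_per_bin):
    """Hz -> fractional bin index on a log (cents) scale; frames <= 0 Hz become NaN."""
    hz = np.asarray(hz, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        bins = 1200.0 * np.log2(hz / fmin) / cents_per_bin
    return np.where(hz > 0, bins, np.nan)

def plot_logits_with_truth(logits, truth_hz, fmin, cents_per_bin):
    """logits: (n_bins, n_frames); truth_hz: (n_frames,), 0 where unvoiced."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.imshow(logits, origin="lower", aspect="auto", cmap="viridis")
    # NaN values break the line, so unvoiced frames are simply not drawn
    ax.plot(np.arange(logits.shape[1]), hz_to_bin(truth_hz, fmin, cents_per_bin),
            color="red", linewidth=1.0, label="ground truth")
    ax.set_xlabel("frame")
    ax.set_ylabel("pitch bin")
    ax.legend(loc="upper right")
    return fig
```

The returned figure can be saved with `fig.savefig(...)`; for wandb, it could be wrapped in `wandb.Image(fig)` before logging.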

anthonio9 self-assigned this Jan 16, 2024
@anthonio9

First, the penn.evaluate code has to be adjusted to the new models; something isn't working correctly there.

anthonio9 added a commit that referenced this issue Jan 18, 2024
@anthonio9

Make sure that the logits are plotted nicely with the ground truth for the one-string model and for the original fcnf0++ trained with mtdb and ptdb. Then see what happens with the fcnf0++-gset-voiced configuration as well. Those should work reasonably well for plotting all six strings on a single set of logits.

anthonio9 added a commit that referenced this issue Feb 1, 2024
@anthonio9

This is now important again. Logit plots with the ground truth on top work well for the test set; however, this is currently not available either for audio files provided with labels or for monophonic audio files with polyphonic labels.

What we need is a function that can take:
a. single track file with solos, poly label
b. single track file with chords, poly label

The inside function has to take:

  • stft, poly logits, poly label
  • stft, mono logits, poly label

Tweak the existing penn.plot.logits to achieve the desired effect.
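The two cases differ only in the shape of the logits, so a small dispatch step can pair each logit image with the label rows to draw on it before any plotting happens. A hypothetical sketch (not the current `penn.plot.logits` signature), assuming poly logits of shape `(n_strings, n_bins, n_frames)`, mono logits of shape `(n_bins, n_frames)` and a poly label of shape `(n_strings, n_frames)`; the STFT could be drawn as an extra panel alongside these:

```python
import numpy as np

def pair_panels(logits, poly_label):
    """Pair each logit image with the ground-truth rows to draw on top of it.

    logits:     (n_bins, n_frames) for a mono model, or
                (n_strings, n_bins, n_frames) for a poly model
    poly_label: (n_strings, n_frames), one pitch track per string
    Returns a list of (image, label_rows) tuples, one per panel.
    """
    logits = np.asarray(logits)
    poly_label = np.asarray(poly_label)
    if logits.shape[-1] != poly_label.shape[-1]:
        raise ValueError("logits and label have a different number of frames")
    if logits.ndim == 2:
        # Mono logits: a single panel with every string's label overlaid
        return [(logits, poly_label)]
    if logits.ndim == 3:
        if logits.shape[0] != poly_label.shape[0]:
            raise ValueError("logits and label have a different number of strings")
        # Poly logits: one panel per string, each with only its own label
        return [(logits[s], poly_label[s:s + 1]) for s in range(logits.shape[0])]
    raise ValueError(f"expected 2D or 3D logits, got shape {logits.shape}")
```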

anthonio9 added a commit that referenced this issue May 8, 2024
@anthonio9

Tell a story with the plots: first show the raw pitch output, then the pitch output with the periodicity values printed on top of it, and finally the pitch filtered by different periodicity thresholds.
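The last step of that sequence only needs a mask on the pitch. A minimal sketch, assuming pitch and periodicity are per-frame arrays of the same length; `threshold_pitch` and the threshold values are illustrative:

```python
import numpy as np

def threshold_pitch(pitch, periodicity, threshold):
    """Copy of pitch with frames below the periodicity threshold set to NaN.

    NaN (rather than 0) makes matplotlib leave a gap instead of drawing a drop to 0 Hz.
    """
    pitch = np.asarray(pitch, dtype=float).copy()
    pitch[np.asarray(periodicity) < threshold] = np.nan
    return pitch

pitch = np.array([110.0, 112.0, 220.0, 221.0])
periodicity = np.array([0.9, 0.4, 0.7, 0.2])
# One pitch curve per threshold, e.g. for the final panel of the sequence
curves = {t: threshold_pitch(pitch, periodicity, t) for t in (0.3, 0.5, 0.8)}
```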

anthonio9 added a commit that referenced this issue May 26, 2024
@anthonio9

Started the plots once again, this time in steps: first the STFT, then the predicted pitch, then the ground truth, and finally the predicted pitch with thresholds. Below is the STFT with the ground truth:

[Image: STFT with ground truth]
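A sketch of computing a magnitude STFT in dB with plain NumPy, assuming mono audio, a Hann window and no padding, so frame `i` starts at sample `i * hopsize`; the 8 kHz sample rate, 1024-point FFT and 80-sample hop are placeholders:

```python
import numpy as np

def stft_db(audio, n_fft=1024, hopsize=80):
    """Magnitude STFT in dB, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hopsize
    # Column i holds the windowed samples audio[i * hopsize : i * hopsize + n_fft]
    frames = np.stack([audio[i * hopsize:i * hopsize + n_fft] * window
                       for i in range(n_frames)], axis=1)
    magnitude = np.abs(np.fft.rfft(frames, axis=0))
    return 20.0 * np.log10(magnitude + 1e-8)  # small offset avoids log10(0)

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
audio = np.sin(2 * np.pi * 440.0 * t)        # one second of a 440 Hz tone
spec = stft_db(audio)
freqs = np.fft.rfftfreq(1024, d=1.0 / sample_rate)  # Hz value of each row
```

Because the frequency axis of this spectrogram is linear in Hz, a ground-truth pitch in Hz can be drawn directly on top, e.g. `ax.imshow(spec, origin="lower", aspect="auto", extent=[0, spec.shape[1], 0, sample_rate / 2])` followed by `ax.plot(frames, truth_hz)`. If the model's frames are centered (padded by half a window), the spectrogram and the labels may appear shifted relative to each other.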

anthonio9 added a commit that referenced this issue May 27, 2024
anthonio9 added a commit that referenced this issue Jun 3, 2024
@anthonio9

anthonio9 commented Jun 6, 2024

Fixes and plots:

  • use plot of the logits instead of the STFT
  • fix the y axis of the multi-pitch plot
  • try different kinds of logits - unnormalized vs after sigmoid

Describe how the model was trained for the multi-pitch (per-string) output:

  • how the logits are structured
  • whether softmax is applied, and where
  • in particular, whether softmax is applied per string or over all strings together
  • how the target is constructed
  • how the bins are structured: multi-hot or one-hot
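The softmax and target questions can be made concrete with a toy example. The sketch below contrasts a softmax over the bins of each string separately with a softmax over all strings and bins together, and builds a per-string one-hot target. It assumes logits of shape `(n_strings, n_bins, n_frames)` and does not describe how penn actually trains; that is what the questions above are meant to establish:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def one_hot_targets(bins, n_bins):
    """bins: (n_strings, n_frames) integer bin per string -> (n_strings, n_bins, n_frames)."""
    n_strings, n_frames = bins.shape
    target = np.zeros((n_strings, n_bins, n_frames))
    s, f = np.meshgrid(np.arange(n_strings), np.arange(n_frames), indexing="ij")
    target[s, bins, f] = 1.0  # exactly one active bin per string and frame
    return target

logits = np.random.default_rng(0).normal(size=(6, 360, 4))  # (strings, bins, frames)
# Per string: each string's bins sum to 1 in every frame
per_string = softmax(logits, axis=1)
# Joint: all strings and bins of a frame share a single unit of probability
joint = softmax(logits.reshape(-1, 4), axis=0).reshape(logits.shape)
```

With the per-string softmax each string has its own distribution and can report its own pitch; with the joint softmax the strings compete for one unit of probability per frame, which could push every string towards the same dominant pitch. Taking `target.max(axis=0)` over the per-string one-hot targets gives a multi-hot target over bins.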

Get a better understanding of the decoding with periodicity thresholding, and of whether softmax is applied correctly.
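A sketch of per-string decoding with periodicity thresholding, assuming each string's probabilities are already normalized over bins, log-spaced bins, and periodicity defined as the peak probability per frame (one possible definition among several; the one used in penn may differ):

```python
import numpy as np

def decode(probs, fmin, cents_per_bin, threshold):
    """Per-string argmax decoding with periodicity thresholding.

    probs: (n_strings, n_bins, n_frames), each string normalized over its bins.
    Returns (pitch_hz, periodicity), both (n_strings, n_frames); pitch is NaN
    wherever periodicity falls below the threshold.
    """
    best = probs.argmax(axis=1)
    # Periodicity taken as the peak probability in each frame (assumed definition)
    periodicity = probs.max(axis=1)
    pitch = fmin * 2.0 ** (best * cents_per_bin / 1200.0)
    pitch[periodicity < threshold] = np.nan
    return pitch, periodicity
```

If the softmax were taken over all strings jointly, the per-string peaks would be smaller, so the same threshold would reject many more frames. That makes the threshold behaviour a useful check on where the softmax is applied.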

@anthonio9

Logits after sigmoid make much more sense when visualized. Unnormalized logits are spread out and not as clear.
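A side-by-side view makes this comparison easy to reproduce. A sketch with matplotlib, assuming single-string logits of shape `(n_bins, n_frames)`; the sigmoid panel uses a fixed 0-1 color range while the raw panel is left auto-scaled, and `compare_normalization` is a hypothetical name:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def sigmoid(x):
    """Logistic sigmoid, written via tanh to avoid overflow warnings for large |x|."""
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def compare_normalization(logits):
    """Raw logits next to sigmoid(logits), both of shape (n_bins, n_frames)."""
    fig, (raw_ax, sig_ax) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    raw = raw_ax.imshow(logits, origin="lower", aspect="auto")
    sig = sig_ax.imshow(sigmoid(logits), origin="lower", aspect="auto",
                        vmin=0.0, vmax=1.0)
    raw_ax.set_title("unnormalized")
    sig_ax.set_title("after sigmoid")
    fig.colorbar(raw, ax=raw_ax)
    fig.colorbar(sig, ax=sig_ax)
    return fig
```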

@anthonio9

anthonio9 commented Jun 20, 2024

  • Reorder the strings so that the lowest is at the bottom
  • Fix the color map range to 0-1 for the normalized logits plots (and use a fixed range for the unnormalized plots as well)
  • Try a black-and-white color scale, and perhaps seaborn
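A sketch covering these points, assuming probabilities of shape `(n_strings, n_bins, n_frames)` with string 0 as the lowest string (swap the ordering if the data uses the opposite convention); `"gray_r"` is matplotlib's reversed grayscale map, so 0 is white and 1 is black, and `plot_strings` is a hypothetical name:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

def plot_strings(probs, cmap="gray_r"):
    """probs: (n_strings, n_bins, n_frames) in [0, 1]; string 0 assumed to be the lowest."""
    n_strings = probs.shape[0]
    fig, axes = plt.subplots(n_strings, 1, figsize=(10, 2 * n_strings), sharex=True)
    axes = np.atleast_1d(axes)
    for row, ax in enumerate(axes):
        string = n_strings - 1 - row  # top row = highest string, bottom row = lowest
        # Same fixed 0-1 color range in every panel so colors are comparable
        ax.imshow(probs[string], origin="lower", aspect="auto",
                  cmap=cmap, vmin=0.0, vmax=1.0)
        ax.set_ylabel(f"string {string}")
    axes[-1].set_xlabel("frame")
    return fig
```

For unnormalized logits the same idea applies with `vmin`/`vmax` set to one global minimum and maximum over all strings, so that colors stay comparable between panels.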

@anthonio9

Example command for plotting with the FCN model:

python -m penn.plot.to_latex --config config/polypennfcn-15ks-batch.py --checkpoint runs/polypennfcn-15ks-batch/00005000.pt --audio_file data/cache/gset/000121.wav -m -l --ground_truth_file data/cache/gset/000121-pitch.npy -m  -l

@anthonio9

  • add a comparison on the plot between softmax only and softmax + log
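The comparison comes down to plotting the probabilities on a linear versus a logarithmic scale. A minimal NumPy sketch, assuming logits of shape `(n_bins, n_frames)` normalized over bins (axis 0); computing the log-softmax directly avoids `log(0)` when a probability underflows:

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax; axis 0 is the pitch-bin axis here."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def log_softmax(x, axis=0):
    """log(softmax(x)) computed directly, without taking the log of tiny probabilities."""
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

logits = np.array([[8.0], [2.0], [-4.0]])  # 3 bins, 1 frame
probs = softmax(logits)          # linear scale: nearly everything sits in bin 0
log_probs = log_softmax(logits)  # log scale: the weaker bins stay distinguishable
```

On a linear scale everything except the peak is close to 0, while the log scale keeps low-probability regions (for example another string's pitch leaking into this one) visible.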
