Losses from validation set are NAN #91

dhorangic · 2021-07-17T15:15:51Z

I have a training dataset of 384 labeled images. I have checked the dataset: since all of these are simulated images, they should not be mislabeled or have other dataset problems such as negative x/y values for bounding boxes. I also have a validation set of 95 images, likewise simulated images that I generated with some other code. I load the validation and training sets with core.Dataset and then pass them to the model with the following code:

mdl.fit(dataset=trainset, val_dataset = validset, epochs=10, learning_rate=0.01, verbose=True)

I run the code in a Google Colab notebook; before my own code runs I have to install the detecto package.
When I run the code I get a NaN error for the loss:

Not sure why this is happening; any help appreciated.
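
One way to double-check the "no bad boxes" assumption above is to scan the annotations programmatically. Degenerate boxes (zero width or height) can slip through in generated data and are one possible source of NaN losses in detection training. The following is a minimal sketch, not part of detecto: `find_bad_boxes` and the record layout `(image_width, image_height, xmin, ymin, xmax, ymax)` are hypothetical names chosen for illustration, and the sample records are made up.

```python
import math

def find_bad_boxes(records):
    """Return (index, reason) pairs for boxes that look suspicious.

    Each record is a hypothetical tuple:
    (image_width, image_height, xmin, ymin, xmax, ymax)
    """
    problems = []
    for i, (w, h, xmin, ymin, xmax, ymax) in enumerate(records):
        values = (xmin, ymin, xmax, ymax)
        if any(math.isnan(v) or math.isinf(v) for v in values):
            problems.append((i, "non-finite coordinate"))
        elif min(values) < 0:
            problems.append((i, "negative coordinate"))
        elif xmax <= xmin or ymax <= ymin:
            problems.append((i, "zero or negative width/height"))
        elif xmax > w or ymax > h:
            problems.append((i, "box extends past image edge"))
    return problems

# Made-up sample records, not taken from the dataset in this issue
records = [
    (640, 480, 10, 20, 110, 220),    # fine
    (640, 480, 50, 50, 50, 90),      # xmin == xmax, zero width
    (640, 480, -5, 10, 40, 60),      # negative xmin
    (640, 480, 600, 400, 700, 470),  # xmax beyond image width
]
print(find_bad_boxes(records))
```

An empty result does not prove the dataset is clean, but a non-empty one points at specific annotations to inspect first.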

dhorangic · 2021-07-17T16:04:22Z

In addition to this, I am having some issues with strange predictions. I trained the model with the exact same code, minus the validation set, 2 days ago and got this prediction for an unseen image:

However, now when I try to train the model with the exact same code (no validation set), no predictions are generated, neither on a previously seen training set image nor on the unseen image:

These predictions should all be the same, since no code was changed here.
My notebook with the code is here

alankbi · 2021-07-18T04:14:04Z

Could you take a look at #36 and see if any of those solutions work for you? Looking at the second image my guess is that it's the NaN issue causing the model to not train properly and therefore not produce any results.

This is definitely a weird issue. If none of those work, it could be worth trying to create a new notebook from scratch to see if that helps at all.
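
To confirm the guess that NaN losses line up with the empty predictions, it can help to find the first epoch at which the loss stops being a number. This is a minimal sketch that assumes the per-epoch losses are available as plain Python floats (how they are collected is left open here); `first_nan_epoch` is a hypothetical helper and the loss values are invented for illustration.

```python
import math

def first_nan_epoch(losses):
    """Return the 0-based index of the first NaN loss, or None if there is none."""
    for epoch, loss in enumerate(losses):
        if math.isnan(loss):
            return epoch
    return None

# Invented per-epoch losses, not taken from this issue
losses = [0.92, 0.61, float("nan"), float("nan")]

bad = first_nan_epoch(losses)
if bad is not None:
    print(f"loss became NaN at epoch {bad + 1}")
```

If the loss is NaN from the very first epoch, the data is a more likely suspect than the optimizer settings; if it turns NaN later, lowering the learning rate is a common thing to try.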

dhorangic · 2021-07-19T14:46:36Z

Hello, I looked at #36 and went through the code that generates the dataset again to make sure there were no issues with formatting.
I have created a new notebook and am running the code again right now.
Here I uploaded an XML file I created. I just want to make sure the format is correct, since it seems like the other person's issue was with their dataset.
One question: the content tag in the .xml files is wrong. It has 'content', but the files really end up in the folder 'train' or 'test'. Will this affect the results (does the tag have to be accurate)?

dhorangic · 2021-07-19T15:03:37Z

The model finished training in a brand new notebook.

Same issue. It won't even make a prediction on images that were already in the training set; it returns empty tensors.
I will fix the .xml tag now to see if that helps.

alankbi · 2021-08-04T02:11:51Z

> Hello, looked at #36 . Went through the code that generates the dataset again to make sure there were no issues with formatting.
> Have created a new notebook and am running the code again right now
> Here I uploaded an xml file I created - I just want to make sure the format is correct, since it seems like the other person's issue was with their dataset
> One question - the content tag in the .xml files is wrong - it has 'content' but really the files end up in the folder 'train' or 'test'. Will this affect the results (does the tag have to be accurate)?

The content tag shouldn't affect things at all, so that should hopefully not be the issue.
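
To illustrate why a wrong folder value is harmless for a parser that never reads it, here is a minimal sketch. It assumes the tag in question is the `<folder>` element of a Pascal VOC-style annotation (the uploaded file is not shown in this thread). `parse_annotation` is a hypothetical stand-in and not detecto's actual parser; the file name, class name, and coordinates are made up.

```python
import xml.etree.ElementTree as ET

# Made-up Pascal VOC-style annotation; only the folder value is varied below
VOC_TEMPLATE = """<annotation>
  <folder>{folder}</folder>
  <filename>sim_001.png</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>particle</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
</annotation>"""

def parse_annotation(xml_text):
    """Extract one row per object; the <folder> element is never read."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
            "class": obj.findtext("name"),
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
        })
    return rows

rows_content = parse_annotation(VOC_TEMPLATE.format(folder="content"))
rows_train = parse_annotation(VOC_TEMPLATE.format(folder="train"))
print(rows_content == rows_train)
```

The fields that do matter in a sketch like this are the file name, the image size, the class name, and the four box coordinates, so those are the ones worth checking by hand in a generated file.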

Were you able to ultimately figure this out? It seems like this issue has shown up now and then for a few people, but I haven't been able to find a clear fix that works for everyone.

makya-stell · 2022-03-18T03:26:24Z

I am getting a similar problem, except that mine is a division by zero error. Any suggestions on what I am doing wrong when calculating my loss?

alankbi · 2022-03-20T17:47:29Z

It looks like your validation dataset on line 548 doesn't have any images in it. What's the output when you run len(test_dataset)?

If it's 0, then this likely indicates some kind of issue with the format of the folder containing the XML files. If so, could you share an image of what those look like?
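
The connection between an empty dataset and a division by zero can be shown without any detection code: averaging a loss over zero items divides by `len(...) == 0`. This is a minimal sketch with hypothetical helper names (`naive_mean`, `checked_mean`); it does not reproduce detecto's internals.

```python
def naive_mean(values):
    # Raises ZeroDivisionError when values is empty
    return sum(values) / len(values)

def checked_mean(values):
    """Average the values, failing early with a clearer message when there are none."""
    if len(values) == 0:
        raise ValueError("dataset is empty; check the folder containing the XML files")
    return sum(values) / len(values)

try:
    naive_mean([])
except ZeroDivisionError as err:
    print("naive average failed:", err)

print(checked_mean([1.0, 3.0]))
```

Checking the dataset length before training gives the same early warning with less effort than tracing the error back from the loss calculation.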

dhorangic added the bug (Something isn't working) label on Jul 17, 2021

alankbi mentioned this issue on Jan 7, 2022: Loss is nan on validation #98 (Open)
