How should we handle input data with Nulls/Blanks? #1471

Open
salamanders opened this issue Aug 28, 2023 · 1 comment

@salamanders

I was loading a CSV to try the HelloWorld ml5.neuralNetwork example, and it threw many errors like "the input label YourInputColumn5 does not exist at row 8687".

That error was absolutely correct: the training data is littered with blanks. Filling them in is exactly why I'm loading it. I want to train a classifier and use it to fill in those blanks.

Is there an option I can set that says, in effect, "ya, I know that lots of values are missing from lots of columns. That's OK. That is what we are here to fix!"?

@lindapaiste
Contributor

@salamanders It is a requirement that training data has a classification attached to it. The purpose of the training is for the model to build associations between the input columns and the resulting classification.

You'll want to split your CSV into two data sets. Rows with a known classification are used to train the model. Then you use the trained model to classify the rows whose classification is blank.

You'll have to make this separation yourself before providing data to the ml5 model.
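A minimal TypeScript sketch of that split, assuming the CSV has already been parsed into one object per row and that the classification lives in a single column. The `splitByLabel` and `isBlank` helpers, the `species` column, and the sample rows are all hypothetical illustrations; the code does not call ml5 itself, it only produces the two sets.

```typescript
// One parsed CSV row: column name -> raw cell text.
// Assumption: parsing has already happened (e.g. with a CSV library).
type Row = Record<string, string | null | undefined>;

// True when a cell is missing entirely or contains only whitespace.
function isBlank(value: string | null | undefined): boolean {
  return value === null || value === undefined || value.trim() === "";
}

// Hypothetical helper: rows with a known label go to `training`,
// rows whose label is blank go to `toClassify` for the trained model.
function splitByLabel(
  rows: Row[],
  labelColumn: string,
): { training: Row[]; toClassify: Row[] } {
  const training: Row[] = [];
  const toClassify: Row[] = [];
  for (const row of rows) {
    if (isBlank(row[labelColumn])) {
      toClassify.push(row);
    } else {
      training.push(row);
    }
  }
  return { training, toClassify };
}

// Sample data with a hypothetical "species" label column.
const rows: Row[] = [
  { length: "5.1", width: "3.5", species: "setosa" },
  { length: "6.2", width: "2.9", species: "" },
  { length: "5.9", width: "3.0", species: "virginica" },
  { length: "4.8", width: "3.1" }, // no label column at all
];

const { training, toClassify } = splitByLabel(rows, "species");
console.log(training.length, toClassify.length); // 2 2
```

This only looks at the label column. The error in the original report was about a blank input column, so rows that have a label but are missing input values would still need to be dropped or filled in before they are used for training; that step is not shown here.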
