
Questions #11

Open
eutampieri opened this issue Jan 31, 2020 · 3 comments

Comments

@eutampieri (Contributor)

  1. Is it possible that different training sizes across categories affect the probability of a category (i.e. I have category A with 20 samples and category B with 7 samples; does category A get a higher likelihood)?
  2. I have done progressive (or incremental, as you like) training, but I forgot to trim the category names, so they have a \n at the end. I've lost the training data but I still have the trained model. Is it possible to somehow fix it?
@liufuyang (Owner)

1: Yes, different training sizes give the categories different "prior probabilities". Without knowing any other features, Bayes' rule says the model will tend to predict A in your case. You may set the prior_factor parameter lower than 1 to reduce this effect, though I am not sure how correct that is from a theoretical point of view. You can run some analysis on your test data to see whether it helps, or set prior_factor below 1.0 if you know that the prior probabilities of A and B should actually be equal.
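To illustrate the effect, here is a minimal sketch (not the actual rust-nb API; the `prior` helper and the interpretation of prior_factor as an exponent on the raw prior are assumptions for demonstration) showing how unequal class counts skew the prior, and how a factor below 1.0 flattens it:

```rust
// Hypothetical sketch: how class counts shape the prior probability,
// and how a prior_factor-style exponent < 1.0 dampens the imbalance.
fn prior(count: usize, total: usize, prior_factor: f64) -> f64 {
    // Raising the raw prior to a power below 1.0 pulls the
    // per-class priors closer together (flattens the distribution).
    (count as f64 / total as f64).powf(prior_factor)
}

fn main() {
    let (a, b) = (20usize, 7usize); // category A: 20 samples, B: 7 samples
    let total = a + b;

    // With prior_factor = 1.0, A dominates before any features are seen:
    let (pa, pb) = (prior(a, total, 1.0), prior(b, total, 1.0));
    println!("P(A) = {:.3}, P(B) = {:.3}", pa, pb);

    // With prior_factor = 0.5 the gap between the (unnormalized) priors shrinks:
    let (qa, qb) = (prior(a, total, 0.5), prior(b, total, 0.5));
    println!("P(A) ~ {:.3}, P(B) ~ {:.3}", qa, qb);
    assert!(qa / qb < pa / pb); // the imbalance is reduced
}
```

Whether flattening the prior this way is statistically justified depends on whether your training-set imbalance reflects the real class frequencies, which is the caveat above.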

  2. What do you mean by category names? Is it the class/label name, or a feature name?
    Intuitively, what you could do is write a program that loads your model file into the structure this lib specifies, then switch the names in the keys (or in the values, if it is the class name) of the maps defined here:
    https://github.com/liufuyang/rust-nb/blob/master/src/lib.rs#L584

That could be a bit tricky, I am not sure. If you just want to change the class output name, you can do what is suggested above; it would be easier than trying to switch a feature name.

Otherwise I guess you could keep the \n: when new training data comes in, append \n if it is missing, and when predicting (supposing you mean class labels by "category names") trim the output again...?
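The workaround above can be sketched as a pair of normalization helpers (hypothetical names; the idea is just to make old "A\n" labels and new "A" labels map to the same class, and to hide the newline from users):

```rust
// Map any incoming label to the form the existing model uses ("A\n"),
// so old and new training data land in the same class.
fn to_model_label(label: &str) -> String {
    if label.ends_with('\n') {
        label.to_string()
    } else {
        format!("{}\n", label)
    }
}

// Trim the stray newline again before showing a prediction to the user.
fn from_model_label(label: &str) -> String {
    label.trim_end_matches('\n').to_string()
}

fn main() {
    assert_eq!(to_model_label("spam"), "spam\n");
    assert_eq!(to_model_label("spam\n"), "spam\n"); // already normalized
    assert_eq!(from_model_label("spam\n"), "spam");
}
```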

Hope this helps somehow?

@eutampieri (Contributor, Author)

eutampieri commented Jan 31, 2020 via email

@liufuyang (Owner)

liufuyang commented Jan 31, 2020

Okay. I guess since you trained with \n, predictions will just print the label with a \n, right?

Anyway...

Just to add a few more words to my idea above: I think you would need to write code that reads the model file into the type Model<ModelHashMapStore>, then prints the keys and values of the maps inside its model_store field (which is of type ModelHashMapStore). From there you can replace whichever keys or values you want, then save the model into another file and load that. :)
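The key-rewriting step might look like the sketch below. The real maps live inside Model<ModelHashMapStore>'s model_store field, and the exact load/save calls depend on how you serialized the model, so a plain HashMap stands in for one of those internal maps here:

```rust
use std::collections::HashMap;

// Rebuild a map with the trailing newline stripped from every key.
// In practice you would apply this to each label-keyed map found
// inside the deserialized model_store before re-saving the model.
fn strip_newlines_from_keys(map: HashMap<String, f64>) -> HashMap<String, f64> {
    map.into_iter()
        .map(|(key, value)| (key.trim_end_matches('\n').to_string(), value))
        .collect()
}

fn main() {
    let mut counts = HashMap::new();
    counts.insert("A\n".to_string(), 20.0); // labels saved with the stray \n
    counts.insert("B\n".to_string(), 7.0);

    let fixed = strip_newlines_from_keys(counts);
    assert_eq!(fixed.get("A"), Some(&20.0));
    assert_eq!(fixed.get("B"), Some(&7.0));
    assert!(fixed.get("A\n").is_none()); // old keys are gone
}
```

One thing to watch for: if two keys differ only by the trailing newline (e.g. both "A" and "A\n" exist), this naive rewrite would silently overwrite one of them, so you may want to merge their values instead.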
