
Questions #11

Open
eutampieri opened this issue Jan 31, 2020 · 3 comments

Comments

@eutampieri (Contributor)

  1. Is it possible that different training sizes across categories affect the probability of a category (i.e. I have category A with 20 samples and category B with 7 samples; does category A get a higher likelihood)?
  2. I have done progressive (or incremental, as you like) training, but I forgot to trim the category names, so they have a \n at the end. I've lost the training data but I still have the trained model. Is it possible to somehow fix it?
@liufuyang (Owner)

1: Yes, different training sizes give the categories different "prior probabilities". Without knowing any other features, Bayes' rule says the model will tend to predict A in your case. You may set the prior_factor parameter lower than 1 to reduce this effect, though I am not sure how correct that is from a theoretical point of view. You can run some analysis on your test data to see whether it helps, or set prior_factor below 1.0 if you know that the prior probabilities of A and B should actually be equal.
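To illustrate the effect, here is a minimal sketch (not the actual rust-nb API; the `prior` helper and the interpretation of prior_factor as an exponent on the raw prior are assumptions for demonstration) showing how unequal class counts skew the prior, and how a factor below 1.0 flattens it:

```rust
// Hypothetical sketch: how class counts shape the prior probability,
// and how a prior_factor-style exponent < 1.0 dampens the imbalance.
fn prior(count: usize, total: usize, prior_factor: f64) -> f64 {
    // Raising the raw prior to a power below 1.0 pulls the
    // per-class priors closer together (flattens the distribution).
    (count as f64 / total as f64).powf(prior_factor)
}

fn main() {
    let (a, b) = (20usize, 7usize); // category A: 20 samples, B: 7 samples
    let total = a + b;

    // With prior_factor = 1.0, A dominates before any features are seen:
    let (pa, pb) = (prior(a, total, 1.0), prior(b, total, 1.0));
    println!("P(A) = {:.3}, P(B) = {:.3}", pa, pb);

    // With prior_factor = 0.5 the gap between the (unnormalized) priors shrinks:
    let (qa, qb) = (prior(a, total, 0.5), prior(b, total, 0.5));
    println!("P(A) ~ {:.3}, P(B) ~ {:.3}", qa, qb);
    assert!(qa / qb < pa / pb); // the imbalance is reduced
}
```

Whether flattening the prior this way is statistically justified depends on whether your training-set imbalance reflects the real class frequencies, which is the caveat above.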

  2. What do you mean by category names? Is it the class/label name, or a feature name?
    Intuitively, what you could do is write a program that loads your model file into the structure this lib specifies, then switch the names in the keys (or in the values, if it is the class name) of the maps defined here:
    https://github.com/liufuyang/rust-nb/blob/master/src/lib.rs#L584

That could be a bit tricky, I am not sure. If you just want to change the class output name, you can do what is suggested above; it would be easier than trying to switch a feature name.

Otherwise I guess you could keep the \n: when new training data comes in, append \n if it is missing, and when predicting (supposing you mean class labels by "category names") trim the output again...?
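The workaround above can be sketched as a pair of normalization helpers (hypothetical names; the idea is just to make old "A\n" labels and new "A" labels map to the same class, and to hide the newline from users):

```rust
// Map any incoming label to the form the existing model uses ("A\n"),
// so old and new training data land in the same class.
fn to_model_label(label: &str) -> String {
    if label.ends_with('\n') {
        label.to_string()
    } else {
        format!("{}\n", label)
    }
}

// Trim the stray newline again before showing a prediction to the user.
fn from_model_label(label: &str) -> String {
    label.trim_end_matches('\n').to_string()
}

fn main() {
    assert_eq!(to_model_label("spam"), "spam\n");
    assert_eq!(to_model_label("spam\n"), "spam\n"); // already normalized
    assert_eq!(from_model_label("spam\n"), "spam");
}
```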

Hope this helps somehow?

@eutampieri (Contributor, Author)

eutampieri commented Jan 31, 2020 via email

@liufuyang (Owner)

liufuyang commented Jan 31, 2020

Okay. I guess since you trained with \n, predictions will just print the label with a \n, right?

Anyway...

Just to add a few more words to my idea above: I think you would need to write code that reads the model file into the type Model<ModelHashMapStore>, then prints the keys and values of the maps inside its model_store field (which is of type ModelHashMapStore). From there you can replace whichever keys or values you want, then save the model into another file and load that. :)
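The key-rewriting step might look like the sketch below. The real maps live inside Model<ModelHashMapStore>'s model_store field, and the exact load/save calls depend on how you serialized the model, so a plain HashMap stands in for one of those internal maps here:

```rust
use std::collections::HashMap;

// Rebuild a map with the trailing newline stripped from every key.
// In practice you would apply this to each label-keyed map found
// inside the deserialized model_store before re-saving the model.
fn strip_newlines_from_keys(map: HashMap<String, f64>) -> HashMap<String, f64> {
    map.into_iter()
        .map(|(key, value)| (key.trim_end_matches('\n').to_string(), value))
        .collect()
}

fn main() {
    let mut counts = HashMap::new();
    counts.insert("A\n".to_string(), 20.0); // labels saved with the stray \n
    counts.insert("B\n".to_string(), 7.0);

    let fixed = strip_newlines_from_keys(counts);
    assert_eq!(fixed.get("A"), Some(&20.0));
    assert_eq!(fixed.get("B"), Some(&7.0));
    assert!(fixed.get("A\n").is_none()); // old keys are gone
}
```

One thing to watch for: if two keys differ only by the trailing newline (e.g. both "A" and "A\n" exist), this naive rewrite would silently overwrite one of them, so you may want to merge their values instead.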
