suggestion: split between train/test set, allow training and loading of imputation statistics models #95

Open
xuancong84 opened this issue Apr 23, 2020 · 3 comments

Comments

@xuancong84

In research, scientific integrity plays a very important part. One can publish very good papers by playing tricks between the train and test sets to get good results, but such results can never be applied in real life, because those tricks simply do not work in real-life applications.

Thank you very much for creating a wonderful framework for missing value imputation! However, your framework does not provide a way to apply imputation statistics trained on one dataset to another dataset. I would greatly appreciate it if you could add this.

For backward compatibility, you can create an optional kwarg called model for every function such as impy.mean, impy.mode, etc. By default, model=None and the function behaves as it does now; if you pass model=True, the function returns a tuple consisting of both the imputed data and the imputation statistics object; if you pass model=<imputation-statistics-object>, the function applies the trained imputation statistics to impute the data. That way, existing code will not be affected.
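
To illustrate, here is a minimal sketch of how the proposed model kwarg could behave for column-mean imputation. The function name mean_impute and the dict layout of the statistics object are hypothetical, chosen only for this example; this is not the library's current API.

```python
import numpy as np

def mean_impute(data, model=None):
    """Hypothetical column-mean imputation with the proposed `model` kwarg.

    model=None  -> learn statistics from `data`, return imputed data only
    model=True  -> learn statistics from `data`, return (imputed, statistics)
    model=stats -> reuse previously learned statistics, return imputed data only
    """
    data = np.asarray(data, dtype=float)
    if model is None or model is True:
        # Learn per-column means from this dataset, ignoring NaNs.
        stats = {"col_means": np.nanmean(data, axis=0)}
    else:
        # Reuse statistics learned elsewhere (e.g. on the training set).
        stats = model
    # Replace each NaN with the mean of its column.
    imputed = np.where(np.isnan(data), stats["col_means"], data)
    if model is True:
        return imputed, stats
    return imputed

# Learn statistics on the training set, then apply them to the test set.
train = np.array([[1.0, np.nan], [3.0, 4.0]])
test = np.array([[np.nan, 10.0], [5.0, np.nan]])
train_imputed, stats = mean_impute(train, model=True)
test_imputed = mean_impute(test, model=stats)
```

With this shape, the test set is filled using only means computed from the training set, and calls that omit model keep their current return type.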

@Sandy4321

It would be great to have this.

@Sandy4321

It seems scikit-learn already has this?
https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html#sklearn.impute.KNNImputer
KNNImputer exposes the following methods, including a separate transform step:

fit(self, X[, y]) Fit the imputer on X.
fit_transform(self, X[, y]) Fit to data, then transform it.
get_params(self[, deep]) Get parameters for this estimator.
set_params(self, **params) Set the parameters of this estimator.
transform(self, X) Impute all missing values in X.
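
As a rough illustration of the fit/transform split listed above, the sketch below fits a KNNImputer on a training array and then applies it to a separate test array. The toy arrays and the n_neighbors=2 setting are assumptions made up for this example.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy data, invented for illustration only.
X_train = np.array([[1.0, 2.0], [3.0, 4.0], [np.nan, 6.0], [7.0, 8.0]])
X_test = np.array([[np.nan, 3.0]])

# Fit on the training set only, so no test-set information leaks in.
imputer = KNNImputer(n_neighbors=2)
imputer.fit(X_train)

# Impute the test set using neighbors drawn from the training set.
X_test_imputed = imputer.transform(X_test)
```

Here the missing value in X_test is filled from training rows rather than from the test set itself, which is the behavior the issue asks for.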

@xuancong84
Author

Thanks @Sandy4321. I am aware of sklearn. Nevertheless, if you can make your code good enough, you can contribute it to sklearn :-)
