[FEATURE] Run with given regressor instead of raising warning in ZeroInflatedRegressor #480

Open
DoDzilla-ai opened this issue Aug 28, 2021 · 10 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@DoDzilla-ai

I am using the ZeroInflatedRegressor for the prediction of some materials' prices. For some materials I am getting this error:

Traceback (most recent call last):
  File "/home/bugra/project-mats/mat_analyser.py", line 61, in apply_regression
    zir.fit(X_train, y_train)
  File "/home/bugra/project-mats/mats_folder/lib/python3.7/site-packages/sklego/meta/zero_inflated_regressor.py", line 107, in fit
    "The predicted training labels are all zero, making the regressor obsolete. Change the classifier or use a plain regressor instead.")
ValueError: The predicted training labels are all zero, making the regressor obsolete. Change the classifier or use a plain regressor instead.

Tbh, I didn't understand much from the error. But I guess using the regressor (in my case LinearRegression from sklearn) could still give me some results. Why can't ZeroInflatedRegressor use the regressor I defined for it instead of raising this error? I think there should be a flag (a parameter) in ZeroInflatedRegressor that lets it fit the regressor anyway, as suggested by the error message, if the user wants. If the flag is not set, it could still raise this error.

@DoDzilla-ai DoDzilla-ai added the enhancement New feature or request label Aug 28, 2021
@koaning
Owner

koaning commented Aug 29, 2021

Just to confirm: can you verify that y_train contains no zeros?

If so, the error makes sense to me. If there are no zeros on the dataset, the classifier in the meta-model can't train.

@DoDzilla-ai
Author

DoDzilla-ai commented Aug 30, 2021

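from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklego.meta import ZeroInflatedRegressor

zir = ZeroInflatedRegressor(
    classifier=SVC(),
    regressor=LinearRegression()
)
try:
    zir.fit(X_train, y_train)
except ValueError as e:
    # Print the training targets whenever the fit fails with the "all zero" error
    if str(e) == "The predicted training labels are all zero, making the regressor obsolete. Change the classifier or use a plain regressor instead.":
        print(y_train.values)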

And here are some of the y_train values:

[1 0 0 1 2 0 0 0 1 0]
[0 1 0 0 0 0 0 1 0 0]
[0 0 0 0 0 1 2 0 0 0]
[3 1 3 0 0 0 0 0 0 0]
[12  0  8  0  0  0  4  0  0  0]
[0 1 2 0 3 0 0 0 0 0]

PS: I am not saying the error is wrong or that there is a bug; I am saying the regression should be applied no matter what, and that this should be the user's decision. Maybe I am making a logical mistake here. If so, sorry to waste your time.
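To check koaning's hypothesis against the values printed above, a quick count helps. This is a numpy sketch; the array simply copies the six printed vectors. Every target contains both zeros and non-zeros, so the zero/non-zero classifier does see two classes and the error cannot come from a target without zeros. With only 2 to 4 non-zero values out of 10, a classifier such as SVC may instead predict "zero" for every training row.

```python
import numpy as np

# The six y_train vectors printed above, one row per target.
printed_targets = np.array([
    [1, 0, 0, 1, 2, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 2, 0, 0, 0],
    [3, 1, 3, 0, 0, 0, 0, 0, 0, 0],
    [12, 0, 8, 0, 0, 0, 4, 0, 0, 0],
    [0, 1, 2, 0, 3, 0, 0, 0, 0, 0],
])

# Number of zero and non-zero entries per printed target.
n_zero = (printed_targets == 0).sum(axis=1)
n_nonzero = (printed_targets != 0).sum(axis=1)
print(n_zero.tolist())     # [6, 8, 8, 7, 7, 7]
print(n_nonzero.tolist())  # [4, 2, 2, 3, 3, 3]

# True where the binary "is y non-zero?" target has both classes.
has_both_classes = (n_zero > 0) & (n_nonzero > 0)
print(bool(has_both_classes.all()))  # True
```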

@MBrouns
Copy link
Collaborator

MBrouns commented Aug 30, 2021

The thing is that if there are only zeros in your y_train, there is no data left for the regressor to train on. We filter out all zero entries before passing the data to the regressor, as we don't want the zeros to bias the regressor's predictions. There's no way the regressor component could still give results in that situation, hence the exception.

I do kind of see that in your use-case, where you're doing multi-output regression, you kind of want to ignore that situation for a single column if that happens. Is that indeed your use-case?
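The filtering described above can be sketched in plain scikit-learn. This is not sklego's source code, and the function name `fit_zero_inflated` is hypothetical. The sketch assumes, as the wording of the error message ("predicted training labels") suggests, that the regressor receives the rows the classifier *predicts* as non-zero rather than the rows where y is non-zero. Under that assumption the error can fire even when y_train does contain non-zero values, which matches the data printed above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

MESSAGE = (
    "The predicted training labels are all zero, making the regressor obsolete. "
    "Change the classifier or use a plain regressor instead."
)


def fit_zero_inflated(classifier, regressor, X, y):
    """Hypothetical sketch of a zero-inflated fit; not sklego's source code."""
    X, y = np.asarray(X), np.asarray(y)
    # Step 1: the classifier learns the binary target "is y non-zero?".
    clf = clone(classifier).fit(X, y != 0)
    # Step 2 (assumption): keep the rows the classifier *predicts* as non-zero.
    non_zero = clf.predict(X).astype(bool)
    if not non_zero.any():
        raise ValueError(MESSAGE)
    # Step 3: the regressor only sees those rows, so the zeros cannot bias it.
    reg = clone(regressor).fit(X[non_zero], y[non_zero])
    return clf, reg


x = np.arange(10, dtype=float)
X = x.reshape(-1, 1)
y = np.where(x > 5, x - 1, 0.0)  # six zeros, then 5, 6, 7, 8

# A tree separates the zero rows cleanly, so the regressor is fitted on rows 6..9.
clf, reg = fit_zero_inflated(DecisionTreeClassifier(random_state=0), LinearRegression(), X, y)
print(reg.predict([[10.0]]))  # approximately [9.]

# A majority-class classifier predicts "zero" everywhere, so the error fires
# even though y contains non-zero values.
raised = False
try:
    fit_zero_inflated(DummyClassifier(strategy="most_frequent"), LinearRegression(), X, y)
except ValueError as err:
    raised = True
    print(err)
```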

@DoDzilla-ai
Author

Exactly, @MBrouns. And I think this should be an option given to the user.

@koaning
Copy link
Owner

koaning commented Aug 30, 2021

Ah, you're doing multi-output regression! Now I see. That seems valid and indeed the ZeroInflatedRegressor wasn't designed with that in mind. Just looping @Garve in to be sure.

@DoDzilla-ai If we add a flag in the ZeroInflatedRegressor, we can use the native MultiOutputClassifier for your use-case, correct?

@MBrouns
Collaborator

MBrouns commented Aug 30, 2021

I'm thinking something along the lines of handle_unknown in sklearn's OneHotEncoder:

ZeroInflatedRegressor(handle_zero='error')
ZeroInflatedRegressor(handle_zero='ignore')
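A minimal sketch of this proposal, under stated assumptions. `ZeroInflatedSketch` is a hypothetical stand-in, not sklego's ZeroInflatedRegressor. `handle_zero` is the parameter name proposed here, not an existing sklego option. The sketch also assumes that 'ignore' means "skip the regressor and predict zero for that target". Wrapped in scikit-learn's MultiOutputRegressor (the regression counterpart of the MultiOutputClassifier mentioned above), each target column gets its own clone, so an all-zero column no longer aborts the whole fit.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.tree import DecisionTreeClassifier


class ZeroInflatedSketch(RegressorMixin, BaseEstimator):
    """Hypothetical zero-inflated regressor with the proposed handle_zero option."""

    def __init__(self, classifier, regressor, handle_zero="error"):
        self.classifier = classifier
        self.regressor = regressor
        self.handle_zero = handle_zero

    def fit(self, X, y):
        if self.handle_zero not in ("error", "ignore"):
            raise ValueError("handle_zero must be 'error' or 'ignore'")
        X, y = np.asarray(X), np.asarray(y)
        self.classifier_ = clone(self.classifier).fit(X, y != 0)
        non_zero = self.classifier_.predict(X).astype(bool)
        if non_zero.any():
            self.regressor_ = clone(self.regressor).fit(X[non_zero], y[non_zero])
        elif self.handle_zero == "error":
            raise ValueError("The predicted training labels are all zero, making the regressor obsolete.")
        else:
            # 'ignore': no regressor is fitted and predict() returns zeros.
            self.regressor_ = None
        return self

    def predict(self, X):
        X = np.asarray(X)
        output = np.zeros(X.shape[0])
        if self.regressor_ is not None:
            non_zero = self.classifier_.predict(X).astype(bool)
            if non_zero.any():
                output[non_zero] = self.regressor_.predict(X[non_zero])
        return output


x = np.arange(10, dtype=float)
X = x.reshape(-1, 1)
Y = np.column_stack([
    np.where(x > 5, x, 0.0),  # column 0: zeros, then y = x
    np.zeros(10),             # column 1: only zeros, nothing left for the regressor
])

# handle_zero="ignore": the all-zero column is skipped instead of aborting the fit.
model = MultiOutputRegressor(
    ZeroInflatedSketch(DecisionTreeClassifier(random_state=0), LinearRegression(), handle_zero="ignore")
).fit(X, Y)
pred = model.predict(np.array([[7.0], [2.0]]))
print(pred)  # approximately [[7. 0.] [0. 0.]]

# The default handle_zero="error" keeps the current behaviour and raises on column 1.
raised_error = False
try:
    MultiOutputRegressor(
        ZeroInflatedSketch(DecisionTreeClassifier(random_state=0), LinearRegression())
    ).fit(X, Y)
except ValueError:
    raised_error = True
```

Defaulting to 'error' keeps the current behaviour for single-target use, mirroring how OneHotEncoder defaults handle_unknown to 'error'.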

@Garve
Contributor

Garve commented Aug 30, 2021

> Ah, you're doing multi-output regression! Now I see. That seems valid and indeed the ZeroInflatedRegressor wasn't designed with that in mind. Just looping @Garve in to be sure.

Exactly.

@koaning
Owner

koaning commented Aug 30, 2021

@DoDzilla-ai is this an issue you'd like to pick up?

@DoDzilla-ai
Author

@koaning I wish I could, currently struggling with health issues :(

@koaning
Copy link
Owner

koaning commented Sep 5, 2021

No worries, health is more important.

@koaning koaning added the good first issue Good for newcomers label Mar 23, 2024