[FEATURE] Run with given regressor instead of raising warning in ZeroInflatedRegressor #480

DoDzilla-ai · 2021-08-28T21:01:49Z

I am using the ZeroInflatedRegressor for the prediction of some materials' prices. For some materials I am getting this error:

Traceback (most recent call last):
  File "/home/bugra/project-mats/mat_analyser.py", line 61, in apply_regression
    zir.fit(X_train, y_train)
  File "/home/bugra/project-mats/mats_folder/lib/python3.7/site-packages/sklego/meta/zero_inflated_regressor.py", line 107, in fit
    "The predicted training labels are all zero, making the regressor obsolete. Change the classifier or use a plain regressor instead.")
ValueError: The predicted training labels are all zero, making the regressor obsolete. Change the classifier or use a plain regressor instead.

To be honest, I didn't understand much from the error. But I guess using the regressor (in my case LinearRegression from sklearn) could still give me some results. Why can't ZeroInflatedRegressor use the regressor I defined for it instead of raising this error? I think ZeroInflatedRegressor should have a flag (a parameter) that lets it fit the regressor, as the error message suggests, if the user wants that. Without the flag, it could still raise this error.


koaning · 2021-08-29T07:30:33Z

Just to confirm: can you verify that y_train contains no zeros?

If so, the error makes sense to me. If there are no zeros in the dataset, the classifier in the meta-model can't train.
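A quick way to run this check is to count the zero and non-zero targets. The sketch below assumes y_train can be converted to a NumPy array; the helper name zero_summary and the sample values are made up for illustration and do not come from the thread.

```python
import numpy as np


def zero_summary(y):
    """Return (number of zeros, number of non-zeros) for a target vector."""
    y = np.asarray(y).ravel()
    n_zero = int(np.count_nonzero(y == 0))
    return n_zero, int(y.size - n_zero)


# Made-up target values, for illustration only.
y_example = np.array([0, 3, 0, 0, 1, 0])
n_zero, n_nonzero = zero_summary(y_example)
print(f"zeros: {n_zero}, non-zeros: {n_nonzero}")  # zeros: 4, non-zeros: 2
```

If either count is zero, the classifier stage only ever sees one class during training.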

DoDzilla-ai · 2021-08-30T07:00:49Z

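from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC
from sklego.meta import ZeroInflatedRegressor

# X_train and y_train come from the poster's own data (not shown).
zir = ZeroInflatedRegressor(
    classifier=SVC(),
    regressor=LinearRegression()
)
try:
    zir.fit(X_train, y_train)
except ValueError as e:
    # Print the targets only for the "all zero" error discussed in this issue.
    if str(e) == "The predicted training labels are all zero, making the regressor obsolete. Change the classifier or use a plain regressor instead.":
        print(y_train.values)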

And here are some of the y_train values:

[1 0 0 1 2 0 0 0 1 0]
[0 1 0 0 0 0 0 1 0 0]
[0 0 0 0 0 1 2 0 0 0]
[3 1 3 0 0 0 0 0 0 0]
[12  0  8  0  0  0  4  0  0  0]
[0 1 2 0 3 0 0 0 0 0]

PS: I am not saying the error is wrong or that there is a bug. I am saying the regression should be applied regardless, and that this should be the user's decision. Maybe I am making a logical mistake here. If so, sorry for wasting your time.

MBrouns · 2021-08-30T07:32:40Z

The thing is that if there are only zeros in your y_train, there is no data left for the regressor to train on. We filter out all zero entries before passing the data to the regressor, as we don't want the zeros to bias the regressor's predictions. There's no way the regressor component could still give results in that situation, hence the exception.

I do kind of see that in your use-case, where you're doing multi-output regression, you kind of want to ignore that situation for a single column if that happens. Is that indeed your use-case?
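The filtering step described in this comment can be sketched roughly as follows. This is a simplified illustration, not the sklego source: the function name fit_two_stage, the synthetic data, and the shortened error text are made up, and the sketch assumes (going by the wording of the error message, "predicted training labels") that rows are selected by the classifier's predictions rather than by the raw y values. A DummyClassifier that always predicts the majority class stands in for a classifier that never predicts "non-zero".

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LinearRegression


def fit_two_stage(classifier, regressor, X, y):
    """Fit a classifier on 'is y non-zero?' and a regressor on the rows it flags.

    Simplified illustration only; not the sklego implementation.
    """
    X, y = np.asarray(X), np.asarray(y)
    classifier.fit(X, y != 0)
    # Rows the classifier predicts to be non-zero (assumed selection rule).
    flagged = np.flatnonzero(np.asarray(classifier.predict(X)).astype(bool))
    if flagged.size == 0:
        # Nothing is left for the regressor to train on.
        raise ValueError("The predicted training labels are all zero.")
    regressor.fit(X[flagged], y[flagged])
    return classifier, regressor


# Synthetic data: mostly zeros, with two non-zero targets.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 3, 5], dtype=float)

try:
    fit_two_stage(DummyClassifier(strategy="most_frequent"), LinearRegression(), X, y)
    outcome = "fitted"
except ValueError as exc:
    outcome = f"ValueError: {exc}"
print(outcome)
```

Under that assumption, the error can appear even when y contains some non-zero values, as long as the classifier never predicts the non-zero class on the training data.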

DoDzilla-ai · 2021-08-30T07:57:40Z

Exactly, @MBrouns. And I think this should be an option given to the user.

koaning · 2021-08-30T08:09:13Z

Ah, you're doing multi-output regression! Now I see. That seems valid and indeed the ZeroInflatedRegressor wasn't designed with that in mind. Just looping @Garve in to be sure.

@DoDzilla-ai If we add a flag in the ZeroInflatedRegressor, we can use the native MultiOutputClassifier for your use-case, correct?

MBrouns · 2021-08-30T08:33:36Z

I'm thinking of something along the lines of handle_unknown in sklearn's OneHotEncoder:

ZeroInflatedRegressor(handle_zero='error')
ZeroInflatedRegressor(handle_zero='ignore')
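To make the proposal concrete, here is a minimal sketch of how such a switch might behave. Everything in it is hypothetical: handle_zero is only a proposed parameter, the helper names fit_with_handle_zero and predict_two_stage are invented for illustration, and the thread does not settle what 'ignore' should do. The sketch assumes 'ignore' skips fitting the regressor and predicts zeros; another possible reading would be to fit the regressor on all rows instead.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LinearRegression


def fit_with_handle_zero(classifier, regressor, X, y, handle_zero="error"):
    """Hypothetical two-stage fit with a `handle_zero` switch.

    Returns (classifier, regressor_or_None). Assumed semantics:
    'error' raises when no row is flagged, 'ignore' skips the regressor.
    """
    if handle_zero not in ("error", "ignore"):
        raise ValueError(f"handle_zero must be 'error' or 'ignore', got {handle_zero!r}")
    X, y = np.asarray(X), np.asarray(y)
    classifier.fit(X, y != 0)
    flagged = np.flatnonzero(np.asarray(classifier.predict(X)).astype(bool))
    if flagged.size == 0:
        if handle_zero == "error":
            raise ValueError("The predicted training labels are all zero.")
        return classifier, None  # 'ignore': leave the regressor unfitted
    regressor.fit(X[flagged], y[flagged])
    return classifier, regressor


def predict_two_stage(classifier, regressor, X):
    """Predict 0 for unflagged rows and the regressor's output for flagged rows."""
    X = np.asarray(X)
    output = np.zeros(X.shape[0])
    flagged = np.flatnonzero(np.asarray(classifier.predict(X)).astype(bool))
    if regressor is not None and flagged.size > 0:
        output[flagged] = regressor.predict(X[flagged])
    return output


# Synthetic data on which the stand-in classifier never predicts "non-zero".
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 3, 5], dtype=float)

clf, reg = fit_with_handle_zero(
    DummyClassifier(strategy="most_frequent"), LinearRegression(), X, y,
    handle_zero="ignore",
)
predictions = predict_two_stage(clf, reg, X)
print(predictions)  # every prediction is 0.0 because no row was flagged
```

With the 'ignore' behaviour assumed here, a loop over several target columns could continue past a column where the classifier flags nothing, instead of stopping at the exception.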

Garve · 2021-08-30T13:49:45Z

> Ah, you're doing multi-output regression! Now I see. That seems valid and indeed the ZeroInflatedRegressor wasn't designed with that in mind. Just looping @Garve in to be sure.

Exactly.

koaning · 2021-08-30T16:03:43Z

@DoDzilla-ai is this an issue you'd like to pick up?

DoDzilla-ai · 2021-09-01T07:07:28Z

@koaning I wish I could, currently struggling with health issues :(

koaning · 2021-09-05T14:57:10Z

No worries, health is more important.

DoDzilla-ai added the "enhancement" label on Aug 28, 2021.

koaning added the "good first issue" label on Mar 23, 2024.
