[BUG] 'EstimatorTransformer' object has no attribute 'get_feature_names_out' #533

Open
CarloLepelaars opened this issue Sep 14, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@CarloLepelaars
Contributor

CarloLepelaars commented Sep 14, 2022

Calling get_feature_names_out on an EstimatorTransformer, or on a Pipeline that contains one, raises the following error:
AttributeError: 'EstimatorTransformer' object has no attribute 'get_feature_names_out'

Minimal reproducible example:

from sklego.meta import EstimatorTransformer
from sklearn.linear_model import LinearRegression
EstimatorTransformer(LinearRegression()).get_feature_names_out(None)

I thought this issue was resolved in scikit-learn >= 1.1 (see the release highlight "get_feature_names_out Available in all Transformers"), but apparently custom scikit-learn transformers still need a manual implementation of get_feature_names_out.
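To illustrate the point, here is a minimal sketch showing that inheriting from TransformerMixin and BaseEstimator alone does not provide get_feature_names_out. The class name IdentityTransformer is hypothetical and exists only for this check; it is not part of scikit-learn or scikit-lego.

```python
from sklearn.base import BaseEstimator, TransformerMixin


class IdentityTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer that returns its input unchanged."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


# The base classes supply fit_transform, get_params, etc.,
# but no get_feature_names_out implementation.
has_method = hasattr(IdentityTransformer(), "get_feature_names_out")
print(has_method)
```

So the method has to come from somewhere else: either a hand-written implementation like the one proposed here, or one of the helper mixins that scikit-learn uses for its own transformers.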

Proposed solution sketch:

from sklearn.base import BaseEstimator, MetaEstimatorMixin, TransformerMixin
from sklearn.utils.validation import check_is_fitted

class EstimatorTransformer(TransformerMixin, MetaEstimatorMixin, BaseEstimator):
    .
    .
    .
    def fit(self, X, y, **kwargs):
        .
        .
        .
        # Store how many output columns the estimator has
        self.output_len_ = y.shape[1] if self.multi_output_ else 1
        .
        .

    # The parameter is named input_features to match the scikit-learn convention
    def get_feature_names_out(self, input_features=None) -> list:
        """
        Get names for the output of EstimatorTransformer.
        The estimator must be fitted before this method can be called.
        """
        # Raises NotFittedError (not AttributeError) when fit has not been called
        check_is_fitted(self, "estimator_")
        if self.multi_output_:
            feature_names = [f"prediction_{i}" for i in range(self.output_len_)]
        else:
            feature_names = ["prediction"]
        return feature_names

Happy to contribute this if you agree with the proposed solution. If this is a general problem, I'm also open to working on implementing get_feature_names_out for other transformers in scikit-lego.
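For reference, the proposal above can be turned into a small runnable sketch. PredictionTransformer below is a hypothetical stand-in written for this example, not the actual sklego EstimatorTransformer; the way it sets multi_output_ and reshapes predictions is an assumption about how such a class might behave.

```python
import numpy as np
from sklearn.base import BaseEstimator, MetaEstimatorMixin, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.utils.validation import check_is_fitted


class PredictionTransformer(TransformerMixin, MetaEstimatorMixin, BaseEstimator):
    """Hypothetical stand-in for EstimatorTransformer (not the sklego class)."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        y = np.asarray(y)
        # Assumption: a 2-D target means the estimator predicts several columns
        self.multi_output_ = y.ndim > 1
        # Store how many output columns the estimator has
        self.output_len_ = y.shape[1] if self.multi_output_ else 1
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        check_is_fitted(self, "estimator_")
        predictions = self.estimator_.predict(X)
        # Always return a 2-D array so the output can feed a downstream step
        return predictions if self.multi_output_ else predictions.reshape(-1, 1)

    def get_feature_names_out(self, input_features=None):
        check_is_fitted(self, "estimator_")
        if self.multi_output_:
            names = [f"prediction_{i}" for i in range(self.output_len_)]
        else:
            names = ["prediction"]
        return np.asarray(names, dtype=object)


X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 3.0], [3.0, 2.0]])
y_single = np.array([1.0, 2.0, 3.0, 4.0])
y_multi = np.column_stack([y_single, 2 * y_single])

single = PredictionTransformer(LinearRegression()).fit(X, y_single)
multi = PredictionTransformer(LinearRegression()).fit(X, y_multi)

print(single.get_feature_names_out())
print(multi.get_feature_names_out())
```

The number of names matches the number of columns that transform returns in both the single-output and multi-output case, which is the property a Pipeline needs when it chains get_feature_names_out calls.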

@CarloLepelaars CarloLepelaars added the bug Something isn't working label Sep 14, 2022
@koaning
Owner

koaning commented Sep 14, 2022

Minor ask: you can attach a language to a code-block to get syntax highlighting. Like so:

```python
import pandas as pd
```

That said, hmm ... I'm wondering which other meta estimators will have the same issue. @CarloLepelaars I had a quick look at the VotingClassifier in sklearn, and it seems that even in sklearn not every meta estimator implements get_feature_names_out.

I'm also curious if scikit-learn has tests for this behavior that we can copy. @CarloLepelaars did you check the sklearn repo for this by any chance?

@CarloLepelaars
Contributor Author

CarloLepelaars commented Sep 14, 2022

Minor ask: you can attach a language to a code-block to get syntax highlighting.

Makes sense! Added syntax highlighting in comment above.

🤔 Interesting! Seems odd that it is implemented for VotingClassifier, but not for other estimators in the ensemble module like BaggingClassifier.

Here is an example of a get_feature_names_out test case for LDA in sklearn:
https://github.com/scikit-learn/scikit-learn/blob/5bd81234e6e6501ddcddbfdfdc80b90a1302af55/sklearn/tests/test_discriminant_analysis.py#L659

@CarloLepelaars
Contributor Author

@koaning, shall I go ahead and implement this for EstimatorTransformer? After that we can evaluate whether an implementation is needed for other meta estimators in sklego. I'm sure the implementation for EstimatorTransformer will give insight into the need for get_feature_names_out in other meta estimators.

@koaning
Owner

koaning commented Sep 27, 2022

Yes please!
