Select features for machine learning models using YAML configuration.

imyutaro/featurefuse

featurefuse

featurefuse is a Python package that lets you choose which features to generate for machine learning models using a YAML configuration.

This implementation is based on the below references.

How to use

  1. Implement a feature engineering class that inherits from the Feature abstract class.
  2. Create a config file.
    # Path to implemented feature class.
    use_feature:
      - .tmp.TmpFeature1
      - .tmp.TmpFeature2
    # Params for each implemented class.
    feature_params:
      TmpFeature1:
        shift_b_to: 3
      TmpFeature2:
        const: 1000
  3. Pass the config and the base (original) input DataFrame to featurefuse.generator.run().
  4. You get back the feature DataFrame and a description of the features.
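
Before calling run(), it can help to check that the config is consistent. In the config above, each key under feature_params matches the last dotted component of a path in use_feature (.tmp.TmpFeature1 pairs with TmpFeature1). The sketch below relies on that observation only. The helper name classes_without_params is hypothetical and not part of featurefuse, and whether featurefuse requires a feature_params entry for every class is not stated here, so treat the result as a hint rather than an error.

```python
# Hypothetical helper for illustration; it is NOT part of featurefuse.
def classes_without_params(config: dict) -> list[str]:
    """Return class names listed in use_feature that have no feature_params entry."""
    params = config.get("feature_params") or {}
    missing = []
    for path in config.get("use_feature") or []:
        # ".tmp.TmpFeature1" -> "TmpFeature1": the class name is the last dotted component.
        class_name = path.rsplit(".", 1)[-1]
        if class_name not in params:
            missing.append(class_name)
    return missing


# Same shape as the YAML config above, with the TmpFeature2 params left out on purpose.
config = {
    "use_feature": [".tmp.TmpFeature1", ".tmp.TmpFeature2"],
    "feature_params": {"TmpFeature1": {"shift_b_to": 3}},
}
print(classes_without_params(config))  # ['TmpFeature2']
```

An empty list means every class in use_feature has a matching parameter block.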

The following code is a minimal example.

# Save this code as tmp.py
import pandas as pd

from featurefuse.base import Feature


class TmpFeature1(Feature):
    def create_feature(self, tmp_df: pd.DataFrame, shift_b_to: int, **kwargs) -> pd.DataFrame:
        fe = tmp_df[["col1", "col2"]].copy(deep=True)
        self.create_description(
            "feature col names generated by TmpFeature1", "description of each cols of TmpFeature1"
        )
        for i in range(1, shift_b_to + 1):
            fe[f"col2_b{i}"] = fe["col2"].shift(i)
        return fe


class TmpFeature2(Feature):
    def create_feature(self, tmp_df: pd.DataFrame, const: int, **kwargs) -> pd.DataFrame:
        fe = tmp_df[["col1", "col2"]].copy(deep=True)
        self.create_description(
            "feature col names generated by TmpFeature2", "description of each cols of TmpFeature2"
        )
        fe[f"col2_add_{const}"] = fe["col2"] + const
        return fe


def main():
    import yaml

    from featurefuse.generator import run

    config = yaml.safe_load(
        """
    use_feature:
      # Path to TmpFeature1. This file is saved as tmp.py, so the path is .tmp.TmpFeature1
      - .tmp.TmpFeature1
      - .tmp.TmpFeature2
    feature_params:
      TmpFeature1:
        shift_b_to: 3
      TmpFeature2:
        const: 1000
    """
    )

    tmp_df = pd.DataFrame([["aaa", 123], ["bbb", 456], ["ccc", 789], ["ddd", 101]], columns=["col1", "col2"])
    feature, description = run(config, join_key="col1", tmp_df=tmp_df)
    print(feature)

    # You can use the feature DataFrame to train a machine learning model, etc.
    # model.fit(feature.drop("col1", axis="columns"), feature[["col1"]])


if __name__ == "__main__":
    main()

Run it and you get the following output.

$ python tmp.py
  col1  col2  col2_b1  col2_b2  col2_b3  col2_add_1000
0  aaa   123      NaN      NaN      NaN           1123
1  bbb   456    123.0      NaN      NaN           1456
2  ccc   789    456.0    123.0      NaN           1789
3  ddd   101    789.0    456.0    123.0           1101
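
The NaN values and the float formatting (123.0) in the col2_b1 to col2_b3 columns come from pandas itself: Series.shift fills the vacated rows with NaN by default, which promotes an integer column to float64. The short sketch below reproduces that with plain pandas, independent of featurefuse. Passing fill_value is shown only as one option if integer lag columns are preferred; it is not something the example above does.

```python
import pandas as pd

# Same values as col2 in the example input.
col2 = pd.Series([123, 456, 789, 101], name="col2")

# Default shift: the first row has no predecessor, so it becomes NaN
# and the whole column is promoted to float64.
lag1 = col2.shift(1)
print(lag1.dtype)

# With an explicit fill_value the column stays an integer type.
lag1_filled = col2.shift(1, fill_value=0)
print(lag1_filled.tolist())  # [0, 123, 456, 789]
```

Rows with NaN lags usually need to be dropped or imputed before fitting a model that does not accept missing values.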

Other

The name of this tool was suggested by ChatGPT.

prompt: tell me own develop library name idea which select feature for machine learning in experiment phase.

Sure, here are a few potential names for a library that helps with feature selection for machine learning experiments:

  1. FeatureFuse
  2. Selectify
  3. FeatureCraft
  4. MachinaSelect
  5. FeatLab
  6. ModelMuse
  7. SmartFeat
  8. FeatureFlex
  9. MindFeat
  10. IntelliSelect

I hope these ideas inspire you and help you find the perfect name for your library!

prompt: Which is your recommendation?

As an AI language model, I don't have personal preferences or biases, but I can provide some guidance based on the names I suggested earlier.

If you're looking for a name that highlights the library's ability to fuse or combine different features, "FeatureFuse" or "Selectify" could be good options. "FeatureCraft" and "MachinaSelect" convey a sense of precision and automation, which could be suitable if those are important qualities for your library. "FeatLab" and "ModelMuse" both sound like they could be useful tools for researchers or data scientists, while "SmartFeat" and "IntelliSelect" suggest a focus on intelligent and efficient feature selection. Finally, "MindFeat" could work if your library is designed to help researchers gain a deeper understanding of the features used in their models.

Ultimately, the name you choose will depend on your target audience and the specific features and benefits of your library. I hope this helps!

TODO

  • translate to English
  • add tests
  • write usage documentation
  • write an example with Hydra
  • set up GitHub Actions or another CI service
