Add SapientML to automl benchmark #630
Signed-off-by: Kosaku Kimura <[email protected]>
Thanks for your contributions! I haven't had time to try this out yet, but I do already have a couple questions and suggested changes based on the PR. Please have a look at them.
# Sapientml
output_dir = config.output_dir + "/" + "outputs" + "/" + config.name + "/" + str(config.fold)
predictor = SapientML([target_col], task_type="classification" if is_classification else "regression")
From the abstract, it seems that meta-learning is involved. Are there datasets in the meta-learning corpus that are also in the AutoML benchmark? If so, is there a way to "turn off" the inclusion of that data in the meta-model for individual evaluations (e.g., don't use meta-information found on the Santander dataset while evaluating on the Santander dataset)?
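To make the question concrete, the exclusion could take the form of a leave-one-dataset-out filter over the meta-learning corpus. The sketch below is purely illustrative: `CorpusEntry`, its fields, and `exclude_dataset` are hypothetical names, and nothing here reflects SapientML's actual corpus format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    """Hypothetical record: one dataset/pipeline pair in a meta-learning corpus."""
    dataset_name: str
    pipeline_id: str


def _normalize(name):
    # Compare names case-insensitively and ignore surrounding whitespace.
    return name.strip().lower()


def exclude_dataset(corpus, evaluated_dataset):
    """Return the entries that do NOT originate from the dataset under evaluation."""
    target = _normalize(evaluated_dataset)
    return [entry for entry in corpus if _normalize(entry.dataset_name) != target]


# Toy corpus (made-up entries, for illustration only).
corpus = [
    CorpusEntry("santander", "p1"),
    CorpusEntry("titanic", "p2"),
    CorpusEntry("Santander ", "p3"),
]

# When evaluating on Santander, only meta-information from other datasets remains.
filtered = exclude_dataset(corpus, "Santander")
print([entry.pipeline_id for entry in filtered])
```

Matching by name alone is an assumption; in practice an OpenML dataset ID or a content hash would be a more reliable key, if the corpus records one.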
openml
boto3==1.26.98
Haven't tried it yet, but it looks like the exec file does not depend on these dependencies. What are they for?
@@ -0,0 +1,3 @@
sapientml
Please install the framework through the setup.sh script. It allows people to specify versions, source, and so on.
@@ -0,0 +1,8 @@
#!/usr/bin/env bash
Please update the script so you can install both from source (as latest) and from PyPI (as stable or with a specified version). See for example GAMA's script: https://github.com/openml/automlbenchmark/blob/master/frameworks/GAMA/setup.sh
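The version handling requested here boils down to a three-way decision. The sketch below expresses that decision in Python rather than bash, only to make the logic easy to test; it is not the GAMA script or the benchmark's own code. The function name `pip_install_args`, the `default_branch` parameter, and the example repository URL are assumptions made for illustration.

```python
import re


def pip_install_args(version="stable", package="sapientml", repo=None,
                     default_branch="main"):
    """Map a version label to arguments for `pip install`.

    - "stable"        -> newest release on PyPI
    - "1.2.3"         -> that exact release on PyPI
    - "latest"/other  -> install from source at a git ref
    """
    if version == "stable":
        return ["--no-cache-dir", "-U", package]
    if re.match(r"^\d", version):
        # Anything starting with a digit is treated as a pinned PyPI version.
        return ["--no-cache-dir", "-U", f"{package}=={version}"]
    if repo is None:
        raise ValueError("a repository URL is required to install from source")
    # "latest" means the repository's default branch (assumed name: default_branch);
    # any other label is used verbatim as a branch or tag.
    ref = default_branch if version == "latest" else version
    return ["--no-cache-dir", "-U", f"git+{repo}@{ref}"]


# Placeholder URL, not the project's real repository.
EXAMPLE_REPO = "https://example.com/org/sapientml.git"

print(pip_install_args("stable"))
print(pip_install_args("0.4.0"))
print(pip_install_args("latest", repo=EXAMPLE_REPO))
```

In a `setup.sh`, the same branching would typically be an `if`/`elif`/`else` on the first positional argument, with the resulting arguments passed to the framework's virtual-environment `pip`.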
@@ -102,7 +102,7 @@ openml: # configuration namespace for openML.
versions: # configuration namespace for versions enforcement (libraries versions are usually enforced in requirements.txt for the app and for each framework).
pip:
- python: 3.9 # the Python minor version that will be used by the application in containers and cloud instances, also used as a based version for virtual environments created for each framework.
+ python: 3.11 # the Python minor version that will be used by the application in containers and cloud instances, also used as a based version for virtual environments created for each framework.
Is the framework not 3.9 compatible? Changing this number here will affect all frameworks. While we will raise this over time (and also plan to allow framework-specific definitions for this), we can't currently bump this without ensuring the compatibility for all other frameworks.
SapientML is an AutoML technology that can learn from a corpus of existing datasets and their human-written pipelines, and efficiently generate a high-quality pipeline for a predictive task on a new dataset.