strategy or example for doing a stratified k-fold #950

Closed
lazarusA opened this issue Jan 5, 2024 · 2 comments

lazarusA commented Jan 5, 2024

I see that the function partition can take stratify as an argument, and that https://juliaml.github.io/MLUtils.jl/dev/#Examples shows how to do k-folds. Is there a workflow out there for combining both?

ablaom commented Jan 5, 2024

Well, MLJ's evaluate and TunedModel have the option resampling=StratifiedCV().

So for example:

using MLJ
import Imbalance
using Random

# generate unbalanced synthetic data:
class_probs = [0.1, 0.9]
num_rows, num_features = 1000, 3
X, y = Imbalance.generate_imbalanced_data(
    num_rows,
    num_features;
    class_probs,
    rng=42
)

# stratified split into train/test:
rng = Random.Xoshiro(123)
(Xtrain, Xtest), (ytrain, ytest) = partition((X, y), 0.6; stratify=y, multi=true, rng)

# instantiate a random forest:
RandomForestClassifier = @iload RandomForestClassifier pkg=DecisionTree
forest = RandomForestClassifier()

# evaluation of default forest on training set using stratified cv:
evaluate(forest, Xtrain, ytrain; resampling=StratifiedCV(; nfolds=6, rng), measure=log_loss)

# PerformanceEvaluation object with these fields:
#   model, measure, operation, measurement, per_fold,
#   per_observation, fitted_params_per_fold,
#   report_per_fold, train_test_rows, resampling, repeats
# Extract:
# ┌──────────────────────┬───────────┬─────────────┬─────────┬─────────────────────────────────────────────────────────┐
# │ measure              │ operation │ measurement │ 1.96*SE │ per_fold                                                │
# ├──────────────────────┼───────────┼─────────────┼─────────┼─────────────────────────────────────────────────────────┤
# │ LogLoss(             │ predict   │ 0.00199     │ 0.00135 │ [0.00213, 0.00478, 0.00174, 0.000302, 0.00091, 0.00206] │
# │   tol = 2.22045e-16) │           │             │         │                                                         │
# └──────────────────────┴───────────┴─────────────┴─────────┴─────────────────────────────────────────────────────────┘

# tune the model using the training set and stratified-cv:
r = range(forest, :n_subfeatures, lower=1, upper=3)
tuned_forest = TunedModel(
    forest;
    tuning=Grid(),
    range=r,
    resampling=StratifiedCV(; nfolds=6, rng),
    measure=log_loss,
)
mach = machine(tuned_forest, Xtrain, ytrain) |> fit!

# evaluate the optimal model on the test set, using stratified-cv
best_forest = report(mach).best_model
evaluate(best_forest, Xtest, ytest;
         resampling=StratifiedCV(; nfolds=6, rng), measure=log_loss)

# PerformanceEvaluation object with these fields:
#   model, measure, operation, measurement, per_fold,
#   per_observation, fitted_params_per_fold,
#   report_per_fold, train_test_rows, resampling, repeats
# Extract:
# ┌──────────────────────┬───────────┬─────────────┬─────────┬──────────────────────────────────────────────────────────────┐
# │ measure              │ operation │ measurement │ 1.96*SE │ per_fold                                                     │
# ├──────────────────────┼───────────┼─────────────┼─────────┼──────────────────────────────────────────────────────────────┤
# │ LogLoss(             │ predict   │ 2.22e-16    │ 0.0     │ [2.22e-16, 2.22e-16, 2.22e-16, 2.22e-16, 2.22e-16, 2.22e-16] │
# │   tol = 2.22045e-16) │           │             │         │                                                              │
# └──────────────────────┴───────────┴─────────────┴─────────┴──────────────────────────────────────────────────────────────┘
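
If what you are after is the stratified fold indices themselves, say to drive an MLUtils-style kfolds loop as in the page you linked, here is a minimal sketch, assuming the train_test_pairs method that MLJBase defines for its resampling strategies (including the StratifiedCV used above) is what you want:

# minimal sketch: extract stratified (train, test) row indices directly
import MLJBase

# a vector of (train_rows, test_rows) pairs; the class proportions of ytrain
# are approximately preserved in every test fold
folds = MLJBase.train_test_pairs(StratifiedCV(; nfolds=6, rng), 1:length(ytrain), ytrain)

for (train_rows, test_rows) in folds
    # index your data with these rows as needed, e.g.
    # selectrows(Xtrain, train_rows) and ytrain[train_rows]
end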

Does this answer your question?

ablaom commented Jan 17, 2024

Closing, as there has been no response. Feel free to re-open.

ablaom closed this as completed Jan 17, 2024