Discussion: Classification vs. class probability metrics #237

Closed
mattwarkentin opened this issue Oct 15, 2021 · 4 comments

@mattwarkentin
Contributor

mattwarkentin commented Oct 15, 2021

Hi,

I love the yardstick package, but I always find myself trying to use both classification and class probability metrics in a single metric set for workflows or workflowsets, and I always run into the issue that they don't accept the same type of input for estimate. I feel like it should "just work". This expectation probably comes from torchmetrics, which I use regularly: there, any classification metric accepts the same inputs, and the thresholding to form discrete classes is handled by the metric function itself. That makes the classification metrics easy to work with, since they all take the same inputs and differ only in the computations behind the scenes. I like this approach because classification metrics are essentially a special case of class probability metrics where some thresholding has been applied to the estimate. A default threshold of 0.5 is sensible, I think.

Is there any room in the yardstick world to modify the classification metrics to be more general, so that they can optionally accept the same estimate type as the class probability metrics, with an added threshold argument? To avoid backwards-compatibility issues, perhaps the estimate argument for classification metrics could accept either a factor (the current behaviour) or a numeric probability, which gets thresholded and converted to a factor as a preprocessing step.

For a simple use case: when you have a binary outcome and fit a model that outputs probabilities (e.g. logistic regression), it would be nice to get roc_auc, pr_auc, sensitivity, specificity, etc. in a single metric set with less friction.
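
To make the friction concrete, here is a minimal sketch using yardstick's built-in two_class_example data: a metric set mixing the two kinds of metrics does run, but only because the data happens to carry both a probability column and a pre-thresholded factor column, and both have to be passed explicitly:

library(yardstick)
data(two_class_example)

# Mixing class and probability metrics works, but the probability column
# (Class1) and the hard-class column (predicted) must be supplied separately:
mixed <- metric_set(roc_auc, sens, spec)
mixed(two_class_example, truth = truth, Class1, estimate = predicted)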

torchmetrics, for reference (using specificity as an example): https://torchmetrics.readthedocs.io/en/latest/references/functional.html#specificity-func

@mattwarkentin
Contributor Author

mattwarkentin commented Nov 7, 2021

Upon some additional thought, I wonder if there should just be a wrapper function that turns any classification metric into one that can handle class probabilities via thresholding. probably::threshold_perf() is on the right track, and the proposed function would be along the same lines as metric_tweak(), but with a different purpose. A function factory should do the trick:

library(tidymodels)
library(dplyr)

# Function factory: wrap a classification metric so that it accepts a
# numeric probability column, thresholds it into hard classes, and then
# delegates to the wrapped metric.
threshold_metric <- function(metric, threshold = 0.5) {
  new_metric <- function(data, truth, estimate, estimator = NULL, na_rm = TRUE,
                         event_level = yardstick_event_level(), ...) {
    data <- 
      data %>% 
      mutate(
        # Probabilities at or above the threshold map to the second factor
        # level (assumed here to be the event level)
        new_estimate = if_else(
          {{ estimate }} >= threshold, 
          levels({{ truth }})[[2]], 
          levels({{ truth }})[[1]]
        ),
        # Coerce back to a factor with the same levels as the truth column
        new_estimate = factor(new_estimate, levels = levels({{ truth }}))
      )
    
    metric(data, {{ truth }}, new_estimate, estimator = estimator,
           na_rm = na_rm, event_level = event_level, ...)
  }

  # Tag the wrapper so that metric_set() and the tidymodels machinery
  # treat it as a class probability metric
  class(new_metric) <- c('prob_metric', 'metric', 'function')
  new_metric
}

data <- tibble(y = factor(sample(0:1, 100, TRUE)), y_hat = runif(100))

threshold_metric(sens)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 sens    binary         0.519
threshold_metric(spec)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 spec    binary           0.5
threshold_metric(recall)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 recall  binary         0.519
threshold_metric(precision)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric   .estimator .estimate
#>   <chr>     <chr>          <dbl>
#> 1 precision binary         0.549

The function returned from threshold_metric() would need to have the right classes (i.e. prob_metric and metric, I think) so that it would play nicely with metric_set() and be recognized as a class probability metric by the tidymodels machinery. There would also need to be some more safety checks built in (e.g. getting the levels in the right order, extending to the multi-class setting, etc.), but this would make it easy to extend all of the class metrics so they can function as probability metrics for use in workflows/workflowsets. What do you think?
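
As a hedged aside on the "right classes" point: yardstick metrics also carry a direction attribute ("maximize" or "minimize") set by their constructors, and the tuning machinery consults it, so the wrapper would presumably need to copy that over too. A sketch under that assumption:

# Assumption: the wrapped metric must also carry the "direction" attribute
# that yardstick's metric constructors set; without it, metric_set()/tune
# may not know whether larger values are better.
thresholded_sens <- threshold_metric(sens)
attr(thresholded_sens, "direction") <- attr(sens, "direction")  # "maximize"

# It could then (in principle) sit alongside true probability metrics:
# metric_set(roc_auc, thresholded_sens)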

@mattwarkentin changed the title from "Discussion: Encapsulated metrics and classification vs. class probability metrics" to "Discussion: Classification vs. class probability metrics" on Nov 7, 2021
@topepo
Member

topepo commented Nov 10, 2021

We'd like to be able to optimize that threshold algorithmically, so we plan on adding post-processing operations to workflows. We would also avoid tightly coupling the threshold specification to the metric calculation (so it can be used on predicted values).

I really want to get this feature in workflows but it is a little lower on the priority list (for now).
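
For reference, probably::threshold_perf() already sketches this decoupled pattern: it sweeps a grid of candidate thresholds and computes the class metrics at each, independent of any one metric definition. A minimal sketch on yardstick's two_class_example data:

library(probably)
data(two_class_example, package = "yardstick")

# Compute sensitivity, specificity, etc. at each candidate threshold,
# keeping the threshold choice separate from the metric calculations:
threshold_perf(two_class_example, truth, Class1,
               thresholds = seq(0.2, 0.8, by = 0.1))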

@mattwarkentin
Contributor Author

Fair enough. I look forward to this being implemented in workflows.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 25, 2021