Discussion: Classification vs. class probability metrics #237

Closed
mattwarkentin opened this issue Oct 15, 2021 · 4 comments

@mattwarkentin
Contributor

mattwarkentin commented Oct 15, 2021

Hi,

I love the yardstick package, but I always find myself trying to use both classification and class probability metrics in a single metric set for workflows or workflowsets, and I always run into the issue that they don't accept the same type of input for estimate. I feel like it should "just work". This expectation probably comes from torchmetrics, which I use regularly: there, any classification metric accepts the same inputs, and the thresholding to form discrete classes is handled by the metric function itself. That makes the classification metrics easy to work with, since they all take the same inputs and differ only in the computations behind the scenes. I like this approach because classification metrics are essentially a special case of class probability metrics where some thresholding has been applied to the estimate. A default threshold of 0.5 is sensible, I think.

Is there any room in the yardstick world to modify the classification metrics to be more general, so that they can optionally accept the same estimate type as the class probability metrics, with an added threshold argument? To avoid backwards-compatibility issues, perhaps the estimate argument for classification metrics could accept either a factor (the current behaviour) or a numeric probability, which gets thresholded and converted to a factor as a preprocessing step.

For a simple use case: when you have a binary outcome and fit a model that outputs probabilities (e.g. logistic regression), it would be nice to get roc_auc, pr_auc, sensitivity, specificity, etc. in a single metric set with less friction.
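
To make the friction concrete, here is a minimal sketch using yardstick's built-in two_class_example data: a metric set mixing the two kinds of metrics does run, but only because the data happens to carry both a probability column and a pre-thresholded factor column, and both have to be passed explicitly:

library(yardstick)
data(two_class_example)

# Mixing class and probability metrics works, but the probability column
# (Class1) and the hard-class column (predicted) must be supplied separately:
mixed <- metric_set(roc_auc, sens, spec)
mixed(two_class_example, truth = truth, Class1, estimate = predicted)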

torchmetrics, for reference (using specificity as an example): https://torchmetrics.readthedocs.io/en/latest/references/functional.html#specificity-func

@mattwarkentin
Contributor Author

mattwarkentin commented Nov 7, 2021

Upon some additional thought, I wonder if there should just be a wrapper function that turns any classification metric into one that can handle class probabilities via thresholding. probably::threshold_perf() is on the right track, and the proposed function would be along the same lines as metric_tweak(), but with a different purpose. A function factory should do the trick:

library(tidymodels)
library(dplyr)

# Function factory: wrap a classification metric so that it accepts a
# numeric probability column, thresholds it into hard classes, and then
# delegates to the wrapped metric.
threshold_metric <- function(metric, threshold = 0.5) {
  new_metric <- function(data, truth, estimate, estimator = NULL, na_rm = TRUE,
                         event_level = yardstick_event_level(), ...) {
    data <- 
      data %>% 
      mutate(
        # Probabilities at or above the threshold map to the second factor
        # level (assumed here to be the event level)
        new_estimate = if_else(
          {{ estimate }} >= threshold, 
          levels({{ truth }})[[2]], 
          levels({{ truth }})[[1]]
        ),
        # Coerce back to a factor with the same levels as the truth column
        new_estimate = factor(new_estimate, levels = levels({{ truth }}))
      )
    
    metric(data, {{ truth }}, new_estimate, estimator = estimator,
           na_rm = na_rm, event_level = event_level, ...)
  }

  # Tag the wrapper so that metric_set() and the tidymodels machinery
  # treat it as a class probability metric
  class(new_metric) <- c('prob_metric', 'metric', 'function')
  new_metric
}

data <- tibble(y = factor(sample(0:1, 100, TRUE)), y_hat = runif(100))

threshold_metric(sens)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 sens    binary         0.519
threshold_metric(spec)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 spec    binary           0.5
threshold_metric(recall)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric .estimator .estimate
#>   <chr>   <chr>          <dbl>
#> 1 recall  binary         0.519
threshold_metric(precision)(data, y, y_hat, event_level = 'second')
#> # A tibble: 1 × 3
#>   .metric   .estimator .estimate
#>   <chr>     <chr>          <dbl>
#> 1 precision binary         0.549

The function returned from threshold_metric() would need to have the right classes (i.e. prob_metric and metric, I think) so that it would play nicely with metric_set() and be recognized as a class probability metric by the tidymodels machinery. There would also need to be some more safety checks built in (e.g. getting the levels in the right order, extending to the multi-class setting, etc.), but this would make it easy to extend all of the class metrics so they can function as probability metrics for use in workflows/workflowsets. What do you think?
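
As a hedged aside on the "right classes" point: yardstick metrics also carry a direction attribute ("maximize" or "minimize") set by their constructors, and the tuning machinery consults it, so the wrapper would presumably need to copy that over too. A sketch under that assumption:

# Assumption: the wrapped metric must also carry the "direction" attribute
# that yardstick's metric constructors set; without it, metric_set()/tune
# may not know whether larger values are better.
thresholded_sens <- threshold_metric(sens)
attr(thresholded_sens, "direction") <- attr(sens, "direction")  # "maximize"

# It could then (in principle) sit alongside true probability metrics:
# metric_set(roc_auc, thresholded_sens)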

@mattwarkentin changed the title from "Discussion: Encapsulated metrics and classification vs. class probability metrics" to "Discussion: Classification vs. class probability metrics" on Nov 7, 2021
@topepo
Member

topepo commented Nov 10, 2021

We'd like to be able to optimize that threshold algorithmically, so we plan on adding post-processing operations to workflows. We would also avoid tightly coupling the threshold specification to the metric calculation (so it can be used on predicted values).

I really want to get this feature in workflows but it is a little lower on the priority list (for now).
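
For reference, probably::threshold_perf() already sketches this decoupled pattern: it sweeps a grid of candidate thresholds and computes the class metrics at each, independent of any one metric definition. A minimal sketch on yardstick's two_class_example data:

library(probably)
data(two_class_example, package = "yardstick")

# Compute sensitivity, specificity, etc. at each candidate threshold,
# keeping the threshold choice separate from the metric calculations:
threshold_perf(two_class_example, truth, Class1,
               thresholds = seq(0.2, 0.8, by = 0.1))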

@mattwarkentin
Contributor Author

Fair enough. I look forward to this being implemented in workflows.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Nov 25, 2021