Add thresholds at which to evaluate the ROC curve. #488

Dpananos · 2024-01-18T21:30:12Z

Feature

In some situations it is preferable to pre-specify the probability thresholds at which the ROC curve is evaluated. Would it be worthwhile to add an argument to roc_curve() for this?


tripartio · 2024-01-18T21:37:54Z

What would the output be? That is, what is an ROC curve with thresholds as an input supposed to look like? I am confused because I understand that the entire point of an ROC curve is to show results at all possible thresholds.

Dpananos · 2024-01-18T21:55:05Z

@tripartio Here is an example of what the output should look like.

For each threshold, the sensitivity and specificity are calculated and one can plot the ROC curve.

Currently, the ROC curve is evaluated at every unique value of the estimate. This is sensible, but in my workflow I want to compare models at the same thresholds. That is hard to do when the thresholds are determined by the estimate, because not all models return the same estimate values.

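library(tidyverse)

N <- 1000
y <- factor(rbinom(N, 1, 0.5))  # simulated truth with levels "0" and "1"
p <- runif(N)                   # simulated predicted probabilities

# Pre-specified thresholds; -Inf and Inf anchor the two ends of the curve
thresholds <- c(-Inf, ppoints(100), Inf)

rocc <- map_dfr(thresholds, ~ {
  # Classify as 1 when the probability exceeds the current threshold
  predicted <- factor(as.integer(p > .x), levels = c(0, 1))

  # yardstick treats the first factor level ("0") as the event by default
  sensitivity <- yardstick::sens_vec(y, predicted)
  specificity <- yardstick::spec_vec(y, predicted)

  tibble(
    .threshold = .x,
    sensitivity = sensitivity,
    specificity = specificity
  )
})

rocc %>%
  ggplot(aes(1 - specificity, sensitivity)) +
  geom_line()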

rocc
#> # A tibble: 102 × 3
#>    .threshold sensitivity specificity
#>         <dbl>       <dbl>       <dbl>
#>  1   -Inf         0             1    
#>  2      0.005     0.00389       0.994
#>  3      0.015     0.00973       0.990
#>  4      0.025     0.0195        0.971
#>  5      0.035     0.0272        0.955
#>  6      0.045     0.0350        0.940
#>  7      0.055     0.0447        0.926
#>  8      0.065     0.0584        0.918
#>  9      0.075     0.0661        0.901
#> 10      0.085     0.0720        0.893
#> # ℹ 92 more rows

Created on 2024-01-18 with reprex v2.0.2
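To illustrate the "compare models at the same thresholds" use case, the loop above could be wrapped in a small helper and applied to two sets of predictions. This is only a sketch: `roc_at_thresholds()` is a hypothetical name, not a yardstick function, the two "models" are simulated, and `event_level = "second"` is a choice made here so that class "1" is treated as the event.

```r
library(yardstick)

# Hypothetical helper, not part of yardstick: evaluate sensitivity and
# specificity at a fixed set of thresholds.
roc_at_thresholds <- function(truth, prob, thresholds) {
  rows <- lapply(thresholds, function(t) {
    predicted <- factor(as.integer(prob > t), levels = c(0, 1))
    data.frame(
      .threshold = t,
      # event_level = "second" treats class "1" as the event (assumption)
      sensitivity = sens_vec(truth, predicted, event_level = "second"),
      specificity = spec_vec(truth, predicted, event_level = "second")
    )
  })
  do.call(rbind, rows)
}

set.seed(488)
n <- 1000
truth <- factor(rbinom(n, 1, 0.5), levels = c(0, 1))
is_event <- truth == "1"

# Two simulated "models": A separates the classes better than B
prob_a <- ifelse(is_event, rbeta(n, 4, 2), rbeta(n, 2, 4))
prob_b <- ifelse(is_event, rbeta(n, 3, 2), rbeta(n, 2, 3))

thresholds <- c(-Inf, ppoints(100), Inf)
roc_a <- roc_at_thresholds(truth, prob_a, thresholds)
roc_b <- roc_at_thresholds(truth, prob_b, thresholds)

# Both tables have one row per threshold, so they line up row by row
comparison <- data.frame(
  .threshold = thresholds,
  sensitivity_diff = roc_a$sensitivity - roc_b$sensitivity,
  specificity_diff = roc_a$specificity - roc_b$specificity
)
head(comparison)
```

Because both tables share the same `.threshold` column, differences can be read off directly without interpolating between curves.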

EmilHvitfeldt · 2024-01-18T22:36:38Z

Hello @Dpananos 👋

This is not an unreasonable request! I could also imagine a scenario where you have many, many unique values of estimate, and selecting fewer of them for plotting is advantageous.

Dpananos · 2024-01-18T23:16:54Z

Happy to take this on, though I might need some guidance on how best to approach the change.

jxu · 2024-05-14T18:38:58Z

For my test set of 55k observations, the generated ROC table has 9300 entries. That is far too many to plot, since you can't see that much detail. My colleague who used sklearn (I think) gave me a much more reasonable 400 entries.
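Until something like this is supported, one possible workaround for plotting is to thin the table returned by `roc_curve()` after the fact. The sketch below uses simulated data, and `thin_curve()` is a hypothetical helper, not a yardstick function. It keeps evenly spaced rows, including the first and last, so the plotted curve is only an approximation of the full one.

```r
library(yardstick)

# Hypothetical helper: keep at most `max_rows` evenly spaced rows of a curve,
# always including the first and last row.
thin_curve <- function(curve, max_rows = 400) {
  curve <- as.data.frame(curve)
  if (nrow(curve) <= max_rows) {
    return(curve)
  }
  keep <- unique(round(seq(1, nrow(curve), length.out = max_rows)))
  curve[keep, ]
}

set.seed(1)
n <- 5000
# Levels ordered so that the event ("1") is the first level, matching
# yardstick's default event_level = "first"
truth <- factor(rbinom(n, 1, 0.5), levels = c(1, 0))
estimate <- ifelse(truth == "1", rbeta(n, 3, 2), rbeta(n, 2, 3))
dat <- data.frame(truth = truth, estimate = estimate)

full_curve <- roc_curve(dat, truth, estimate)
thinned <- thin_curve(full_curve, max_rows = 400)

nrow(full_curve)
nrow(thinned)
```

Note that this selects rows by position, not by threshold value, so two models thinned this way will generally not share the same thresholds.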

jxu · 2024-06-20T15:22:38Z

yardstick/R/prob-binary-thresholds.R, line 6 in be744a3:

binary_threshold_curve <- function(truth,

The code is kinda confusing, but I guess the binary_threshold_curve function is only designed to operate on every unique point of truth/estimate, not on a given set of thresholds. So it would require some rewriting.
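For reference, evaluating at a given set of thresholds does not need a loop over thresholds. A minimal base R sketch, which is not yardstick's implementation and uses a hypothetical function name (`sens_spec_at()`), could count how many event and non-event scores fall at or below each threshold with `findInterval()`:

```r
# Hypothetical helper, not yardstick code. An observation is predicted as an
# event when prob > threshold, matching the rule used earlier in this thread.
sens_spec_at <- function(is_event, prob, thresholds) {
  event_scores <- sort(prob[is_event])
  nonevent_scores <- sort(prob[!is_event])

  # findInterval(t, v) returns the number of elements of sorted v that are <= t
  n_event_at_or_below <- findInterval(thresholds, event_scores)
  n_nonevent_at_or_below <- findInterval(thresholds, nonevent_scores)

  data.frame(
    .threshold = thresholds,
    # events scoring above the threshold are true positives
    sensitivity = 1 - n_event_at_or_below / length(event_scores),
    # non-events scoring at or below the threshold are true negatives
    specificity = n_nonevent_at_or_below / length(nonevent_scores)
  )
}

set.seed(512)
n <- 2000
is_event <- rbinom(n, 1, 0.5) == 1
prob <- ifelse(is_event, rbeta(n, 3, 2), rbeta(n, 2, 3))
thresholds <- c(-Inf, ppoints(100), Inf)

curve <- sens_spec_at(is_event, prob, thresholds)
head(curve)
```

After the two sorts, each threshold is located by binary search, so the cost grows with the number of requested thresholds and not with the number of unique estimates.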

EmilHvitfeldt added the "feature" label (a feature request or enhancement) on Jan 18, 2024

jxu mentioned this issue on Jun 20, 2024: Metrics per threshold #512 (Open)
