Add thresholds at which to evaluate the ROC curve. #488

Open
Dpananos opened this issue Jan 18, 2024 · 6 comments
Labels
feature a feature request or enhancement

Comments

@Dpananos

Feature

In some situations it might be preferable to pre-specify the probability thresholds at which the ROC curve is evaluated. Might it be worthwhile to add an argument to roc_curve for this?

@tripartio

What would the output be? That is, what is an ROC curve with thresholds as an input supposed to look like? I am confused because I understand that the entire point of an ROC curve is to show results at all possible thresholds.

@Dpananos
Author

Dpananos commented Jan 18, 2024

@tripartio Here is an example of what the output should look like.

For each threshold, the sensitivity and specificity are calculated, and from these one can plot the ROC curve.

Currently, the ROC curve is plotted for all unique values of the estimate. This is sensible, but in my workflow I want to be able to compare models at the same thresholds. That is hard to do when the thresholds are determined by the estimate, because not all models will return the same estimate values.

library(tidyverse)

N <- 1000
y <- factor(rbinom(N, 1, 0.5))
p <- runif(N)

thresholds <- c(-Inf, ppoints(100), Inf)

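rocc <- map_dfr(thresholds, ~{

  # Classify as 1 when the predicted probability exceeds the threshold
  predicted <- factor(as.integer(p > .x), levels = c(0, 1))

  # yardstick's default event_level = "first" treats level "0" as the event
  sensitivity <- yardstick::sens_vec(y, predicted)
  specificity <- yardstick::spec_vec(y, predicted)

  tibble(
    .threshold = .x,
    sensitivity = sensitivity,
    specificity = specificity
  )

})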


rocc %>% 
  ggplot(aes(1-specificity, sensitivity)) + 
  geom_line()

rocc
#> # A tibble: 102 × 3
#>    .threshold sensitivity specificity
#>         <dbl>       <dbl>       <dbl>
#>  1   -Inf         0             1    
#>  2      0.005     0.00389       0.994
#>  3      0.015     0.00973       0.990
#>  4      0.025     0.0195        0.971
#>  5      0.035     0.0272        0.955
#>  6      0.045     0.0350        0.940
#>  7      0.055     0.0447        0.926
#>  8      0.065     0.0584        0.918
#>  9      0.075     0.0661        0.901
#> 10      0.085     0.0720        0.893
#> # ℹ 92 more rows

Created on 2024-01-18 with reprex v2.0.2
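
The same idea can be wrapped in a small helper so that several models are scored on one shared threshold grid. This is a minimal base R sketch, not yardstick API: `roc_at_thresholds()`, `p_model_a`, and `p_model_b` are hypothetical names, the simulated scores are made up for illustration, and level "1" is treated as the event (the yardstick calls above use the default `event_level = "first"`, which treats "0" as the event).

```r
# Hypothetical helper (not part of yardstick): sensitivity and specificity
# of a binary classifier at user-supplied probability thresholds.
roc_at_thresholds <- function(truth, prob, thresholds, event = "1") {
  is_event <- truth == event
  rows <- lapply(thresholds, function(t) {
    predicted_event <- prob > t  # same rule as the reprex: event if prob > t
    data.frame(
      .threshold  = t,
      sensitivity = sum(predicted_event & is_event) / sum(is_event),
      specificity = sum(!predicted_event & !is_event) / sum(!is_event)
    )
  })
  do.call(rbind, rows)
}

set.seed(1)
N <- 1000
y <- factor(rbinom(N, 1, 0.5))

# Two made-up "models": one uninformative, one whose scores depend on y
p_model_a <- runif(N)
p_model_b <- ifelse(y == "1", rbeta(N, 2, 1), rbeta(N, 1, 2))

thresholds <- c(-Inf, ppoints(100), Inf)
roc_a <- roc_at_thresholds(y, p_model_a, thresholds)
roc_b <- roc_at_thresholds(y, p_model_b, thresholds)

# Rows line up one-to-one because both curves share the same thresholds
comparison <- data.frame(
  .threshold = thresholds,
  sens_a = roc_a$sensitivity, sens_b = roc_b$sensitivity,
  spec_a = roc_a$specificity, spec_b = roc_b$specificity
)
head(comparison)
```

Because every model is evaluated at identical thresholds, the tables can be joined or compared row by row without any interpolation.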

@EmilHvitfeldt EmilHvitfeldt added the feature a feature request or enhancement label Jan 18, 2024
@EmilHvitfeldt
Member

EmilHvitfeldt commented Jan 18, 2024

Hello @Dpananos 👋

This is not an unreasonable request! I could also imagine a scenario where you have a very large number of unique values of estimate, and selecting fewer of them for plotting is advantageous.

@Dpananos
Author

Happy to take this on, though I might need some guidance on how best to approach the change.

@jxu

jxu commented May 14, 2024

For my test set of 55k observations, the generated ROC table has 9300 entries. This is far too many to plot, since you can't see that much detail. My colleague who used sklearn (I think) got a much more reasonable 400 entries.
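
If the shorter table did come from scikit-learn, part of the gap may be due to the `drop_intermediate = True` default of its `roc_curve`, which drops thresholds that would not change the plotted curve. A rough base R sketch of that kind of thinning follows; `drop_collinear()` and `trapezoid_area()` are hypothetical helpers, the data are simulated, and this is not how yardstick behaves today.

```r
# Hypothetical helper: indices of ROC points to keep after dropping points
# that sit exactly midway between their neighbours on a straight segment.
# fp, tp: cumulative false/true positive counts ordered along the curve
# (assumes at least two points).
drop_collinear <- function(fp, tp) {
  interior <- diff(fp, differences = 2) != 0 | diff(tp, differences = 2) != 0
  which(c(TRUE, interior, TRUE))
}

set.seed(1)
n <- 5000
truth <- rbinom(n, 1, 0.5)
score <- plogis(rnorm(n, mean = ifelse(truth == 1, 1, -1)))  # simulated scores

# Full curve: one point per unique score, from the highest score downwards
ord <- order(score, decreasing = TRUE)
tp <- cumsum(truth[ord] == 1)
fp <- cumsum(truth[ord] == 0)
last_of_tie <- !duplicated(score[ord], fromLast = TRUE)
tp <- c(0L, tp[last_of_tie])
fp <- c(0L, fp[last_of_tie])

keep <- drop_collinear(fp, tp)

# The area under the curve (in count units) is unchanged by the thinning
trapezoid_area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
area_full    <- trapezoid_area(fp, tp)
area_thinned <- trapezoid_area(fp[keep], tp[keep])

c(full = length(tp), thinned = length(keep))
```

Only points on straight segments are removed, so the plotted curve is the same; a user-supplied threshold grid, as requested in this issue, would instead trade some fidelity for a fixed number of points.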

@jxu

jxu commented Jun 20, 2024

binary_threshold_curve <- function(truth,

The code is a bit confusing, but I guess the binary_threshold_curve function is only designed to operate on every unique point of truth/estimate, not on a given set of thresholds. So supporting user-supplied thresholds would require some rewriting.
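
For what it is worth, evaluating at a given set of thresholds does not need a confusion matrix per threshold. One possible approach, sketched below in base R, sorts the scores once per class and counts with `findInterval()`. The function name `roc_at_fixed_thresholds()` is hypothetical, level "1" is assumed to be the event, and this is not taken from the yardstick source.

```r
# Hypothetical sketch: sensitivity/specificity at fixed thresholds using one
# sort per class instead of one pass over the data per threshold.
roc_at_fixed_thresholds <- function(truth, prob, thresholds, event = "1") {
  is_event <- truth == event
  event_scores    <- sort(prob[is_event])
  nonevent_scores <- sort(prob[!is_event])

  # findInterval(t, sorted) counts how many sorted values are <= t
  n_event_le    <- findInterval(thresholds, event_scores)
  n_nonevent_le <- findInterval(thresholds, nonevent_scores)

  data.frame(
    .threshold  = thresholds,
    # events with prob > t are predicted as events (true positives)
    sensitivity = (length(event_scores) - n_event_le) / length(event_scores),
    # non-events with prob <= t are predicted as non-events (true negatives)
    specificity = n_nonevent_le / length(nonevent_scores)
  )
}

set.seed(1)
N <- 1000
y <- factor(rbinom(N, 1, 0.5))
p <- runif(N)
thresholds <- c(-Inf, ppoints(100), Inf)

roc <- roc_at_fixed_thresholds(y, p, thresholds)
head(roc)
```

The result has one row per requested threshold regardless of how many unique estimate values the model produces.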

4 participants