weighted normalized gini #442

Open
SimonCoulombe opened this issue Aug 8, 2023 · 1 comment

SimonCoulombe commented Aug 8, 2023

It would be awesome to add a weighted normalized Gini to the set of available metrics for regression. This is useful in insurance when we want to evaluate the performance of a loss cost model. We order the predictions by the predicted "annualized loss cost", but weight them by the exposure (the time the policy actually lasted) to get the actual dollar amount.

It is discussed (with code) in the following Kaggle thread from the fire peril loss cost competition: https://www.kaggle.com/c/liberty-mutual-fire-peril/discussion/9880

Here is some code I use outside tidymodels. It is inspired by the function posted by pim in the Kaggle thread. I think he had inverted the sign in the weighted Gini, which meant a perfect prediction would get a Gini of -0.999 instead of 0.999.

The formula is derived from this 2015 blog post: http://blog.nguyenvq.com/blog/2015/09/25/calculate-the-weighted-gini-coefficient-or-auc-in-r/
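
For reference, this is the quantity the code below computes, written out as I read it from the code (the notation is mine, not the blog post's). Rows are sorted by decreasing prediction; a_i is the actual loss cost, w_i the exposure, W_i the cumulative share of exposure and L_i the cumulative share of losses. The sum is a shoelace-style area formula, so G should equal twice the area between the Lorenz curve and the diagonal.

```latex
W_i = \frac{\sum_{j \le i} w_j}{\sum_{j=1}^{n} w_j}, \qquad
L_i = \frac{\sum_{j \le i} a_j w_j}{\sum_{j=1}^{n} a_j w_j}

G(a, p, w) = \sum_{i=1}^{n-1} \left( L_i W_{i+1} - L_{i+1} W_i \right), \qquad
G_{\text{norm}}(a, p, w) = \frac{G(a, p, w)}{G(a, a, w)}
```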


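#' Weighted Gini coefficient
#'
#' @param actual Actual loss cost.
#' @param predicted Predicted loss cost (only used to rank the rows).
#' @param weights Earned exposure.
#'
#' @return A single numeric value, the weighted Gini coefficient.
#' @export
weighted_gini <- function(actual, predicted, weights) {
  df <- data.frame(actual, weights, predicted)
  n <- nrow(df)
  # Rank rows from highest to lowest predicted loss cost.
  df <- df[order(df$predicted, decreasing = TRUE), ]
  # Cumulative share of exposure.
  df$cum_weight <- cumsum(df$weights / sum(df$weights))
  # Cumulative dollar losses (loss cost times exposure), then as a share of the total.
  df$cum_pos_found <- cumsum(df$actual * df$weights)
  df$lorenz <- df$cum_pos_found / df$cum_pos_found[n]
  # Positive when the Lorenz curve lies above the diagonal.
  sum(df$lorenz[-n] * df$cum_weight[-1]) - sum(df$lorenz[-1] * df$cum_weight[-n])
}

#' Normalized weighted Gini coefficient
#'
#' @param actual Actual loss cost.
#' @param predicted Predicted loss cost (only used to rank the rows).
#' @param weights Earned exposure.
#'
#' @return The weighted Gini of the predictions divided by the weighted Gini
#'   of a perfect ranking (rows ordered by `actual`).
#' @export
normalized_weighted_gini <- function(actual, predicted, weights) {
  weighted_gini(actual, predicted, weights) / weighted_gini(actual, actual, weights)
}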
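
A quick sanity check of the sign convention, as a sketch on made-up numbers. The toy vectors `actual` and `exposure` are invented for illustration, and the two functions are restated compactly (same logic, no data frame) only so the snippet runs on its own. A perfect ranking should give a normalized Gini of 1, and a fully reversed ranking should come out negative.

```r
# Compact restatement of the two functions above, for a self-contained check.
weighted_gini <- function(actual, predicted, weights) {
  ord <- order(predicted, decreasing = TRUE)      # rank by predicted loss cost
  cum_weight <- cumsum(weights[ord]) / sum(weights)
  cum_loss <- cumsum(actual[ord] * weights[ord])  # cumulative dollar losses
  n <- length(cum_loss)
  lorenz <- cum_loss / cum_loss[n]
  sum(lorenz[-n] * cum_weight[-1]) - sum(lorenz[-1] * cum_weight[-n])
}

normalized_weighted_gini <- function(actual, predicted, weights) {
  weighted_gini(actual, predicted, weights) / weighted_gini(actual, actual, weights)
}

# Toy data (invented): four policies with different exposures.
actual   <- c(10, 5, 0, 0)
exposure <- c(1, 0.5, 1, 0.5)

perfect  <- normalized_weighted_gini(actual, actual, exposure)   # perfect ranking
reversed <- normalized_weighted_gini(actual, -actual, exposure)  # worst ranking

print(perfect)
print(reversed)
```

If the sign were inverted as in the Kaggle version, `perfect` would still be 1 (the sign cancels in the ratio), so it is the unnormalized `weighted_gini(actual, actual, exposure)` being positive that confirms the fix.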

simonpcouch (Contributor) commented

Related to #147.
