
Probabilistic Forecasting #3200

Closed

onacrame opened this issue Jul 1, 2020 · 20 comments
@onacrame

onacrame commented Jul 1, 2020

We know that LightGBM currently supports quantile regression, which is great. However, quantile regression can be an inefficient way to gauge prediction uncertainty, because a new model needs to be built for every quantile, and in theory each of those models may have its own set of optimal hyperparameters. This becomes unwieldy from a production standpoint if you're interested in multiple quantiles, as you can end up with many models. One of the main limitations of machine learning is point prediction: businesses are often interested in the probability distribution around a given prediction. There are various methods to do this with neural networks, and only recently have new ways emerged to address this with tree-based models.
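For reference, the multi-model workflow that makes this unwieldy looks roughly like this (a minimal sketch using LightGBM's existing quantile objective; the dataset and hyperparameters are placeholders):

```python
# Minimal sketch of the current multi-model quantile workflow in LightGBM.
# Every quantile requires its own booster (and, in principle, its own tuning).
import lightgbm as lgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]
models = {}
for q in quantiles:
    # One full model per quantile -- this is the inefficiency described above.
    models[q] = lgb.LGBMRegressor(objective="quantile", alpha=q).fit(X_train, y_train)

preds = {q: m.predict(X_test) for q, m in models.items()}
```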

The NGBoost library has attempted to do this; see below:

https://stanfordmlgroup.github.io/projects/ngboost/

Additionally, there has been a paper on adapting XGBoost to do this (in a different manner), although the author has not yet posted an implementation.

https://github.com/StatMixedML/XGBoostLSS

Something to consider as a feature, as it would make LightGBM infinitely more valuable in regression scenarios.

@CanML

CanML commented Jul 3, 2020

Seconded - I think this would make LightGBM vastly more useful for the regression problems it is already used to tackle, as well as for additional problems that require a more probabilistic approach.

Some of the ngboost team's ideas for next steps, on predicting joint probability distributions, as they mention in their slides ( https://drive.google.com/file/d/183BWFAdFms81MKy6hSku8qI97OwS_JH_/view ), are particularly interesting as well:

Demonstrate use for joint-outcomes regression (e.g. ”what’s the probability that it rains >3 inches and is >15C tomorrow?”)
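Once a model emits a joint predictive distribution, answering such a question is just counting samples. A toy numpy sketch (the sampled distribution below is fabricated purely for illustration):

```python
# Toy illustration: estimating a joint-outcome probability from samples of a
# joint predictive distribution. The samples below are fabricated stand-ins
# for whatever a joint-output model would emit.
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are 10,000 samples of (rain_inches, temp_celsius) for tomorrow.
samples = rng.multivariate_normal(
    mean=[2.0, 14.0], cov=[[1.5, 0.3], [0.3, 4.0]], size=10_000
)
rain, temp = samples[:, 0], samples[:, 1]

# P(rain > 3 inches AND temp > 15 C) is just the fraction of joint samples
# falling in that region.
p_joint = np.mean((rain > 3.0) & (temp > 15.0))
print(f"P(rain > 3in and temp > 15C) ~ {p_joint:.3f}")
```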

@StatMixedML
Contributor

Thanks @MotoRZR for referring to my repo https://github.com/StatMixedML. In fact, I am currently also working on an extension of LightGBM to probabilistic forecasting; see the repo here: https://github.com/StatMixedML/LightGBMLSS

@kmedved

kmedved commented Aug 13, 2020

This would be a wonderful addition. FWIW - Catboost has recently rolled out support for something like this as well, in version 0.24 via RMSEWithUncertainty. I don't know how they implemented it yet.
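If anyone wants to poke at it, basic usage looks roughly like this (a sketch based on the CatBoost 0.24 release; the exact parameterization of the second predicted value should be checked against the CatBoost docs):

```python
# Rough sketch of trying out CatBoost's RMSEWithUncertainty (v0.24+).
# NOTE: whether the second output is the variance or a log-variance should
# be verified against the CatBoost documentation.
from catboost import CatBoostRegressor
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)

model = CatBoostRegressor(loss_function="RMSEWithUncertainty", verbose=False)
model.fit(X, y)

# Prediction yields two values per row: a point estimate and an
# uncertainty/variance term.
preds = model.predict(X)
print(preds.shape)  # (n_samples, 2)
```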

@StrikerRUS
Collaborator

FWIW - Catboost has recently rolled out support for something like this as well, in version 0.24 via RMSEWithUncertainty. I don't know how they implemented it yet.

Thanks to GitHub we can find the corresponding commit: catboost/catboost@af88523.

@StatMixedML
Contributor

@kmedved Thanks for pointing out RMSEWithUncertainty! Very interesting, even though I am not sure exactly how it is implemented.

The fact that RMSE is used as the loss function makes me doubt that it is a truly probabilistic approach. The splitting procedures used internally to construct trees can detect changes in the mean only, so standard implementations of machine learning models are not able to recognize any distributional changes (e.g., a change of variance), even if these can be related to covariates. Since RMSE is a loss function that is minimized by the mean, I am not sure how it deals with changes in variance, skewness, etc.
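To make that concrete, here is a small simulated sketch: a standard LightGBM regressor trained with squared error recovers the conditional mean of heteroscedastic data, but nothing in the fitted model represents the x-dependent variance:

```python
# Small simulation of the point above: with an RMSE-type loss, the booster
# recovers the conditional mean, but carries no representation of the
# x-dependent variance of y.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=20_000)
# Heteroscedastic data: mean sin(x), standard deviation growing with x.
y = np.sin(x) + rng.normal(0, 0.1 + 0.1 * x)

model = lgb.LGBMRegressor(objective="regression").fit(x.reshape(-1, 1), y)
pred = model.predict(x.reshape(-1, 1))

# Predictions track the conditional mean on both ends of the x range...
for lo, hi in [(0, 2), (8, 10)]:
    mask = (x >= lo) & (x < hi)
    print(f"x in [{lo},{hi}): mean abs error vs sin(x) = "
          f"{np.abs(pred[mask] - np.sin(x[mask])).mean():.3f}, "
          f"residual std = {np.std(y[mask] - pred[mask]):.3f}")
# ...but the residual std differs sharply between the two regions, and the
# point prediction alone gives no way to recover that.
```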

@kmedved

kmedved commented Aug 15, 2020

That's a good note @StatMixedML, although I am actually not totally sure they're using plain RMSE as a loss function (despite the name). An explainer notebook is coming, but if you try it out, the validation loss on the model does not match (or even resemble) RMSE. I've put together an example Colab notebook here, where on the California housing dataset, without any tuning, the RMSE loss is 0.5167 while the RMSEWithUncertainty loss is 0.0621. So they seem to be using some custom scoring for the RMSEWithUncertainty loss.

@StatMixedML
Contributor

@kmedved Thanks for the interesting comparison! Indeed, it seems as if RMSEWithUncertainty != RMSE.

Anyway, I am not sure how one would evaluate RMSEWithUncertainty within a probabilistic framework, e.g., how well the forecast uncertainty is calibrated, using something like a proper scoring rule. I agree that point forecasts are what is mostly used, but a good estimate of the uncertainty is at least as important as, if not more important than, the point estimate. I can't remember where I found the quote, but it is a nice one:

It’s better to be approximately right than exactly wrong.

The first half, of course, relates to probabilistic forecasts, whereas the second half refers to point forecasts.
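For the calibration question, proper scoring rules are straightforward to compute once the forecast is a full distribution. A minimal sketch of the closed-form CRPS of a Gaussian predictive distribution (lower is better; it rewards both calibration and sharpness):

```python
# Minimal sketch of a proper scoring rule: the closed-form CRPS of a
# Gaussian predictive distribution N(mu, sigma^2) against an observation y.
import numpy as np
from scipy.stats import norm

def crps_gaussian(y, mu, sigma):
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# An overconfident forecast (too-small sigma) scores worse than an
# honest one, even with the same mean.
y_obs = np.array([1.0, 2.0, 3.0])
print(crps_gaussian(y_obs, mu=2.0, sigma=1.0).mean())  # honest: ~0.48
print(crps_gaussian(y_obs, mu=2.0, sigma=0.1).mean())  # overconfident: ~0.64
```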

@julioasotodv
Contributor

julioasotodv commented Aug 18, 2020

Keep in mind that LightGBM already includes quantile regression. Even though it may not enjoy the properties of a true probabilistic (let alone Bayesian) forecast, it is still the most widely used method for variance estimation today, at least for aleatoric uncertainty.

@onacrame
Author

Keep in mind that LightGBM already includes Quantile Regression. Even though it may not enjoy the probabilistic properties of a true bayesian forecast, it is still the most used method nowadays for variance forecast estimation.

Quantile regression is fine if you're only interested in specific quantiles; if you want the full distribution, it's not as useful. Also, with quantile regression it's inefficient to maintain a separate model for each quantile, each with its own set of hyperparameters.

With neural nets there are various ways to do this (a sketch of the first approach follows this list):
- variational inference, i.e., have two targets for the network, the mean and the standard deviation
- Bayesian regression
- dropout-based approaches: http://mlg.eng.cam.ac.uk/yarin/PDFs/NIPS_2015_deep_learning_uncertainty.pdf
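A minimal sketch of the first approach above, assuming a PyTorch setup (architecture and data are placeholders): a network with two heads, one for the mean and one for the log-variance, trained with the Gaussian negative log-likelihood:

```python
# Sketch of the two-target idea: one head predicts the mean, the other the
# log-variance, trained jointly with a Gaussian NLL. Data is synthetic.
import torch
import torch.nn as nn

class MeanStdNet(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU())
        self.mean_head = nn.Linear(64, 1)
        self.log_var_head = nn.Linear(64, 1)  # log-variance keeps sigma^2 > 0

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.log_var_head(h)

net = MeanStdNet(n_features=8)
loss_fn = nn.GaussianNLLLoss()
opt = torch.optim.Adam(net.parameters())

x = torch.randn(256, 8)  # placeholder batch
y = torch.randn(256, 1)

opt.zero_grad()
mean, log_var = net(x)
loss = loss_fn(mean, y, log_var.exp())  # GaussianNLLLoss expects a variance
loss.backward()
opt.step()
```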

With GBDT there are not as many tools; only recently has NGBoost come on the scene (see the sketch below). It seems StatMixedML also has something in the works. Quantile regression is OK, but not a magic-bullet solution, for the reasons mentioned.
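For reference, this is roughly what NGBoost offers today (a sketch based on the ngboost package's documented API; check the current release for exact signatures):

```python
# Rough sketch of NGBoost usage: the model returns a full predictive
# distribution per row, not just a point estimate.
from ngboost import NGBRegressor
from ngboost.distns import Normal
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)

ngb = NGBRegressor(Dist=Normal).fit(X, y)
dist = ngb.pred_dist(X[:5])
# Per-row distribution parameters (mean and scale of the fitted Normal).
print(dist.params["loc"], dist.params["scale"])
```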

@onacrame
Author

This article also points out some of the flaws of quantile regression. It is not specifically LightGBM-related, but still relevant.

https://medium.com/@qucit/a-simple-technique-to-estimate-prediction-intervals-for-any-regression-model-2dd73f630bcb
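For context, the family of model-agnostic interval techniques the article belongs to can be sketched in a few lines. This is a split-conformal-style variant (calibrating an interval width from held-out residuals), not necessarily the article's exact recipe:

```python
# Split-conformal-style sketch of a model-agnostic prediction interval:
# calibrate an interval width from held-out absolute residuals.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split

X, y = fetch_california_housing(return_X_y=True)
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.25, random_state=0)

model = lgb.LGBMRegressor().fit(X_fit, y_fit)

# Width that covers ~90% of calibration residuals.
resid = np.abs(y_cal - model.predict(X_cal))
width = np.quantile(resid, 0.90)

point = model.predict(X_cal[:5])
lower, upper = point - width, point + width  # symmetric ~90% interval
```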

@julioasotodv
Contributor

@MotoRZR yes, I recall reading that article a while ago.

IMO, the most interesting approach is the one where the parameters of a distribution are estimated (that is, the first one you mentioned two messages above). In fact, that distribution-parameter estimation method is what Amazon uses in their DeepAR paper (which happens to be the default model in their AWS Forecast service).

However, I am not sure whether this should be added as a new objective. It is relatively easy to get it up and running with the existing API.

MC dropout for boosted trees is something I have been thinking about. However, at least in neural nets, MC dropout generates distributions that usually end up having low variance (and therefore overly narrow prediction intervals) compared to more conventional Bayesian inference methods. But perhaps it is worth exploring for GBDT... (a cheap ensemble-based baseline is sketched below).
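For comparison, the cheap GBDT analogue of an MC-dropout ensemble is a small bagged ensemble of boosters whose prediction spread serves as a crude uncertainty proxy. A minimal sketch (ensemble size and dataset are placeholders):

```python
# Not MC dropout, but a cheap GBDT analogue: a small bagged ensemble of
# boosters whose spread of predictions serves as a crude uncertainty proxy.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)
rng = np.random.default_rng(0)

preds = []
for seed in range(10):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap resample
    m = lgb.LGBMRegressor(random_state=seed).fit(X[idx], y[idx])
    preds.append(m.predict(X))
preds = np.stack(preds)

mean, spread = preds.mean(axis=0), preds.std(axis=0)
# Like MC dropout, this spread tends to understate the true predictive
# uncertainty, but it is simple and trivially parallelizable.
```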

@StrikerRUS
Collaborator

Closed in favor of #2302; we decided to keep all feature requests in one place.

You are welcome to contribute this feature! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

@StatMixedML
Contributor

StatMixedML commented Jan 4, 2022

I have just released LightGBMLSS, an extension of LightGBM to probabilistic forecasting. It has very similar functionality to XGBoostLSS, which is also under active development.

I hope this is helpful for bringing LightGBM to a probabilistic setting. Looking forward to your feedback!

@onacrame
Author

onacrame commented Feb 3, 2022

I have just released LightGBMLSS, an extension of LightGBM to probabilistic forecasting. It has very similar functionality to XGBoostLSS, which is also under active development.

I hope this is helpful for bringing LightGBM to a probabilistic setting. Looking forward to your feedback!

Can this be integrated into the main lightgbm package? Looks rather interesting.

@StatMixedML
Contributor

@onacrame Thanks for your interest in LightGBMLSS. In principle, it is possible to create a pull request and integrate it into LightGBM itself. As of now, I am not planning to do this, since there are still some additions I want to bring to LightGBMLSS.

@StatMixedML
Contributor

In our latest paper, we extend LightGBM to a probabilistic setting using Normalizing Flows. Instead of assuming a parametric distribution, we approximate the conditional cumulative distribution function via a set of transformations, i.e., Normalizing Flows. You can find the paper on the repo:

https://github.com/StatMixedML/DGBM

Yet we are still struggling with the runtime. I have created an issue on the repo:

StatMixedML/DGBM#1

That issue also links to a more thorough description of where the computational bottleneck might be.

Appreciate any support on this.

@github-actions

github-actions bot commented Aug 15, 2023

This issue has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
@jameslamb
Collaborator

This was locked in error, sorry.

@Fish-Soup

Hi

Do we know if there is any possibility of including LightGBMLSS- or NGBoost-like functionality within LightGBM?

This would be a very powerful addition to what we currently have in LightGBM.
