
[BUG] TFTModel predicts nan values when MapeLoss function is used #2517

Open
akepa opened this issue Aug 30, 2024 · 3 comments
Labels
question Further information is requested

Comments

@akepa

akepa commented Aug 30, 2024

Describe the bug

When MapeLoss is used as the loss function of a TFTModel (via the loss_fn parameter), the training output shows val_loss and train_loss equal to 0:

from darts.utils.losses import MapeLoss

model = TFTModel(
        ...
        loss_fn=MapeLoss(),
        ...
    )
Epoch 4: 100% 1/1 [00:00<00:00, 11.02it/s, train_loss=0.000, val_loss=0.000]

Then, when we try to get predictions from that model, the predict method returns an array of nan values:

array([[[nan]],

       [[nan]],

       [[nan]]])

There is no issue when any other loss function (e.g., MSELoss) is used.

To Reproduce
It can be reproduced with the following code. The dataset is also attached: input_example.csv

import pandas as pd
import torch
from pytorch_lightning.callbacks import Callback, EarlyStopping
from darts import TimeSeries
from darts.models import TFTModel
from darts.utils.losses import MapeLoss
from torch.nn import MSELoss


# Retrieve target series
df = pd.read_csv('input_example.csv')
s = TimeSeries.from_dataframe(df, 'date', 'target')
test = s[-3:]
val = s[-18:-3]
train = s[:-18]

# Build and train the model
early_stopper = EarlyStopping("val_loss", min_delta=0.001, patience=10, verbose=True)
callbacks = [early_stopper]

model = TFTModel(
        input_chunk_length=12,
        output_chunk_length=3,
        batch_size=64,
        n_epochs=5,
        add_relative_index=True,
        add_encoders=None,
        loss_fn=MapeLoss(),  # no issue when MSELoss() is used instead
        likelihood=None,
        random_state=42,
        pl_trainer_kwargs={"accelerator": "gpu", "devices": [0], "callbacks": callbacks},
        save_checkpoints=True, 
        model_name="my_model",
        force_reset=True
    )

model.fit(series=train, val_series=val, verbose=True)

best_model = model.load_from_checkpoint(model_name="my_model", best=True, work_dir='darts_logs')

best_model.predict(n=3, num_samples=1, series=train.append(val))

Expected behavior
The prediction output should be an array of float values, not an array of nans.

System (please complete the following information):

  • Python version: 3.11.8
  • darts version: 0.30.0

Additional context
I've tried to understand where the nan values are coming from. I've modified MapeLoss (https://github.com/unit8co/darts/blob/master/darts/utils/losses.py#L96) to print the values of the two parameters:

    def forward(self, inpt, tgt):
        print(f'TGT: {tgt}')
        print(f'INPT: {inpt}')
        return torch.mean(torch.abs(_divide_no_nan(tgt - inpt, tgt)))

It seems that from the second method call onwards, the INPT parameter contains an array of nan values.

akepa added the bug (Something isn't working) and triage (Issue waiting for triaging) labels on Aug 30, 2024
@dennisbader
Collaborator

Hi @akepa, you have a 0. in your data, which is most likely the issue here.

@akepa
Author

akepa commented Aug 30, 2024

Thank you very much for the quick response. Indeed, the data was scaled with the default MinMaxScaler, and if I replace the 0 value with a positive number, the problem disappears.

Is this case supposed to work? If not, should it be specified somewhere in the documentation?

@madtoinou
Collaborator

According to the documentation, NaN and inf values are replaced by 0 when using MapeLoss; as soon as the model forecasts nan, the loss becomes equal to 0. This "zeroing" might also impact the back-propagation and cause some weights in the model to become nan, leading to nan predictions (to be confirmed).

It is expected that MAPE will not work with a dataset containing zeros (by definition). I don't think that adding a sentence to its docstring reminding users to avoid the MinMaxScaler in combination with this loss is relevant, since the zeros can have various origins/causes.

madtoinou added the question (Further information is requested) label and removed the bug and triage labels on Sep 11, 2024