
[BUG] TFTModel predicts nan values when MapeLoss function is used #2517

Open
akepa opened this issue Aug 30, 2024 · 3 comments
Labels
question Further information is requested

Comments

@akepa

akepa commented Aug 30, 2024

Describe the bug

When MapeLoss is used as the loss function of a TFTModel (via the loss_fn parameter), the training output shows val_loss and train_loss equal to 0:

from darts.utils.losses import MapeLoss

model = TFTModel(
        ...
        loss_fn=MapeLoss(),
        ...
    )
Epoch 4: 100% 1/1 [00:00<00:00, 11.02it/s, train_loss=0.000, val_loss=0.000]

Then, when we try to get predictions from that model, the predict method returns an array of nan values:

array([[[nan]],

       [[nan]],

       [[nan]]])

There is no issue when any other loss function (e.g., MSELoss) is used.

To Reproduce
It can be reproduced with the following code. The dataset is also attached: input_example.csv

import pandas as pd
import torch
from pytorch_lightning.callbacks import Callback, EarlyStopping
from darts import TimeSeries
from darts.models import TFTModel
from darts.utils.losses import MapeLoss
from torch.nn import MSELoss


# Retrieve target series
df = pd.read_csv('input_example.csv')
s = TimeSeries.from_dataframe(df, 'date', 'target')
test = s[-3:]
val = s[-18:-3]
train = s[:-18]

# Build and train the model
early_stopper = EarlyStopping("val_loss", min_delta=0.001, patience=10, verbose=True)
callbacks = [early_stopper]

model = TFTModel(
        input_chunk_length=12,
        output_chunk_length=3,
        batch_size=64,
        n_epochs=5,
        add_relative_index=True,
        add_encoders=None,
        loss_fn=MapeLoss(),  # no issue when MSELoss() is used instead
        likelihood=None,
        random_state=42,
        pl_trainer_kwargs={"accelerator": "gpu", "devices": [0], "callbacks": callbacks},
        save_checkpoints=True, 
        model_name="my_model",
        force_reset=True
    )

model.fit(series=train, val_series=val, verbose=True)

best_model = model.load_from_checkpoint(model_name="my_model", best=True, work_dir='darts_logs')

best_model.predict(n=3, num_samples=1, series=train.append(val))

Expected behavior
The prediction output should be an array of float values, not an array of nans.

System (please complete the following information):

  • Python version: 3.11.8
  • darts version: 0.30.0

Additional context
I've tried to understand where the nan values are coming from. I've modified MapeLoss (https://github.com/unit8co/darts/blob/master/darts/utils/losses.py#L96) to print the values of the two parameters:

    def forward(self, inpt, tgt):
        print(f'TGT: {tgt}')
        print(f'INPT: {inpt}')
        return torch.mean(torch.abs(_divide_no_nan(tgt - inpt, tgt)))

It seems that from the second method call onwards, the INPT parameter contains an array of nan values.

akepa added the bug (Something isn't working) and triage (Issue waiting for triaging) labels on Aug 30, 2024
@dennisbader
Collaborator

Hi @akepa, you have a 0. in your data, which is most likely the issue here.

@akepa
Author

akepa commented Aug 30, 2024

Thank you very much for the quick response. Indeed, the data was scaled with the default MinMaxScaler, and if I replace the 0 value with a positive number, the problem disappears.

Is this case supposed to work? If not, should it be specified somewhere in the documentation?

@madtoinou
Collaborator

According to the documentation, NaN and inf values are replaced by 0 when using MapeLoss; as soon as the model forecasts nan, the loss becomes equal to 0. This "zeroing" might also impact the back-propagation and cause some weights in the model to become nan, leading to nan predictions (to be confirmed).

It is expected that MAPE will not work with a dataset containing zeros (by definition). I don't think that adding a sentence to its docstring reminding users to avoid the MinMaxScaler in combination with this loss is relevant, since the zeros can have various origins/causes.

madtoinou added the question (Further information is requested) label and removed the bug and triage labels on Sep 11, 2024