Fix PPO log_ratio bug #509
Conversation
Note that this PR only sets `position_ids` and shifts the mask for non-seq2seq models. I have only tried this on a GPT-2 model, and I am not sure whether this bug also applies to seq2seq models.
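For context, this is the standard way to derive `position_ids` from an attention mask (a minimal sketch with made-up tensors, not the PR's code): with left padding, positions have to be counted from the first real token, otherwise the absolute positional embeddings get shifted by the padding length.

```python
import torch

# Minimal sketch, not the PR's code: derive position_ids from the attention
# mask so that left padding doesn't shift absolute positional embeddings.
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])
position_ids = attention_mask.cumsum(dim=-1) - 1
position_ids.masked_fill_(attention_mask == 0, 0)  # dummy position for padding
# position_ids is now:
# tensor([[0, 0, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```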
Thanks Tobias! This is an extremely valuable find
```diff
@@ -414,6 +435,7 @@ def make_experience(self, num_rollouts: int = 1024, iter_count: int = 0):  # noqa
 ref_logits = self.ref_model(
     all_tokens,
     attention_mask=attention_mask,
+    position_ids=position_ids,
```
Duplicate this change into the `self.model.forward_hydra` call as well, otherwise `log_ratio` computed inside `make_experience` isn't equal to zero initially.
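i.e. something along these lines (a sketch only; the exact signature of `forward_hydra` and the surrounding variable names are assumed here, not copied from the PR):

```diff
 logits = self.model.forward_hydra(
     all_tokens,
     attention_mask=attention_mask,
+    position_ids=position_ids,
 )
```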
Should it also be duplicated to the `if self.config.model.model_arch_type == "seq2seq"` branches?
Yes please!
Actually, I realize the `forward` methods for seq2seq models lack the `position_ids` argument, and at least T5 uses relative positional biases AFAIK, not absolute, in which case this should not be a problem for that model at least. I'm not sure whether there are other seq2seq models with absolute positional embeddings that TRLX supports?
You're correct, sorry for the confusion; there aren't any currently except T5.
https://wandb.ai/sorry/trlx-references/reports/fix-logratio-bug-v-main--Vmlldzo0NzE2NzYw
Thanks again, Tobias
LGTM!
Relevant issue: #508
Sets `position_ids` when computing logprobs, both in `make_experience` and in `loss`, to ensure the same absolute positional embeddings are used in the two methods. The mask applied to `logratio` is shifted by one, to correctly mask the last token in the batch.

Note: remember to remove the debug print statements before merge.
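For illustration, a self-contained sketch of the shifted masking (all names and shapes here are illustrative assumptions, not the PR's code): the logprob at position t scores the token at position t + 1, so the mask must be truncated by one column to line up with the next-token logprobs and exclude the final token, which has no continuation.

```python
import torch

batch, seq_len, vocab = 2, 5, 8
torch.manual_seed(0)

# Illustrative stand-ins, not the PR's exact identifiers.
logits = torch.randn(batch, seq_len, vocab)
ref_logits = torch.randn(batch, seq_len, vocab)
tokens = torch.randint(vocab, (batch, seq_len))
attention_mask = torch.tensor([[0, 1, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Logprob of token t+1 under the distribution predicted at position t,
# so both logprob tensors have length seq_len - 1.
logprobs = torch.log_softmax(logits[:, :-1], dim=-1).gather(
    2, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)
ref_logprobs = torch.log_softmax(ref_logits[:, :-1], dim=-1).gather(
    2, tokens[:, 1:].unsqueeze(-1)).squeeze(-1)

# The shift described above: truncate the mask by one so it aligns with the
# next-token logprobs and the last token in each sequence is masked out.
log_ratio = (logprobs - ref_logprobs) * attention_mask[:, :-1]
```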