Fix PPO log_ratio bug #509

Merged: 4 commits merged into CarperAI:main on Jun 23, 2023

Conversation

TobiasNorlund (Contributor):

Relevant issue: #508

  • Set position_ids when computing logprobs, both in make_experience and in loss, to ensure the same absolute positional embeddings are used in both methods.
  • I think the mask applied when computing log_ratio should be shifted by one, to correctly mask the last token in the batch (see the sketch below).
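
For illustration, here is a minimal sketch of what the two changes amount to; the variable names and exact slicing are illustrative, not the actual trlx code:

```python
import torch

def make_position_ids(attention_mask: torch.Tensor) -> torch.Tensor:
    # Derive absolute positions from the attention mask so that padded
    # sequences get the same positional embeddings whether logprobs are
    # computed in make_experience or in the loss.
    position_ids = attention_mask.long().cumsum(-1) - 1
    position_ids.masked_fill_(attention_mask == 0, 1)
    return position_ids

# If logprobs are computed from logits[:, :-1] against all_tokens[:, 1:],
# then logprobs[:, i] scores token i + 1, so the validity mask over the
# log-ratio has to be shifted by one token as well, e.g.:
# log_ratio = (logprobs - ref_logprobs) * attention_mask[:, 1:]
```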

Note: Remember to remove the debug print statements before merge.

@TobiasNorlund (Contributor, Author):

Note that this PR only sets position_ids and shifts the mask for non-seq2seq models. I have only tried this on a GPT-2 model, and I am not sure whether this bug also applies to seq2seq models.

@maxreciprocate self-requested a review on June 21, 2023, 11:54
@maxreciprocate (Collaborator) left a comment:

Thanks Tobias! This is an extremely valuable find.

@@ -414,6 +435,7 @@ def make_experience(self, num_rollouts: int = 1024, iter_count: int = 0): # noq
                 ref_logits = self.ref_model(
                     all_tokens,
                     attention_mask=attention_mask,
+                    position_ids=position_ids,
Collaborator:

Duplicate this change into the self.model.forward_hydra call as well; otherwise the log_ratio computed inside make_experience isn't equal to zero initially.
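
For example, a hedged sketch of what that duplication could look like (the surrounding keyword arguments such as return_dict are assumed here, not taken from the actual diff):

```python
# Pass the same position_ids to the frozen-head forward pass so that both
# forward passes use identical absolute positional embeddings and the
# initial log_ratio comes out as exactly zero.
ref_logits = self.model.forward_hydra(
    all_tokens,
    attention_mask=attention_mask,
    position_ids=position_ids,
    return_dict=True,
).logits
```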

Contributor (Author):

Should it also be duplicated into the if self.config.model.model_arch_type == "seq2seq" branches?

Collaborator:

Yes please!

Contributor (Author):

Actually, I realize the forward methods for seq2seq models lack the position_ids argument, and at least T5 uses relative positional biases AFAIK, not absolute embeddings, so this should not be a problem for that model at least. I'm not sure whether there are other seq2seq models with absolute positional embeddings that TRLX supports?

Collaborator:

You're correct, sorry for the confusion; there aren't any currently except T5.

trlx/trainer/accelerate_ppo_trainer.py (outdated review thread, resolved)
@maxreciprocate (Collaborator) left a comment.

@maxreciprocate merged commit fbc9e04 into CarperAI:main on Jun 23, 2023
2 checks passed