Fix ordering of ppo epoch iteration #522

Merged
merged 6 commits into from
Jul 31, 2023

Commits on Jul 13, 2023

  1. Fix ordering of ppo epoch iteration

    Suppose you have 256 rollouts, and you batch them into batches B1, B2,
    B3, B4 (each of size 64). The order of gradient updates (assuming 3
    ppo_epochs) is:
    
    `trlx: B1 B1 B1 B2 B2 B2 B3 B3 B3 B4 B4 B4`
    
    However, what we should be doing (and what alpaca-farm, other RLHF
    implementations, and standard PPO implementations do) is:
    
    `improved: B1 B2 B3 B4 B1 B2 B3 B4 B1 B2 B3 B4`
    
    It would be even better to draw new random batches at each ppo_epoch,
    though that would require more refactoring:
    
    `optimal: B1 B2 B3 B4 B1' B2' B3' B4' B1* B2* B3* B4*`
    
    This change basically just reorders the learning to make the code use
    the `improved` ordering above. It also renames n_updates_per_batch to
    n_inner_epochs as that's a more accurate description (especially now),
    adjusts forward_time and backward_time to not type-error, and renames
    mbs and mb to minibatch and microbatch (as that's what they are).
    RobertKirk committed Jul 13, 2023
    Commit fe33681
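
The two orderings described in the commit message above can be reproduced with a small sketch. This is not the trlx training loop; the function names `old_order` and `improved_order` are hypothetical and exist only to show how swapping the two loops changes the sequence of gradient updates.

```python
from typing import List


def old_order(batches: List[str], n_inner_epochs: int) -> List[str]:
    """Ordering before the change: repeat each batch n_inner_epochs times, then move on."""
    return [b for b in batches for _ in range(n_inner_epochs)]


def improved_order(batches: List[str], n_inner_epochs: int) -> List[str]:
    """Ordering after the change: sweep over all batches once per inner epoch."""
    return [b for _ in range(n_inner_epochs) for b in batches]


# 256 rollouts split into four batches of 64, as in the commit message.
batches = ["B1", "B2", "B3", "B4"]
print("trlx:    ", " ".join(old_order(batches, 3)))
print("improved:", " ".join(improved_order(batches, 3)))
```

With the old ordering, the policy takes three consecutive steps on the same 64 rollouts before seeing any others; with the improved ordering, every step within an inner epoch uses different rollouts.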

Commits on Jul 19, 2023

  1. Reset train dataloader at each iteration

    This way we get better shuffling. Note that we now pass shuffle=True
    (implicitly) in the ppo trainer, whereas before we had shuffle=False.
    Shuffling is better here, as it makes the gradient estimates across
    minibatches less correlated.
    RobertKirk committed Jul 19, 2023
    Commit 8a943d9
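
The effect of reshuffling can be sketched with the standard library alone. This is a stand-in for a dataloader created with shuffle=True, not the trlx code: `shuffled_minibatches` and `epochs_with_reshuffle` are hypothetical names, rollouts are represented by integer indices, and the sketch assumes a reshuffle on every pass over the data, which may be more frequent than the per-iteration reset the commit describes.

```python
import random
from typing import List, Sequence


def shuffled_minibatches(
    rollouts: Sequence[int], batch_size: int, rng: random.Random
) -> List[List[int]]:
    """Shuffle the rollout indices, then cut them into consecutive minibatches."""
    indices = list(rollouts)
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


def epochs_with_reshuffle(
    n_rollouts: int, batch_size: int, n_inner_epochs: int, seed: int = 0
) -> List[List[List[int]]]:
    """Build one freshly shuffled set of minibatches per inner epoch."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return [
        shuffled_minibatches(range(n_rollouts), batch_size, rng)
        for _ in range(n_inner_epochs)
    ]


schedule = epochs_with_reshuffle(n_rollouts=256, batch_size=64, n_inner_epochs=3)
for epoch, minibatches in enumerate(schedule):
    print(f"epoch {epoch}: {len(minibatches)} minibatches, first indices {minibatches[0][:4]}")
```

Each pass still covers every rollout exactly once, but the grouping into minibatches changes between passes, which corresponds to the `optimal` ordering from the first commit message.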

Commits on Jul 22, 2023

  1. Commit 10369a1
  2. Commit ea7c2b0

Commits on Jul 24, 2023

  1. Commit 99de166

Commits on Jul 25, 2023

  1. Commit 87aa331