Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameter input rotary-freq #263

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jmercat
Copy link
Collaborator

@jmercat jmercat commented Apr 30, 2024

This allows to change the rotary positional embedding frequency parameter.
This is useful given the more recent approaches:
LLaMA 1&2 used 10000 which is the default value here.
LLaMA 3 uses 500000, Mistral uses 1000000
LLaMA long context extension works usually increase from 10000 to 100000.

@jmercat jmercat requested a review from pouransari April 30, 2024 02:35
@jmercat
Copy link
Collaborator Author

jmercat commented Apr 30, 2024

To test my changes I ran training with a small model and put a breakpoint where the parameter is used in rotary.py. I set a different value both in command line or in the json file and it was correctly passed on. Training was successful and tests passed. (except one in grad_accum and one in tokenize but not due to this PR, just access right issues).

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant