
mergekit saves tied and ignored weights unlike what transformers does when saving #390

Open
nyxkrage opened this issue Aug 7, 2024 · 0 comments

nyxkrage (Contributor) commented on Aug 7, 2024

An example of this in the wild is Gemma2: saving a Gemma2 model with save_pretrained skips the lm_head tensor because it is tied to the embedding weights, whereas a Gemma2 model saved by mergekit includes lm_head in the safetensors files. The extra tensor makes the merged model report 10.2B parameters instead of 9.24B, which can be seen by comparing grimjim/Gemma2-Nephilim-v3-9B with google/gemma-2-9b.

Relevant code from transformers:
https://github.com/huggingface/transformers/blob/3d8bd11942cec26851c80c01aa5e8403542ca50b/src/transformers/modeling_utils.py#L2634-L2637
https://github.com/huggingface/transformers/blob/3d8bd11942cec26851c80c01aa5e8403542ca50b/src/transformers/modeling_utils.py#L2666-L2700
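
For reference, a minimal sketch of the behaviour the linked transformers code implements — filtering the keys listed in the model class's `_tied_weights_keys` out of the state dict before serialization when the config ties embeddings. The helper name and the direct `save_file` call are illustrative, not mergekit's actual API:

```python
import re

from safetensors.torch import save_file
from transformers import AutoModelForCausalLM


def strip_tied_weights(model, state_dict):
    """Return a copy of `state_dict` without tied weights such as Gemma2's lm_head.weight."""
    # transformers only omits these keys when the config actually ties embeddings.
    if not getattr(model.config, "tie_word_embeddings", False):
        return dict(state_dict)
    tied_keys = getattr(model, "_tied_weights_keys", None) or []
    return {
        name: tensor
        for name, tensor in state_dict.items()
        # treat `_tied_weights_keys` entries as patterns and match loosely
        if not any(re.search(pattern, name) for pattern in tied_keys)
    }


# Illustrative usage with the model from this issue:
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b")
cleaned = strip_tied_weights(model, model.state_dict())
save_file(cleaned, "model.safetensors")  # lm_head.weight is no longer written out
```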
