
Merging two Mistral-based models with different architectures. Looking for some guidance. #401

Open
AshD opened this issue Aug 19, 2024 · 1 comment


AshD commented Aug 19, 2024

I want to merge Mistral Large with https://huggingface.co/softwareweaver/Twilight-Miqu-146B by adding some layers from Twilight Miqu to Mistral Large using the passthrough method. Is there a better way to do this?

The merge succeeds when run with --allow-crimes, but the resulting GGUF model fails to run, and loading the merged checkpoint with transformers fails as well.

GGUF runtime error:
RuntimeError: shape '[96, 2, 42, 8192]' is invalid for input of size 67108864

Transformers loading error:
size mismatch for model.layers.151.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 12288]) from checkpoint, the shape in current model is torch.Size([28672, 8192])

Merge config:

dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 43]    # opening block: early Mistral Large layers
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [5, 35]    # early-middle slice of Twilight Miqu
    model: softwareweaver/Twilight-Miqu-146B
- sources:
  - layer_range: [80, 120]  # late slice of Twilight Miqu
    model: softwareweaver/Twilight-Miqu-146B
- sources:
  - layer_range: [44, 87]   # closing block: remaining Mistral Large layers
    model: mistralai/Mistral-Large-Instruct-2407
cg123 (Collaborator) commented Aug 22, 2024

This is an expected failure. Miqu and Mistral Large have different hidden state sizes (8192 vs. 12288), so their layers can't be used interchangeably. Both tracebacks show exactly this: the transformers error compares a 12288-wide tensor (Mistral Large's hidden size) against an 8192-wide one (Miqu's), and the GGUF failure comes from reshaping an 8192 × 8192 = 67108864-element Miqu tensor with Mistral Large's 96-head attention geometry. In general, models need to be of the same architecture and family to produce a valid result.
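As a pre-flight check before running a passthrough merge, the sketch below (an editorial illustration, not part of mergekit) compares the relevant config fields of the two checkpoints using transformers' AutoConfig. Note that mistralai/Mistral-Large-Instruct-2407 is a gated repository, so fetching its config may require an authenticated Hugging Face login.

from transformers import AutoConfig

MODELS = [
    "mistralai/Mistral-Large-Instruct-2407",
    "softwareweaver/Twilight-Miqu-146B",
]

# Fields that have to agree for layers from different checkpoints
# to be stackable in a passthrough merge.
FIELDS = ["model_type", "hidden_size", "intermediate_size",
          "num_attention_heads", "num_key_value_heads"]

configs = {name: AutoConfig.from_pretrained(name) for name in MODELS}

for field in FIELDS:
    values = [getattr(cfg, field, None) for cfg in configs.values()]
    status = "OK" if len(set(values)) == 1 else "MISMATCH"
    print(f"{field}: {values} -> {status}")

Here hidden_size reports 12288 vs 8192, matching the tracebacks above; any MISMATCH line means no choice of layer_range slices can make the two checkpoints passthrough-compatible.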
