
Merging two Mistral-based models with different architectures. Looking for some guidance. #401

Open
AshD opened this issue Aug 19, 2024 · 1 comment


AshD commented Aug 19, 2024

I want to merge Mistral Large with https://huggingface.co/softwareweaver/Twilight-Miqu-146B by adding some layers from Twilight Miqu to Mistral Large using the passthrough method. Is there a better way to do this?

The merge succeeds when run with --allow-crimes, but the resulting GGUF model fails to run, and loading the merged checkpoint with transformers fails as well.

GGUF runtime error:
RuntimeError: shape '[96, 2, 42, 8192]' is invalid for input of size 67108864

Transformers loading error:
size mismatch for model.layers.151.mlp.gate_proj.weight: copying a param with shape torch.Size([28672, 12288]) from checkpoint, the shape in current model is torch.Size([28672, 8192])

Merge config:

dtype: bfloat16
merge_method: passthrough
slices:
- sources:
  - layer_range: [0, 43]    # opening block: early Mistral Large layers
    model: mistralai/Mistral-Large-Instruct-2407
- sources:
  - layer_range: [5, 35]    # early-middle slice of Twilight Miqu
    model: softwareweaver/Twilight-Miqu-146B
- sources:
  - layer_range: [80, 120]  # late slice of Twilight Miqu
    model: softwareweaver/Twilight-Miqu-146B
- sources:
  - layer_range: [44, 87]   # closing block: remaining Mistral Large layers
    model: mistralai/Mistral-Large-Instruct-2407
cg123 (Collaborator) commented Aug 22, 2024

This is an expected failure. Miqu and Mistral Large have different hidden state sizes (8192 vs. 12288), so their layers can't be used interchangeably. Both tracebacks show exactly this: the transformers error compares a 12288-wide tensor (Mistral Large's hidden size) against an 8192-wide one (Miqu's), and the GGUF failure comes from reshaping an 8192 × 8192 = 67108864-element Miqu tensor with Mistral Large's 96-head attention geometry. In general, models need to be of the same architecture and family to produce a valid result.
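As a pre-flight check before running a passthrough merge, the sketch below (an editorial illustration, not part of mergekit) compares the relevant config fields of the two checkpoints using transformers' AutoConfig. Note that mistralai/Mistral-Large-Instruct-2407 is a gated repository, so fetching its config may require an authenticated Hugging Face login.

from transformers import AutoConfig

MODELS = [
    "mistralai/Mistral-Large-Instruct-2407",
    "softwareweaver/Twilight-Miqu-146B",
]

# Fields that have to agree for layers from different checkpoints
# to be stackable in a passthrough merge.
FIELDS = ["model_type", "hidden_size", "intermediate_size",
          "num_attention_heads", "num_key_value_heads"]

configs = {name: AutoConfig.from_pretrained(name) for name in MODELS}

for field in FIELDS:
    values = [getattr(cfg, field, None) for cfg in configs.values()]
    status = "OK" if len(set(values)) == 1 else "MISMATCH"
    print(f"{field}: {values} -> {status}")

Here hidden_size reports 12288 vs 8192, matching the tracebacks above; any MISMATCH line means no choice of layer_range slices can make the two checkpoints passthrough-compatible.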
