Support a new model #1475

Open
takgto opened this issue Jun 10, 2024 · 7 comments


takgto commented Jun 10, 2024

Do you have a plan to support the JetMoE model (https://github.com/myshell-ai/JetMoE) in litgpt? It is very effective at reducing the computational cost of inference.


rasbt commented Jun 10, 2024

Hi there,
thanks for the suggestion! New models are always welcome. JetMoE is currently not on the priority list because of the many other requests and features to be added, but if you want to contribute it, that'd be welcome!


rasbt commented Jun 13, 2024

I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md


takgto commented Jun 13, 2024

> I added a doc describing how to add a new model to LitGPT in case this comes in handy: https://github.com/Lightning-AI/litgpt/blob/main/tutorials/developer-docs/adding-models.md

Thanks so much for the information. It is really valuable for me.
Currently, I am having difficulty updating the checkpoint conversion script (convert_hf_checkpoint.py) for the new model (jetmoe/jetmoe-8b). I think it needs another weight_map in the script. However, I cannot figure out some keys of the new model, as shown below.
weight_map = {
    "model.embed_tokens.weight": "transformer.wte.weight",
    "model.layers.{}.mlp.output_linear.weight": ?,  # "?" marks an unknown key
    "model.layers.{}.mlp.router.layer.weight": ?,
    "model.layers.{}.input_layernorm.weight": "transformer.h.{}.norm_1.weight",
    "model.layers.{}.mlp.bias": ?,
    "model.layers.{}.mlp.input_linear.weight": ?,
    "model.layers.{}.post_attention_layernorm.weight": "transformer.h.{}.norm_2.weight",
    "model.layers.{}.self_attention.experts.bias": ?,
    "model.layers.{}.self_attention.experts.input_linear.weight": ?,
    "model.layers.{}.self_attention.experts.output_linear.weight": ?,
    "model.layers.{}.self_attention.experts.router.layer.weight": "transformer.h.{}.attn.experts.out_proj.weight",
    "model.layers.{}.self_attention.kv_proj.weight": ?,
    "model.norm.weight": "transformer.ln_f.weight",
    "model.layers.{}.self_attention.q_proj.weight": "transformer.h.{}.attn.q_proj.weight",
    "model.layers.{}.self_attention.k_proj.weight": "transformer.h.{}.attn.k_proj.weight",
    "model.layers.{}.self_attention.v_proj.weight": "transformer.h.{}.attn.v_proj.weight",
}
Do you know of any tools or documentation that could help find those unknown keys?
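
For reference, one way to enumerate the Hugging Face checkpoint keys without downloading the full weights is to read the safetensors index file from the Hub. This is a minimal sketch, assuming jetmoe/jetmoe-8b ships a sharded checkpoint with a model.safetensors.index.json (the filename is an assumption and may differ for this repository):

import json

from huggingface_hub import hf_hub_download

# Fetch only the small index file, not the weight shards themselves.
index_path = hf_hub_download("jetmoe/jetmoe-8b", "model.safetensors.index.json")
with open(index_path) as f:
    index = json.load(f)

# The "weight_map" section maps every parameter name to the shard that stores it.
for key in sorted(index["weight_map"]):
    print(key)

Alternatively, loading the model with transformers.AutoModelForCausalLM (JetMoE likely needs trust_remote_code=True) and printing the state_dict keys together with the tensor shapes makes it easier to match each Hugging Face tensor to a LitGPT parameter by shape.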


rasbt commented Jun 13, 2024

That's a good question and usually the tricky part. It can be pretty hard to find the corresponding layer, sometimes because of naming conventions and sometimes because the layer may not be supported yet. I think in this case LlamaMoE might be a good template to look at:

if config.mlp_class_name == "LLaMAMoE":
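
To see which parameter names LitGPT expects on its side for a LLaMAMoE-style model, one option is to instantiate the Mixtral configuration on the meta device and print its parameter names. This is a minimal sketch, assuming "Mixtral-8x7B-v0.1" is a registered config name (check the configs list in litgpt/config.py for the exact spelling):

import torch

from litgpt.config import Config
from litgpt.model import GPT

# Assumed config name; verify it against litgpt/config.py.
config = Config.from_name("Mixtral-8x7B-v0.1")

# The meta device builds the module structure without allocating real weights.
with torch.device("meta"):
    model = GPT(config)

for name, param in model.named_parameters():
    print(name, tuple(param.shape))

Comparing these names and shapes with the checkpoint keys above should show which weight_map entries the existing Mixtral/LLaMAMoE path already covers and which JetMoE-specific keys (the attention experts in particular) have no LitGPT counterpart yet.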


rasbt commented Jun 13, 2024

I haven't read the JetMoE paper; do they also have attention experts? In that case, this would not be supported yet. LlamaMoE covers only the MLP layers, as in Mixtral.


takgto commented Jun 14, 2024

Thank you for your continued support.
According to the JetMoE technical website (https://research.myshell.ai/jetmoe), JetMoE has two kinds of MoE layers: Mixture of Attention heads (MoA) and Mixture of MLP Experts (MoE), similar to ModuleFormer (https://arxiv.org/abs/2306.04640). So the LlamaMoE model might not be a good fit for JetMoE.
Separately, I have asked the JetMoE authors to provide parameter mapping information (myshell-ai/JetMoE#11). Unfortunately, I haven't received a reply yet.


rasbt commented Jun 14, 2024

Oh I see, the Mixture of Attention heads (MoA) part will be tricky then; it's currently not supported by LitGPT and would have to be implemented. That might make a contribution like this a bit challenging.
