Add Phi 3.5 MoE #116

Open. DePasqualeOrg wants to merge 2 commits into main.

@DePasqualeOrg
Contributor

DePasqualeOrg commented Aug 31, 2024

This is my attempt to port the Phi 3.5 MoE model from the Python implementation. Unfortunately, I can't test it myself, since my MacBook doesn't have enough RAM. I marked two places in PhiMoE.swift with comments starting with !! that need to be checked. You can test this with ModelConfiguration.phi3_5MoE. Go ahead and make any necessary changes if you'd like.

@davidkoski
Collaborator

I will give it a try on Tuesday!

@davidkoski
Collaborator

It looks like there is a problem loading the weights:

Error: Mismatched parameter weight shape. Actual [16, 6400, 512], expected [16, 6400, 4096]

This is on SwitchLinear (the error message could probably use more context):

SwitchLinear(bias=nil, inputDims=4096, numExperts=16, outputDims=6400)

The loaded parameters also include a biases entry, which is not expected here:

(lldb) po parameters.mapValues { $0.shape }
▿ [
  biases: [16, 6400, 64],
  scales: [16, 6400, 64],
  weight: [16, 6400, 512]
]

I need to look into this further.

@davidkoski
Collaborator

davidkoski commented Sep 4, 2024

OK, I think SwitchLinear isn't quite right -- it is missing the bias (at the very least).

I also think the dimension mismatch comes down to quantization -- the SwitchLinear isn't being replaced by QuantizedSwitchLinear because it doesn't implement the protocol.
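The shapes reported earlier are at least consistent with a quantized checkpoint. The sketch below works through the arithmetic under two assumptions that the thread does not state: 4-bit quantization and a group size of 64, with quantized values packed into 32-bit words along the input axis. The type QuantizedShapes and the function quantizedShapes are hypothetical helpers for illustration, not part of MLX. Under this reading, the biases entry in the parameter dump would be a per-group quantization offset, separate from the layer bias that SwitchLinear is missing.

```swift
// Hypothetical helper types for illustration only; not part of MLX.
struct QuantizedShapes: Equatable {
    let weight: [Int]
    let scales: [Int]
    let biases: [Int]
}

// Assumption: values are packed into 32-bit words along the input axis,
// with one scale and one bias per group of `groupSize` input values.
// The defaults (groupSize 64, 4 bits) are assumed, not taken from the thread.
func quantizedShapes(
    numExperts: Int, outputDims: Int, inputDims: Int,
    groupSize: Int = 64, bits: Int = 4
) -> QuantizedShapes {
    // Each 32-bit word holds 32 / bits quantized values.
    let packedColumns = inputDims * bits / 32
    // Number of quantization groups along the input axis.
    let groups = inputDims / groupSize
    return QuantizedShapes(
        weight: [numExperts, outputDims, packedColumns],
        scales: [numExperts, outputDims, groups],
        biases: [numExperts, outputDims, groups]
    )
}

// The layer from the error message: 16 experts, 4096 inputs, 6400 outputs.
let shapes = quantizedShapes(numExperts: 16, outputDims: 6400, inputDims: 4096)
print(shapes.weight, shapes.scales, shapes.biases)
```

With these assumed settings, the computed shapes match the three entries in the lldb output, which supports the idea that the checkpoint is quantized while the module being updated is still the unquantized layer.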

I also think QuantizedSwitchLinear will need to be a subtype of SwitchLinear for the replacement to happen. The declaration @ModuleInfo(key: "gate_proj") var gateProj: SwitchLinear requires the stored value to be SwitchLinear or a subtype. Linear and QuantizedLinear are modeled the same way.
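The two requirements described above (the layer has to opt in to quantization, and the quantized layer has to be assignable to the existing property) can be sketched in plain Swift. Every type here is a hypothetical stand-in with a "Sketch" suffix, not the actual MLX class or protocol, and the packed-shape formula again assumes values are packed into 32-bit words.

```swift
// Hypothetical stand-ins for illustration; these are not the MLX types.
class LayerSketch {}

protocol QuantizableSketch {
    func toQuantized(groupSize: Int, bits: Int) -> LayerSketch
}

class SwitchLinearSketch: LayerSketch, QuantizableSketch {
    let inputDims: Int
    let outputDims: Int
    let numExperts: Int

    init(inputDims: Int, outputDims: Int, numExperts: Int) {
        self.inputDims = inputDims
        self.outputDims = outputDims
        self.numExperts = numExperts
    }

    var weightShape: [Int] { [numExperts, outputDims, inputDims] }

    func toQuantized(groupSize: Int, bits: Int) -> LayerSketch {
        QuantizedSwitchLinearSketch(self, groupSize: groupSize, bits: bits)
    }
}

// A subclass, so it can be stored in a property typed SwitchLinearSketch.
final class QuantizedSwitchLinearSketch: SwitchLinearSketch {
    let groupSize: Int
    let bits: Int

    init(_ other: SwitchLinearSketch, groupSize: Int, bits: Int) {
        self.groupSize = groupSize
        self.bits = bits
        super.init(
            inputDims: other.inputDims, outputDims: other.outputDims,
            numExperts: other.numExperts)
    }

    // Assumed packing: 32 / bits values per 32-bit word on the input axis.
    override var weightShape: [Int] {
        [numExperts, outputDims, inputDims * bits / 32]
    }
}

// Layers that do not conform to the protocol are returned unchanged.
func quantized(_ layer: LayerSketch, groupSize: Int, bits: Int) -> LayerSketch {
    guard let quantizable = layer as? QuantizableSketch else { return layer }
    return quantizable.toQuantized(groupSize: groupSize, bits: bits)
}

final class ExpertsSketch {
    var gateProj: SwitchLinearSketch

    init(gateProj: SwitchLinearSketch) { self.gateProj = gateProj }

    func quantize(groupSize: Int, bits: Int) {
        // The assignment only type-checks because the replacement is a subtype.
        if let replacement = quantized(gateProj, groupSize: groupSize, bits: bits)
            as? SwitchLinearSketch
        {
            gateProj = replacement
        }
    }
}

let experts = ExpertsSketch(
    gateProj: SwitchLinearSketch(inputDims: 4096, outputDims: 6400, numExperts: 16))
experts.quantize(groupSize: 64, bits: 4)
print(type(of: experts.gateProj), experts.gateProj.weightShape)
```

If the quantized class were unrelated to SwitchLinearSketch, the conditional cast would fail and gateProj would keep the unquantized layer, which is the kind of situation that would then produce a shape mismatch when quantized weights are loaded.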

@DePasqualeOrg
Contributor Author

Thanks for that feedback. I've tried to make those changes, although I don't know how helpful this will be, since I unfortunately can't test it myself. If it's too much trouble and you'd prefer to focus on other priorities, don't worry about it.

@davidkoski
Collaborator

OK, I will see if I can test/finish this -- it may be a few days before I get a chance.
