PEER MoE #93
Conversation
mttl/models/expert_model.py
Outdated
@@ -587,6 +592,17 @@ def as_expert(self):
            "This method is not implemented for MultiExpertModel."
        )

    def add_empty_experts(self):
this function should be in the MoE model, I think
yes true
mttl/models/containers/__init__.py
Outdated
    expert,
    action=action,
    is_default=is_default,
if len(expert_config.modify_layers) == 0:
why?
If modify_layers == "", we do not look into the children of the module, and directly modify the module itself instead of its layers.
This is useful when we want to replace the full MLP module directly.
(nitpicking) I would check if == "" instead of len == 0
or maybe checking if modify_layers == None?
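For context, here is a minimal sketch of the branching being discussed; the helper name and fields are assumptions for illustration, not the actual mttl code. When modify_layers is empty or None, the matched module itself is wrapped (e.g. a full MLP block); otherwise only the named children that match are wrapped.

```python
import re
import torch.nn as nn

def add_expert_to_module(module: nn.Module, expert_config, wrap_fn):
    # Hypothetical helper illustrating the discussion above, not the mttl API.
    if not expert_config.modify_layers:
        # modify_layers is "" or None: wrap/replace the whole module directly
        return wrap_fn(module)
    # otherwise, only wrap the named children whose name matches the pattern
    for name, child in module.named_children():
        if re.fullmatch(expert_config.modify_layers, name):
            setattr(module, name, wrap_fn(child))
    return module
```

Note that `if not expert_config.modify_layers` covers both the empty string and None in one check.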
""" | ||
PEER layer from Mixture of A Million Experts (https://arxiv.org/pdf/2407.04153) | ||
|
||
Right not it assumes that it receives a module -- an MLP block, that has attributes fc1 and fc2. |
typo (not / now)
thanks!
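For readers unfamiliar with the paper, here is a rough, self-contained sketch of the PEER mechanism (product-key retrieval over many single-neuron experts). All names and hyperparameters below are illustrative assumptions, not the implementation in this PR.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PEERSketch(nn.Module):
    """Parameter Efficient Expert Retrieval, roughly following arXiv:2407.04153."""

    def __init__(self, dim, num_experts=1024, topk=16, heads=8):
        super().__init__()
        self.n_sub = int(num_experts ** 0.5)  # num_experts must be a perfect square
        assert self.n_sub * self.n_sub == num_experts
        self.topk, self.heads = topk, heads
        # one query per head per sub-key set
        self.query = nn.Linear(dim, heads * 2 * dim)
        self.sub_keys = nn.Parameter(torch.randn(2, self.n_sub, dim) * dim ** -0.5)
        # each expert is a single neuron: one down vector and one up vector
        self.w_down = nn.Embedding(num_experts, dim)
        self.w_up = nn.Embedding(num_experts, dim)

    def forward(self, x):  # x: (batch, seq, dim)
        b, s, d = x.shape
        q = self.query(x).view(b, s, self.heads, 2, d)
        # score the queries against each of the two small sub-key sets
        scores1 = q[..., 0, :] @ self.sub_keys[0].t()  # (b, s, h, n_sub)
        scores2 = q[..., 1, :] @ self.sub_keys[1].t()
        s1, i1 = scores1.topk(self.topk, dim=-1)
        s2, i2 = scores2.topk(self.topk, dim=-1)
        # combine candidate sub-keys into product keys and keep the global top-k
        cand_scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(-2)  # (b, s, h, k*k)
        cand_idx = (i1.unsqueeze(-1) * self.n_sub + i2.unsqueeze(-2)).flatten(-2)
        scores, pos = cand_scores.topk(self.topk, dim=-1)
        idx = cand_idx.gather(-1, pos)                                   # selected expert ids
        gates = F.softmax(scores, dim=-1)
        down = self.w_down(idx)                                          # (b, s, h, k, d)
        up = self.w_up(idx)
        hidden = F.gelu(torch.einsum("bsd,bshkd->bshk", x, down))
        return torch.einsum("bshk,bshkd->bsd", gates * hidden, up)
```

The point of the product-key trick is that scoring two sets of sqrt(N) sub-keys and combining the top candidates avoids ever materializing scores for all N experts.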
from mttl.models.modifiers.modify_model import get_modifier_name

# diff architectures name those layers differently
down_names = ["fc1", "c_fc"]
can we make these arguments be passed from the config?
addressed
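One way "passed from the config" could look; the field names here are a guess for illustration, not the final mttl config.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PEERConfigSketch:
    # illustrative field names; the actual config fields may differ
    down_proj_names: List[str] = field(default_factory=lambda: ["fc1", "c_fc"])
    up_proj_names: List[str] = field(default_factory=lambda: ["fc2", "c_proj"])
```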
mttl/models/expert_model.py
Outdated
@@ -693,3 +680,22 @@ def training_step(self, batch, _):
        for i, pg in enumerate(self.optimizers().optimizer.param_groups):
            self.log(f"train/lr_{i}", pg["lr"])
        return total_loss

    def as_expert(self):
how does as_expert work for multiple experts?
oops, this should not be here. Addressed
tests/test_peer.py
Outdated
from mttl.models.modifiers.lora import LoRA


@pytest.fixture
isn't this fixture in another file already?
mttl/models/containers/__init__.py
Outdated
    expert,
    action=action,
    is_default=is_default,
if expert_config.modify_layers == "":
just a q: do we really need to keep both modify_layers and modify_modules here? isn't there something we can do to say:
modify_layers=X,
where X is all the layers/modules that match, but not their children?
e.g. if X=k_proj, we modify 1.block.k_proj and also 2.block.k_proj, because they have different parents,
but if X=block, then 1.block.k_proj and 1.block.q_proj are modified only once (we keep track of whether we already modified the parent of the current layer).
doesn't this encompass both modify_layers and modify_modules?
will remove "modify_layers" argument in a separate PR
addressed
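A rough sketch of the single-pattern matching described in the comment above (a hypothetical helper, not mttl code): walk named modules top-down, wrap the shallowest match, and skip the children of anything that was already wrapped.

```python
import re
import torch.nn as nn

def modify_matching(model: nn.Module, pattern: str, wrap_fn):
    # collect matches first so we do not mutate the tree while iterating
    matched = []
    for name, _ in model.named_modules():  # depth-first, parents before children
        if not name or any(name.startswith(p + ".") for p in matched):
            continue  # skip the root and the children of an already-matched parent
        if re.search(pattern, name.split(".")[-1]):
            matched.append(name)
    for name in matched:
        parent_name, _, child = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child, wrap_fn(model.get_submodule(name)))
    return model
```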
mttl/models/containers/__init__.py
Outdated
    expert,
    action=action,
    is_default=is_default,
if expert_config.modify_layers == "":
if not modify_layers?
from mttl.models.modifiers.modify_model import get_modifier_name

# diff architectures name those layers differently
DOWN_NAMES = ["fc1", "c_fc"]
Can we make this an argument that is part of the config?
addressed now
Implementation of the PEER MoE: https://arxiv.org/pdf/2407.04153.
This runs with 10K experts and a small GPT-neo/Phi2 model on a 40G GPU.
This is the config: