PEER MoE #93

Merged: oleksost merged 32 commits into main from 1MoE on Aug 23, 2024

Conversation

@oleksost (Collaborator) commented Aug 16, 2024

Implementation of PEER MoE (Parameter Efficient Expert Retrieval): https://arxiv.org/pdf/2407.04153.

This runs with 10K experts and a small GPT-Neo/Phi-2 model on a 40 GB GPU.

This is the config:

        {
            "name": "train_moe",
            "type": "python",
            "request": "launch",
            "program": "/train_moe.py",
            "args": [
                "-c", "modular_llm/configs/models/phi-2_moe_post-hoc.json",
                "-k",
                "output_dir=/tmp/mttl_out_tmp/",
                "include_task_source=*",
                "model_modifier=peer",
                "modify_modules=.*mlp",
                "model=EleutherAI/gpt-neo-125m",
                "modify_layers=",
                "trainable_param_names=.*mlp.*",
                "finetune_task_name=dream_read_the_following_conversation_and_answer_the_question", 
                "dataset=sordonia/flan-10k-flat",
                "moe_num_experts=10000",
                "train_batch_size=1",
                "subsample_dev=1",
                "top_k=2",
                "eval_before_training=False",
                "router_selector=moe_pk_router"
            ],
            "console": "integratedTerminal",
            "justMyCode": false
        },
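
For reference, here is a minimal single-head sketch of the PEER mechanism the PR implements: product-key retrieval over a large pool of single-neuron experts, following the paper linked above. This is illustrative only, not the code in this PR; the class name, shapes, and initialization are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PEERSketch(nn.Module):
    """Illustrative PEER-style layer (not the mttl implementation).

    num_experts must be a perfect square: each expert is addressed by a pair
    of sub-keys (product keys), so retrieval only scores 2 * sqrt(N) keys.
    Each expert is a single hidden neuron: one down vector and one up vector.
    """

    def __init__(self, d_model, num_experts=10_000, top_k=2):
        super().__init__()
        assert d_model % 2 == 0
        self.n = int(num_experts ** 0.5)               # sub-key grid size
        assert self.n * self.n == num_experts
        self.top_k = top_k
        self.query = nn.Linear(d_model, d_model)
        half = d_model // 2
        self.keys1 = nn.Parameter(torch.randn(self.n, half) * 0.02)
        self.keys2 = nn.Parameter(torch.randn(self.n, half) * 0.02)
        # expert "down" and "up" vectors stored as embedding tables
        self.w_down = nn.Embedding(num_experts, d_model)
        self.w_up = nn.Embedding(num_experts, d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        q1, q2 = self.query(x).chunk(2, dim=-1)
        s1, i1 = (q1 @ self.keys1.T).topk(self.top_k, dim=-1)   # scores over sub-keys
        s2, i2 = (q2 @ self.keys2.T).topk(self.top_k, dim=-1)
        # combine the two top-k lists into k*k candidate experts, then re-select top-k
        scores = (s1.unsqueeze(-1) + s2.unsqueeze(-2)).flatten(-2)
        idx = (i1.unsqueeze(-1) * self.n + i2.unsqueeze(-2)).flatten(-2)
        scores, pos = scores.topk(self.top_k, dim=-1)
        idx = idx.gather(-1, pos)
        gates = F.softmax(scores, dim=-1)
        # each selected expert: scalar activation, then a gated up-projection
        h = torch.einsum("...d,...kd->...k", x, self.w_down(idx))
        h = F.gelu(h) * gates
        return torch.einsum("...k,...kd->...d", h, self.w_up(idx))
```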

@@ -587,6 +592,17 @@ def as_expert(self):
"This method is not implemented for MultiExpertModel."
)

def add_empty_experts(self):

Member: this func should be in the MoE model, I think.

Collaborator (author): yes, true.

expert,
action=action,
is_default=is_default,
if len(expert_config.modify_layers) == 0:

Member: why?

Collaborator (author): if modify_layers == "", we do not look into the children of the module; we modify the module directly instead of its layers.

This is useful when we want to replace the full MLP module directly.
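
A hypothetical sketch of that dispatch (illustrative only, not the actual mttl code; the config fields and the make_modifier helper are assumptions):

```python
import re
import torch.nn as nn

def apply_modifier(model: nn.Module, config, make_modifier):
    # Hypothetical sketch: with an empty modify_layers pattern, the matched
    # module (e.g. the whole MLP block) is replaced directly; otherwise only
    # matching child layers of the matched module are replaced.
    for name, module in list(model.named_modules()):
        if not re.fullmatch(config.modify_modules, name):
            continue
        if not config.modify_layers:
            parent_name, _, attr = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, attr, make_modifier(module))
        else:
            for child_name, child in list(module.named_children()):
                if re.fullmatch(config.modify_layers, child_name):
                    setattr(module, child_name, make_modifier(child))
```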

Contributor: (nitpicking) I would check if == "" instead of len == 0.

Contributor: or maybe check if modify_layers == None?

"""
PEER layer from Mixture of A Million Experts (https://arxiv.org/pdf/2407.04153)

Right not it assumes that it receives a module -- an MLP block, that has attributes fc1 and fc2.

Contributor: typo (not / now)

Collaborator (author): merci (thanks)!

from mttl.models.modifiers.modify_model import get_modifier_name

# diff architectures name those layers differently
down_names = ["fc1", "c_fc"]

Contributor: can we make these into arguments passed from the config?

Collaborator (author): addressed.

@@ -693,3 +680,22 @@ def training_step(self, batch, _):
for i, pg in enumerate(self.optimizers().optimizer.param_groups):
self.log(f"train/lr_{i}", pg["lr"])
return total_loss

def as_expert(self):

Contributor: how does as_expert work for multiple experts?

Collaborator (author): oops, this should not be here. Addressed.

from mttl.models.modifiers.lora import LoRA


@pytest.fixture

Member: isn't this fixture in another file already?

expert,
action=action,
is_default=is_default,
if expert_config.modify_layers == "":

Member: just a question: do we really need to keep both modify_layers and modify_modules here? Isn't there something we can do to say:

modify_layers=X,

where X is all the layers/modules that match, but not their children?

E.g. if X=k_proj, we modify 1.block.k_proj and we also modify 2.block.k_proj, because they have different parents;

but if X=block, then 1.block.k_proj and 1.block.q_proj are modified only once (we keep track of whether we modified the parent of the current layer).

Doesn't this encompass both modify_layers and modify_modules?
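
A hypothetical sketch of the unified semantics suggested above (not mttl code; the function, pattern handling, and make_modifier helper are illustrative assumptions):

```python
import re
import torch.nn as nn

def modify_matching(model: nn.Module, pattern: str, make_modifier):
    # Replace every module whose name matches `pattern`, unless one of its
    # ancestors was already replaced: X="block" replaces whole blocks once,
    # while X="k_proj" replaces each k_proj under every parent.
    modified = []  # full names of modules we have already replaced
    for name, module in list(model.named_modules()):
        if any(name == m or name.startswith(m + ".") for m in modified):
            continue  # the module itself or an ancestor was already handled
        if re.fullmatch(f".*{pattern}", name):
            parent_name, _, attr = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, attr, make_modifier(module))
            modified.append(name)
```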

Collaborator (author): will remove the "modify_layers" argument in a separate PR.

Collaborator (author): addressed.

expert,
action=action,
is_default=is_default,
if expert_config.modify_layers == "":

Member: if not modify_layers?

oleksost changed the title from "[Draft] PEER MoE" to "PEER MoE" on Aug 20, 2024
oleksost requested a review from sordonia on Aug 20, 2024 at 18:24

from mttl.models.modifiers.modify_model import get_modifier_name

# diff architectures name those layers differently
DOWN_NAMES = ["fc1", "c_fc"]

Contributor: can we make this an argument that is part of the config?

Collaborator (author): addressed now.
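
For illustration, a hypothetical shape of the requested change; the class and field names below are assumptions, not the actual mttl config:

```python
from dataclasses import dataclass

@dataclass
class PEERModifierConfig:
    # The MLP projection names become config fields instead of the
    # hard-coded DOWN_NAMES constant.
    moe_num_experts: int = 10_000
    top_k: int = 2
    down_proj_layer: str = "fc1"  # e.g. "c_fc" for GPT-2-style blocks
    up_proj_layer: str = "fc2"    # e.g. "c_proj" for GPT-2-style blocks
```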

oleksost merged commit 0be1a67 into main on Aug 23, 2024
6 checks passed
oleksost deleted the 1MoE branch on Aug 23, 2024 at 20:02