feature(whl): add AWR algorithm. #828

Open

kxzxvbk wants to merge 5 commits into base: main
Conversation

@kxzxvbk (Contributor) commented Sep 10, 2024

Description

Implement the AWR (Advantage-Weighted Regression) algorithm, with a language model as the policy.
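For reference, AWR updates the policy by supervised regression toward actions reweighted by their exponentiated advantage: it maximizes `E[ log π(a|s) · exp(A(s,a) / β) ]`, where `A(s,a)` is an advantage estimate and `β` is a temperature (exposed as the `beta` hyperparameter in this PR). Here the "actions" are candidate prompts scored by the language-model policy.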

Related Issue

TODO

Check List

  • merge the latest version of the source branch/repo and resolve all conflicts
  • pass the style check
  • pass all the tests

@PaParaZz1 added the algo (Add new algorithm or improve old one) label on Sep 11, 2024
@PaParaZz1 changed the title from "feature(whl): Add AWR algorithm." to "feature(whl): add AWR algorithm." on Sep 11, 2024
@PaParaZz1 (Member) commented:

Add the AWR algorithm to the table in the README.

ding/model/template/language_transformer.py

```python
@@ -18,13 +18,16 @@ class LanguageTransformer(nn.Module):
    Interfaces:
        ``__init__``, ``forward``
    """
    mode = ['compute_actor', 'compute_critic', 'compute_actor_critic']

    def __init__(
            self,
            model_name: str = "bert-base-uncased",
```

Collaborator:

Add some comments about why we use "bert-base-uncased" as the default?
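One possible wording for such a comment (the explanation below is my suggestion, not taken from the PR):

```python
# Default to "bert-base-uncased": a small, widely available Hugging Face
# encoder checkpoint, which keeps the default configuration lightweight and
# reproducible; any compatible encoder can be substituted via `model_name`.
model_name: str = "bert-base-uncased",
```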


```python
# Prepare train_sample (the question to be answered) and candidate_samples (the prompts to be selected).
train_samples, cand_samples = batch["obs"]["train_sample"], batch["obs"]["candidate_samples"]
for ii in range(len(cand_samples)):
```

Collaborator:

Change `ii` into a more descriptive name?
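For example (one possible rename; the names are suggestions, not from the PR):

```python
for cand_idx, cand_sample in enumerate(cand_samples):
```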

```python
adv = torch.clamp(
    return_ - batch['value'], min=self._cfg.learn.norm_range[0], max=self._cfg.learn.norm_range[1]
)
policy_loss = -(log_prob * torch.exp(adv / self._cfg.learn.beta)).mean()
```

Collaborator:

Add comments about the key part of advantage-weighted regression.

Collaborator:

Should the operation `torch.exp(adv / self._cfg.learn.beta)` stop the gradient flow?

Member:

This operation is computed on the batch data rather than on the network output, so there is no need to stop the gradient.
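For readers following the thread, here is a minimal, self-contained sketch of the advantage-weighted loss being discussed (the function name and defaults are illustrative, not the PR's actual API):

```python
import torch


def awr_policy_loss(
        log_prob: torch.Tensor,  # log pi(a|s) of the taken actions
        return_: torch.Tensor,  # returns computed from the collected batch
        value: torch.Tensor,  # critic baseline V(s), also from the batch
        beta: float = 1.0,  # temperature: smaller beta weights good actions more sharply
        norm_range=(-1.0, 1.0),  # clamp range that keeps exp() numerically stable
) -> torch.Tensor:
    # Advantage estimate; clamping bounds the exponential weight.
    adv = torch.clamp(return_ - value, min=norm_range[0], max=norm_range[1])
    # The key AWR step: regress the policy toward actions with weights exp(A / beta).
    # Since `return_` and `value` come from batch data rather than the current
    # forward pass, no explicit detach() is needed, as the reply above notes.
    weight = torch.exp(adv / beta)
    return -(log_prob * weight).mean()
```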

```python
if len(real_act.shape) == 1:
    real_act = real_act.unsqueeze(-1)
# Calculate loss.
total_policy_loss, total_entropy_loss, total_value_loss = 0, 0, 0
```

Collaborator:

Update the comments here.
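One possible expansion of the `# Calculate loss.` comment (wording is my suggestion):

```python
# Accumulate the three AWR loss terms over the batch: the policy loss
# (advantage-weighted log-likelihood), the entropy bonus, and the value regression loss.
total_policy_loss, total_entropy_loss, total_value_loss = 0, 0, 0
```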

```python
# (float) Coefficient that controls the exp scale in the AWR algorithm.
beta=1.0,
# (float) Weight of entropy regularization in the loss function.
entropy_weight=0.01,
```

Collaborator:

Should we change it to a more generally applicable constant, such as 0.001?
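For context, such coefficients usually enter the total loss in the standard actor-critic combination, roughly as below (a sketch; `value_weight` is a placeholder name, not from this PR):

```python
total_loss = total_policy_loss + value_weight * total_value_loss - entropy_weight * total_entropy_loss
```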


```python
from easydict import EasyDict

tabmwp_prompt_pg_config = dict(
    exp_name='tabmwp_prompt_pg_seed0',
```

Member:

Polish the name; it is not PG in this file.
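For instance (a possible rename to match the algorithm; the exact names are my suggestion):

```python
tabmwp_prompt_awr_config = dict(
    exp_name='tabmwp_prompt_awr_seed0',
)
```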

ding/model/template/language_transformer.py

```python
@@ -36,10 +39,16 @@ def __init__(
        - embedding_size (:obj:`int`): The embedding size of the added linear layer, such as 128.
        - freeze_encoder (:obj:`bool`): Whether to freeze the encoder language model while training, \
            defaults to be ``True``.
        - hidden_dim (:obj:`int`): The embedding dimension of the encoding model (e.g. BERT). This value should \
            correspond to the model you use. For bert-base-uncased, this value is 768.
```

Member:

There should be an indent here.
Labels: algo (Add new algorithm or improve old one)
Projects: none yet
3 participants