Hi, I'm wondering whether the oscillation during training comes from the fact that your actor nets contain only down-sampling layers. In partially observable domains, an agent's information state should be its whole observation history rather than the current one-shot observation, so adding a recurrent module such as an LSTM or GRU might help.
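To make the suggestion concrete, here is a minimal sketch of a recurrent actor in plain Python. It is not this repository's architecture: `GRUActor`, `act_on_history`, the layer sizes, and the random weights are all hypothetical, biases are omitted, and in practice one would more likely use a framework's built-in GRU or LSTM layer. The sketch only illustrates that the policy is conditioned on a hidden state summarizing the observation history, not on the latest observation alone.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class GRUActor:
    """Hypothetical recurrent actor: one GRU cell plus a softmax policy head."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        in_dim = obs_dim + hidden_dim
        self.hidden_dim = hidden_dim
        self.W_z = rand_matrix(hidden_dim, in_dim, rng)      # update gate
        self.W_r = rand_matrix(hidden_dim, in_dim, rng)      # reset gate
        self.W_h = rand_matrix(hidden_dim, in_dim, rng)      # candidate state
        self.W_pi = rand_matrix(n_actions, hidden_dim, rng)  # policy head

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def step(self, obs, h):
        # Gates see the current observation and the previous hidden state.
        x = list(obs) + h
        z = [sigmoid(a) for a in matvec(self.W_z, x)]
        r = [sigmoid(a) for a in matvec(self.W_r, x)]
        # The candidate state uses the reset-gated hidden state.
        x_reset = list(obs) + [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a) for a in matvec(self.W_h, x_reset)]
        # Blend old state and candidate (one common GRU convention).
        h_new = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]
        # Softmax over action logits computed from the hidden state.
        logits = matvec(self.W_pi, h_new)
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps], h_new


def act_on_history(actor, observations):
    # Feed the whole observation sequence; return the final action distribution.
    h = actor.initial_state()
    probs = None
    for obs in observations:
        probs, h = actor.step(obs, h)
    return probs


if __name__ == "__main__":
    actor = GRUActor(obs_dim=2, hidden_dim=4, n_actions=3)
    # Same final observation, different histories.
    print(act_on_history(actor, [[1.0, 0.0], [0.0, 1.0]]))
    print(act_on_history(actor, [[0.0, 1.0], [0.0, 1.0]]))
```

The two calls end on the same observation but follow different histories, so the hidden state and the resulting action distribution differ. A purely feed-forward actor would return the same distribution for both.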