benchmarl.algorithms.Vdn
- class Vdn(delay_value: bool, loss_function: str, **kwargs)
Bases: Algorithm
Vdn (from https://arxiv.org/abs/1706.05296).
- Parameters:
delay_value (bool)
loss_function (str)
- _get_loss(group: str, policy_for_loss: TensorDictModule, continuous: bool) → Tuple[LossModule, bool]
Implement this function to return the LossModule for a specific group.
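- Parameters:
group (str) – agent group of the loss
policy_for_loss (TensorDictModule) – the policy used in the loss
continuous (bool) – whether the loss is for continuous or discrete actions
Returns: the LossModule and a bool indicating whether the loss should have target parameters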
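To illustrate what target parameters are used for, here is a minimal plain-Python sketch of a TD target computed from a separate, delayed copy of the value parameters. It is not BenchMARL's implementation: `q_values`, `td_target`, `soft_update`, the linear value function, and the values of `gamma` and `tau` are all hypothetical names and choices made for this example.

```python
from typing import List

Params = List[List[float]]  # hypothetical: one weight row per action


def q_values(params: Params, obs: List[float]) -> List[float]:
    """Hypothetical linear Q-function: dot product of each weight row with obs."""
    return [sum(w * o for w, o in zip(row, obs)) for row in params]


def td_target(reward: float, done: bool, next_obs: List[float],
              target_params: Params, gamma: float = 0.99) -> float:
    """Bootstrap from the target parameters rather than the online ones."""
    next_q = max(q_values(target_params, next_obs))
    return reward + gamma * (0.0 if done else 1.0) * next_q


def soft_update(target_params: Params, online_params: Params,
                tau: float = 0.5) -> Params:
    """Move each target weight a fraction tau toward the online weight."""
    return [[(1 - tau) * t + tau * o for t, o in zip(t_row, o_row)]
            for t_row, o_row in zip(target_params, online_params)]


online = [[1.0, 0.0], [0.0, 2.0]]
target = [[0.0, 0.0], [0.0, 0.0]]

# The target uses the (still zero) target parameters, so only the reward remains.
y = td_target(reward=1.0, done=False, next_obs=[1.0, 1.0], target_params=target)
target = soft_update(target, online)
```

Keeping the bootstrap value on slowly changing parameters is the usual motivation for a target network: the regression target does not shift with every gradient step on the online parameters.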
- _get_parameters(group: str, loss: LossModule) → Dict[str, Iterable]
Get the dictionary mapping loss names to the corresponding parameters to optimize for a given group loss.
Returns: a dictionary mapping each loss name to a list of parameters
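As a rough illustration of the shape of this return value, the sketch below builds a dictionary keyed by loss name and applies one gradient step per key. Everything here is an assumption made for the example: the single loss name `"loss"`, the helper names `get_parameters` and `sgd_step`, and the use of plain floats in place of tensors.

```python
from typing import Dict, Iterable, List


def get_parameters(value_params: Iterable[float],
                   mixer_params: Iterable[float]) -> Dict[str, List[float]]:
    """Group every parameter optimized by one loss under that loss's name."""
    # Hypothetical single loss name; a summing mixer may contribute no parameters.
    return {"loss": list(value_params) + list(mixer_params)}


def sgd_step(params: Dict[str, List[float]],
             grads: Dict[str, List[float]],
             lr: float) -> Dict[str, List[float]]:
    """Apply one plain gradient-descent step to the parameters of each loss."""
    return {
        name: [p - lr * g for p, g in zip(values, grads[name])]
        for name, values in params.items()
    }


params = get_parameters(value_params=[1.0, 2.0], mixer_params=[])
updated = sgd_step(params, {"loss": [2.0, 4.0]}, lr=0.5)
```

Keying parameters by loss name lets a training loop create one optimizer per loss and step each of them independently.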
- _get_policy_for_loss(group: str, model_config: ModelConfig, continuous: bool) → TensorDictModule
Get the non-explorative policy for a specific group.
- Parameters:
group (str) – agent group of the policy
model_config (ModelConfig) – model config class
continuous (bool) – whether the policy should be continuous or discrete
Returns: TensorDictModule representing the policy
- _get_policy_for_collection(policy_for_loss: TensorDictModule, group: str, continuous: bool) → TensorDictModule
Implement this function to add an explorative layer to the policy used in the loss.
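- Parameters:
policy_for_loss (TensorDictModule) – the policy used in the loss
group (str) – agent group of the policy
continuous (bool) – whether the policy should be continuous or discrete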
Returns: TensorDictModule representing the explorative policy
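One common explorative layer for value-based methods is epsilon-greedy action selection. The sketch below assumes that choice purely for illustration; this section does not state which exploration scheme Vdn uses. The function names and the fixed `epsilon` are hypothetical.

```python
import random
from typing import List


def greedy_action(q: List[float]) -> int:
    """Non-explorative policy: pick the action with the highest value."""
    return max(range(len(q)), key=lambda a: q[a])


def epsilon_greedy_action(q: List[float], epsilon: float,
                          rng: random.Random) -> int:
    """Explorative wrapper: with probability epsilon pick a uniform random action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy_action(q)


rng = random.Random(0)
q = [0.1, 0.9, 0.3]

# With epsilon = 0 the wrapper reduces to the greedy policy.
always_greedy = epsilon_greedy_action(q, epsilon=0.0, rng=rng)
# With epsilon = 1 every action is drawn uniformly at random.
always_random = epsilon_greedy_action(q, epsilon=1.0, rng=rng)
```

The wrapper leaves the underlying policy untouched, which mirrors the split between a policy used in the loss and a policy used for collection.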
- process_batch(group: str, batch: TensorDictBase) → TensorDictBase
This function can be used to reshape data coming from collection before it is passed to the policy.
- Parameters:
group (str) – agent group
batch (TensorDictBase) – the batch of data coming from the collector
Returns: the processed batch
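As one example of the kind of reshaping such a hook can perform, the sketch below collapses per-agent rewards into a single shared reward per step by averaging over the agents in a group. This is an assumption for illustration only: the section does not describe what Vdn does here, and the nested-dict batch layout and the `"reward"` key are hypothetical stand-ins for a real TensorDictBase.

```python
from typing import Any, Dict


def process_batch(batch: Dict[str, Any], group: str) -> Dict[str, Any]:
    """Return a copy of the batch with a shared per-step reward added."""
    out = dict(batch)  # shallow copy so the input batch is not modified
    per_agent = batch[group]["reward"]  # hypothetical layout: [time][agent]
    out["reward"] = [sum(step) / len(step) for step in per_agent]
    return out


batch = {"agents": {"reward": [[1.0, 3.0], [0.0, 2.0]]}}
processed = process_batch(batch, "agents")
```

A shared reward of this kind is what a joint value function would be trained against, while the per-agent entries remain available under the group key.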
- get_mixer(group: str) → TensorDictModule
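In the VDN paper linked above, the joint action value is the sum of the per-agent values of the chosen actions. The sketch below shows that additive mixing in plain Python, assuming the mixer returned here plays this role. The names `chosen_values` and `vdn_mix` are hypothetical, and lists of floats stand in for tensors.

```python
from typing import List


def chosen_values(q_per_agent: List[List[float]],
                  actions: List[int]) -> List[float]:
    """Select, for each agent, the value of the action it actually took."""
    return [q[a] for q, a in zip(q_per_agent, actions)]


def vdn_mix(chosen_action_values: List[float]) -> float:
    """Additive mixing: the joint value is the sum of per-agent values."""
    return sum(chosen_action_values)


q_per_agent = [[0.5, 1.5], [2.0, 0.0]]  # two agents, two actions each
actions = [1, 0]
q_tot = vdn_mix(chosen_values(q_per_agent, actions))
```

Because the sum is monotonic in each agent's value, each agent maximizing its own value also maximizes the joint value, which is what allows decentralized greedy action selection.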