benchmarl.algorithms.Vdn

class Vdn(delay_value: bool, loss_function: str, **kwargs)

Parameters:

loss_function (str) – loss function for the value discrepancy. Can be one of “l1”, “l2” or “smooth_l1”.
delay_value (bool) – whether to separate the target value networks from the value networks used for data collection.
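
As an illustration of how the two parameters interact, the following is a minimal pure-Python sketch of a VDN-style temporal-difference loss. It assumes the standard value-decomposition formulation, in which the joint value is the sum of the per-agent values, and the usual definitions of the three loss functions (with a smooth L1 threshold of 1.0). The names `vdn_td_loss`, `LOSSES`, and `gamma` are hypothetical and not part of the BenchMARL API. When delay_value is true, `next_q_target` would come from a separate target network; otherwise it would come from the same network that produced `chosen_q`.

```python
from typing import Callable, Dict, Sequence


def l1(x: float) -> float:
    return abs(x)


def l2(x: float) -> float:
    return x * x


def smooth_l1(x: float, beta: float = 1.0) -> float:
    # Quadratic near zero, linear elsewhere (beta = 1.0 is an assumption).
    ax = abs(x)
    return 0.5 * x * x / beta if ax < beta else ax - 0.5 * beta


# Hypothetical lookup mirroring the accepted loss_function strings.
LOSSES: Dict[str, Callable[[float], float]] = {
    "l1": l1,
    "l2": l2,
    "smooth_l1": smooth_l1,
}


def vdn_td_loss(
    chosen_q: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    reward: float,
    done: bool,
    gamma: float = 0.99,
    loss_function: str = "l2",
) -> float:
    """TD loss for one transition.

    chosen_q: per-agent Q-value of the action each agent took.
    next_q_target: per-agent Q-values over all actions in the next state.
    """
    # Joint value as the sum of per-agent values (value decomposition).
    q_tot = sum(chosen_q)
    # Each agent acts greedily in the next state; the maxima are summed.
    next_q_tot = sum(max(q) for q in next_q_target)
    target = reward + gamma * (0.0 if done else 1.0) * next_q_tot
    return LOSSES[loss_function](q_tot - target)
```

Swapping `loss_function` only changes how the discrepancy between the joint value and its bootstrapped target is penalised; the decomposition itself is unchanged.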

_get_loss(group: str, policy_for_loss: TensorDictModule, continuous: bool) → Tuple[LossModule, bool]

Implement this function to return the LossModule for a specific group.

Parameters:

group (str) – agent group of the loss
policy_for_loss (TensorDictModule) – the policy to use in the loss
continuous (bool) – whether to return a loss for continuous or discrete actions

Returns: a tuple of the LossModule and a bool indicating whether the loss should have target parameters

_get_parameters(group: str, loss: LossModule) → Dict[str, Iterable]

Get the dictionary mapping loss names to the corresponding parameters to optimize for a given group loss.

Returns: a dictionary mapping each loss name to a list of parameters

_get_policy_for_loss(group: str, model_config: ModelConfig, continuous: bool) → TensorDictModule

Get the non-explorative policy for a specific group.

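Parameters:

group (str) – agent group of the policy
model_config (ModelConfig) – model configuration used to build the policy
continuous (bool) – whether the policy is for continuous or discrete actions

Returns: TensorDictModule representing the policy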

_get_policy_for_collection(policy_for_loss: TensorDictModule, group: str, continuous: bool) → TensorDictModule

Implement this function to add an explorative layer to the policy used in the loss.

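Parameters:

policy_for_loss (TensorDictModule) – the group policy used in the loss
group (str) – agent group of the policy
continuous (bool) – whether the policy is for continuous or discrete actions

Returns: TensorDictModule representing the explorative policy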
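
To make the idea of an explorative layer concrete, here is a minimal pure-Python sketch of epsilon-greedy action selection with linear annealing, a common choice for value-based methods. The source does not state which exploration scheme is used, so this is an assumption; `linear_epsilon`, `explorative_action`, and all default values are hypothetical names and numbers, not BenchMARL API.

```python
import random
from typing import Sequence


def linear_epsilon(
    step: int,
    eps_init: float = 1.0,
    eps_end: float = 0.05,
    annealing_steps: int = 1000,
) -> float:
    """Linearly anneal epsilon from eps_init to eps_end over annealing_steps."""
    frac = min(max(step, 0) / annealing_steps, 1.0)
    return eps_init + frac * (eps_end - eps_init)


def explorative_action(
    q_values: Sequence[float], epsilon: float, rng: random.Random
) -> int:
    """Wrap the greedy (loss) policy with random exploration."""
    if rng.random() < epsilon:
        # Explore: pick a uniformly random action index.
        return rng.randrange(len(q_values))
    # Exploit: pick the action the non-explorative policy would choose.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon` at 0 the wrapper reduces to the non-explorative policy, which is why the same underlying policy can serve both the loss and data collection.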

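process_batch(group: str, batch: TensorDictBase) → TensorDictBase

This function can be used to reshape data coming from collection before it is passed to the policy.

Parameters:

group (str) – agent group
batch (TensorDictBase) – the batch of data coming from the collector

Returns: the processed batch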
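
One way such a batch-processing hook might be used is to derive the shared reward and done signals that a joint-value loss needs from per-agent entries. The sketch below is illustrative only: a plain dict with tuple keys stands in for the real batch container, the key layout is assumed, and `aggregate_group_signals` is a hypothetical name. The choice of mean for rewards and any for done flags is likewise an assumption.

```python
from typing import Any, Dict, Tuple

# Stand-in for the real batch type: tuple keys, values indexed [step][agent].
Batch = Dict[Tuple[str, ...], Any]


def aggregate_group_signals(group: str, batch: Batch) -> Batch:
    """Add shared reward/done entries if only per-agent ones are present."""
    reward_key = ("next", "reward")
    done_key = ("next", "done")

    if reward_key not in batch:
        per_agent_reward = batch[("next", group, "reward")]
        # Average over the agent dimension to get one reward per step.
        batch[reward_key] = [sum(step) / len(step) for step in per_agent_reward]

    if done_key not in batch:
        per_agent_done = batch[("next", group, "done")]
        # The step counts as done if any agent in the group is done.
        batch[done_key] = [any(step) for step in per_agent_done]

    return batch
```

Entries that already exist are left untouched, so the function is safe to call on batches from environments that provide shared signals directly.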