benchmarl.experiment.ExperimentConfig

class ExperimentConfig(sampling_device: str, train_device: str, buffer_device: str, share_policy_params: bool, prefer_continuous_actions: bool, collect_with_grad: bool, parallel_collection: bool, gamma: float, lr: float, adam_eps: float, clip_grad_norm: bool, clip_grad_val: float | None, soft_target_update: bool, polyak_tau: float, hard_target_update_frequency: int, exploration_eps_init: float, exploration_eps_end: float, exploration_anneal_frames: int | None, max_n_iters: int | None, max_n_frames: int | None, on_policy_collected_frames_per_batch: int, on_policy_n_envs_per_worker: int, on_policy_n_minibatch_iters: int, on_policy_minibatch_size: int, off_policy_collected_frames_per_batch: int, off_policy_n_envs_per_worker: int, off_policy_n_optimizer_steps: int, off_policy_train_batch_size: int, off_policy_memory_size: int, off_policy_init_random_frames: int, off_policy_use_prioritized_replay_buffer: bool, off_policy_prb_alpha: float, off_policy_prb_beta: float, evaluation: bool, render: bool, evaluation_interval: int, evaluation_episodes: int, evaluation_deterministic_actions: bool, loggers: List[str], project_name: str, create_json: bool, save_folder: str | None, restore_file: str | None, restore_map_location: Any | None, checkpoint_interval: int, checkpoint_at_end: bool, keep_checkpoints_num: int | None)[source]

Bases: object

Configuration class for experiments. This class acts as a schema for loading and validating YAML configurations.

Parameters in this class aim to be agnostic of the algorithm, task, or model used. For their meaning, see the descriptions in benchmarl/conf/experiment/base_experiment.yaml.
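
A typical workflow is to load the default configuration and override individual fields programmatically before running an experiment. A minimal sketch (the overridden values are arbitrary examples):

from benchmarl.experiment import ExperimentConfig

# Load defaults from benchmarl/conf/experiment/base_experiment.yaml
experiment_config = ExperimentConfig.get_from_yaml()

# Override fields; the names match the dataclass attributes listed below
experiment_config.lr = 3e-4
experiment_config.max_n_frames = 1_000_000
experiment_config.loggers = ["csv"]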

sampling_device: str
train_device: str
buffer_device: str
share_policy_params: bool
prefer_continuous_actions: bool
collect_with_grad: bool
parallel_collection: bool
gamma: float
lr: float
adam_eps: float
clip_grad_norm: bool
clip_grad_val: float | None
soft_target_update: bool
polyak_tau: float
hard_target_update_frequency: int
exploration_eps_init: float
exploration_eps_end: float
exploration_anneal_frames: int | None
max_n_iters: int | None
max_n_frames: int | None
on_policy_collected_frames_per_batch: int
on_policy_n_envs_per_worker: int
on_policy_n_minibatch_iters: int
on_policy_minibatch_size: int
off_policy_collected_frames_per_batch: int
off_policy_n_envs_per_worker: int
off_policy_n_optimizer_steps: int
off_policy_train_batch_size: int
off_policy_memory_size: int
off_policy_init_random_frames: int
off_policy_use_prioritized_replay_buffer: bool
off_policy_prb_alpha: float
off_policy_prb_beta: float
evaluation: bool
render: bool
evaluation_interval: int
evaluation_episodes: int
evaluation_deterministic_actions: bool
loggers: List[str]
project_name: str
create_json: bool
save_folder: str | None
restore_file: str | None
restore_map_location: Any | None
checkpoint_interval: int
checkpoint_at_end: bool
keep_checkpoints_num: int | None
train_batch_size(on_policy: bool) → int[source]

The batch size of tensors used for training.

Parameters:

on_policy (bool) – whether the algorithm is on-policy

train_minibatch_size(on_policy: bool) → int[source]

The minibatch size of tensors used for training. On-policy algorithms are trained by splitting the train_batch_size (equal to the collected frames) into minibatches. Off-policy algorithms do not go through this process, so for them train_minibatch_size == train_batch_size.

Parameters:

on_policy (bool) – whether the algorithm is on-policy
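
Conceptually, the two methods above resolve as follows. This is an illustrative sketch of the documented behaviour, not the library source, and it assumes the on-policy minibatch size comes from on_policy_minibatch_size:

def resolve_train_sizes(config, on_policy: bool):
    # config is an ExperimentConfig; sketch mirroring the descriptions above
    if on_policy:
        batch = config.on_policy_collected_frames_per_batch  # train on all collected frames
        minibatch = config.on_policy_minibatch_size          # split into minibatches (assumption)
    else:
        batch = config.off_policy_train_batch_size           # batch used for a training step
        minibatch = batch                                     # off-policy: minibatch == batch
    return batch, minibatch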

n_optimizer_steps(on_policy: bool) → int[source]

Number of times to loop over the training step per collection iteration.

Parameters:

on_policy (bool) – whether the algorithm is on-policy

replay_buffer_memory_size(on_policy: bool) → int[source]

Size of the replay buffer memory in terms of frames.

Parameters:

on_policy (bool) – whether the algorithm is on-policy

collected_frames_per_batch(on_policy: bool) → int[source]

Number of collected frames per collection iteration.

Parameters:

on_policy (bool) – whether the algorithm is on-policy
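
Most of these helpers simply dispatch between the corresponding on_policy_* and off_policy_* fields. A hedged sketch of that pattern for this method (an assumption based on the field names, not the library source):

def collected_frames_per_batch_sketch(config, on_policy: bool) -> int:
    # config is an ExperimentConfig; pick the field matching the algorithm type (assumed dispatch)
    if on_policy:
        return config.on_policy_collected_frames_per_batch
    return config.off_policy_collected_frames_per_batch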

n_envs_per_worker(on_policy: bool) → int[source]

Number of environments used for collection.

  • In vectorized environments, this will be the vectorized batch_size.

  • In other environments, this will be emulated by running them sequentially.

Parameters:

on_policy (bool) – whether the algorithm is on-policy

get_max_n_frames(on_policy: bool) → int[source]

Get the maximum number of frames collected before the experiment ends.

Parameters:

on_policy (bool) – whether the algorithm is on-policy

get_max_n_iters(on_policy: bool) → int[source]

Get the maximum number of experiment iterations before the experiment ends.

Parameters:

on_policy (bool) – whether the algorithm is on-policy
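
Since each collection iteration gathers collected_frames_per_batch(on_policy) frames, a frame budget can be converted into an iteration count. The arithmetic below is an assumption about how the two limits relate, not the library code:

def iters_from_frame_budget(config, on_policy: bool) -> int:
    # config is an ExperimentConfig
    # Assumption: iterations ≈ total frame budget / frames collected per iteration
    frames_per_iter = config.collected_frames_per_batch(on_policy)
    return config.get_max_n_frames(on_policy) // frames_per_iter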

get_exploration_anneal_frames(on_policy: bool)[source]

Get the number of frames over which exploration is annealed. If self.exploration_anneal_frames is None, this defaults to a third of the total frames to collect.

Parameters:

on_policy (bool) – whether the algorithm is on-policy
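
The fallback described above can be sketched as follows (illustrative only, using get_max_n_frames for the total frame budget):

def resolve_anneal_frames(config, on_policy: bool) -> int:
    # config is an ExperimentConfig
    # Use the explicit value if set, otherwise a third of the total frames to collect
    if config.exploration_anneal_frames is not None:
        return config.exploration_anneal_frames
    return config.get_max_n_frames(on_policy) // 3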

static get_from_yaml(path: str | None = None)[source]

Load the experiment configuration from YAML.

Parameters:

path (str, optional) – The full path of the YAML file to load from. If None, it will default to benchmarl/conf/experiment/base_experiment.yaml

Returns:

The loaded ExperimentConfig

validate(on_policy: bool)[source]

Validates the configuration.

Parameters:

on_policy (bool) – whether the algorithm is on-policy
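
For example, to load a custom experiment file and validate it for an off-policy algorithm (the path below is hypothetical):

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml(path="path/to/my_experiment.yaml")
experiment_config.validate(on_policy=False)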