benchmarl.experiment.ExperimentConfig
- class ExperimentConfig(sampling_device: str, train_device: str, buffer_device: str, share_policy_params: bool, prefer_continuous_actions: bool, collect_with_grad: bool, parallel_collection: bool, gamma: float, lr: float, adam_eps: float, clip_grad_norm: bool, clip_grad_val: float | None, soft_target_update: bool, polyak_tau: float, hard_target_update_frequency: int, exploration_eps_init: float, exploration_eps_end: float, exploration_anneal_frames: int | None, max_n_iters: int | None, max_n_frames: int | None, on_policy_collected_frames_per_batch: int, on_policy_n_envs_per_worker: int, on_policy_n_minibatch_iters: int, on_policy_minibatch_size: int, off_policy_collected_frames_per_batch: int, off_policy_n_envs_per_worker: int, off_policy_n_optimizer_steps: int, off_policy_train_batch_size: int, off_policy_memory_size: int, off_policy_init_random_frames: int, off_policy_use_prioritized_replay_buffer: bool, off_policy_prb_alpha: float, off_policy_prb_beta: float, evaluation: bool, render: bool, evaluation_interval: int, evaluation_episodes: int, evaluation_deterministic_actions: bool, loggers: List[str], project_name: str, create_json: bool, save_folder: str | None, restore_file: str | None, restore_map_location: Any | None, checkpoint_interval: int, checkpoint_at_end: bool, keep_checkpoints_num: int | None)[source]
Bases: object
Configuration class for experiments. This class acts as a schema for loading and validating YAML configurations.
Parameters in this class aim to be agnostic of the algorithm, task, or model used. For their meanings, see the descriptions in
benchmarl/conf/experiment/base_experiment.yaml
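A typical workflow is to load the defaults from that YAML and override individual fields before passing the config to an Experiment. The snippet below is a minimal sketch assuming the ExperimentConfig.get_from_yaml() classmethod used in BenchMARL's examples; the overridden values are arbitrary.

    from benchmarl.experiment import ExperimentConfig

    # Load the defaults from benchmarl/conf/experiment/base_experiment.yaml
    experiment_config = ExperimentConfig.get_from_yaml()

    # Override individual fields as plain attributes (arbitrary example values)
    experiment_config.lr = 3e-4
    experiment_config.gamma = 0.99
    experiment_config.max_n_frames = 1_000_000
    experiment_config.loggers = ["csv"]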
- train_batch_size(on_policy: bool) int [source]
The batch size of tensors used for training
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
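As an illustrative sketch (not the library's actual implementation), the returned value plausibly maps onto the collection and batch-size fields in the signature above:

    # Sketch only: on-policy training consumes all frames collected in the
    # iteration; off-policy training samples a fixed-size batch from the buffer.
    def train_batch_size(cfg, on_policy: bool) -> int:
        # cfg is an ExperimentConfig instance
        if on_policy:
            return cfg.on_policy_collected_frames_per_batch
        return cfg.off_policy_train_batch_size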
- train_minibatch_size(on_policy: bool) int [source]
The minibatch size of tensors used for training. On-policy algorithms are trained by splitting the train_batch_size (equal to the collected frames) into minibatches. Off-policy algorithms do not go through this process, so for them
train_minibatch_size == train_batch_size
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
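A matching sketch for the minibatch size, under the same assumption that the value is read directly from the fields above:

    # Sketch only: on-policy splits the train batch into minibatches of
    # on_policy_minibatch_size; off-policy uses the whole train batch.
    def train_minibatch_size(cfg, on_policy: bool) -> int:
        if on_policy:
            return cfg.on_policy_minibatch_size
        return cfg.off_policy_train_batch_size  # equal to train_batch_size off-policy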
- n_optimizer_steps(on_policy: bool) int [source]
Number of times to loop over the training step per collection iteration.
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
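A hedged sketch of where this number could come from, assuming it is read from the corresponding on/off-policy fields in the signature:

    # Sketch only: number of optimizer loops per collection iteration.
    def n_optimizer_steps(cfg, on_policy: bool) -> int:
        if on_policy:
            return cfg.on_policy_n_minibatch_iters
        return cfg.off_policy_n_optimizer_steps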
- replay_buffer_memory_size(on_policy: bool) int [source]
Size of the replay buffer memory in terms of frames
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
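A sketch under an explicit assumption: off-policy runs use the configured off_policy_memory_size, while on-policy buffers are assumed here to hold only the frames collected in the current iteration.

    # Sketch only; the on-policy branch is an assumption, not confirmed by the docs.
    def replay_buffer_memory_size(cfg, on_policy: bool) -> int:
        if on_policy:
            return cfg.on_policy_collected_frames_per_batch
        return cfg.off_policy_memory_size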
- collected_frames_per_batch(on_policy: bool) int [source]
Number of collected frames per collection iteration.
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
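Sketched mapping onto the fields in the signature above (assumed, not the library's exact code):

    # Sketch only: frames collected per collection iteration.
    def collected_frames_per_batch(cfg, on_policy: bool) -> int:
        if on_policy:
            return cfg.on_policy_collected_frames_per_batch
        return cfg.off_policy_collected_frames_per_batch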
- n_envs_per_worker(on_policy: bool) int [source]
Number of environments used for collection.
In vectorized environments, this is the vectorized batch_size.
In other environments, vectorization is emulated by running them sequentially.
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
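Sketched mapping onto the fields in the signature above (assumed):

    # Sketch only: number of (possibly vectorized) collection environments.
    def n_envs_per_worker(cfg, on_policy: bool) -> int:
        if on_policy:
            return cfg.on_policy_n_envs_per_worker
        return cfg.off_policy_n_envs_per_worker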
- get_max_n_frames(on_policy: bool) int [source]
Get the maximum number of frames collected before the experiment ends.
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
- get_max_n_iters(on_policy: bool) int [source]
Get the maximum number of experiment iterations before the experiment ends.
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
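The two budgets interact with the per-iteration collection size. The sketch below is an assumption about how they could be combined: take whichever of max_n_frames and max_n_iters is hit first, and convert frames to iterations by ceil-dividing by the frames collected per batch. This is not necessarily BenchMARL's exact logic.

    # Sketch only (assumed logic): at least one of max_n_frames / max_n_iters
    # is expected to be set in the configuration.
    def get_max_n_frames(cfg, on_policy: bool) -> int:
        frames_per_iter = (
            cfg.on_policy_collected_frames_per_batch
            if on_policy
            else cfg.off_policy_collected_frames_per_batch
        )
        budgets = []
        if cfg.max_n_frames is not None:
            budgets.append(cfg.max_n_frames)
        if cfg.max_n_iters is not None:
            budgets.append(cfg.max_n_iters * frames_per_iter)
        return min(budgets)

    def get_max_n_iters(cfg, on_policy: bool) -> int:
        frames_per_iter = (
            cfg.on_policy_collected_frames_per_batch
            if on_policy
            else cfg.off_policy_collected_frames_per_batch
        )
        return -(-get_max_n_frames(cfg, on_policy) // frames_per_iter)  # ceil division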
- get_exploration_anneal_frames(on_policy: bool)[source]
Get the number of frames over which exploration is annealed. If self.exploration_anneal_frames is None, this will be a third of the total frames to collect.
- Parameters:
on_policy (bool) – whether the algorithm is on-policy
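A sketch following the docstring above: use the configured value when present, otherwise anneal over a third of the total frame budget.

    # Sketch only, based on the behaviour described in the docstring.
    def get_exploration_anneal_frames(cfg, on_policy: bool) -> int:
        if cfg.exploration_anneal_frames is not None:
            return cfg.exploration_anneal_frames
        return cfg.get_max_n_frames(on_policy) // 3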