Features
BenchMARL has several features:
A test CI with integration and training test routines that are run for all simulators and algorithms
Integration in the official TorchRL ecosystem for dedicated support
Logging
BenchMARL is compatible with the TorchRL loggers.
A list of logger names can be provided in the experiment config.
Examples of available options are: wandb, csv, mlflow, tensorboard, or any other option available in TorchRL.
You can specify the loggers in the yaml config files or in the script arguments like so:
python benchmarl/run.py algorithm=mappo task=vmas/balance "experiment.loggers=[wandb]"
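Equivalently, you can list the loggers in the experiment yaml config. A minimal sketch (assuming the default hydra layout, where the experiment config lives at benchmarl/conf/experiment/base_experiment.yaml):

# In the experiment yaml config (path assumed from the default layout)
loggers: [wandb, csv]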
The wandb logger is fully compatible with experiment restoring and will automatically resume the run of the loaded experiment.
Checkpointing
Experiments can be checkpointed every experiment.checkpoint_interval collected frames.
Experiments will use an output folder for logging and checkpointing, which can be specified in experiment.save_folder. If this is left unspecified, the default will be the hydra output folder (if using hydra) or, otherwise, the current directory where the script is launched.
The output folder will contain a folder for each experiment with the corresponding experiment name. Checkpoints will be stored in a "checkpoints" folder within each experiment folder.
python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.max_n_iters=3 experiment.on_policy_collected_frames_per_batch=100 experiment.checkpoint_interval=100
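For example, after running the command above, the output folder might look like the following (an illustrative sketch; the experiment name and timestamps will differ per run, and checkpoint files are numbered by collected frames):

outputs/2024-09-09/20-39-31/mappo_balance_mlp__cd977b69_24_09_09-20_39_31/
├── checkpoints/
│   ├── checkpoint_100.pt
│   ├── checkpoint_200.pt
│   └── checkpoint_300.pt
└── ...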
Reloading
To load from a checkpoint, you can do it in multiple ways:
You can pass the absolute checkpoint file name to experiment.restore_file. This allows you to change some parts of the config (e.g., task parameters to evaluate in a new setting).
python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.max_n_iters=6 experiment.on_policy_collected_frames_per_batch=100 experiment.restore_file="/hydra/experiment/folder/checkpoint/checkpoint_300.pt"
If you do not need to change the config, you can also just resume from the checkpoint file with:
python benchmarl/resume.py ../outputs/2024-09-09/20-39-31/mappo_balance_mlp__cd977b69_24_09_09-20_39_31/checkpoints/checkpoint_100.pt
In Python, this is equivalent to:
from benchmarl.hydra_config import reload_experiment_from_file

# checkpoint_file is the path to a saved checkpoint,
# e.g. ".../checkpoints/checkpoint_100.pt"
experiment = reload_experiment_from_file(checkpoint_file)
experiment.run()
Evaluating
Evaluation is automatically run throughout training and can be configured from ExperimentConfig. By default, evaluation will be run in different domain-randomised environments throughout training. If you want to always evaluate in the same exact (seeded) environments, set benchmarl.experiment.ExperimentConfig.evaluation_static=True.
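For example, using the same hydra override syntax as above (a sketch, assuming evaluation_static is exposed as an experiment override like the other ExperimentConfig fields):

python benchmarl/run.py algorithm=mappo task=vmas/balance experiment.evaluation_static=True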
To evaluate a saved experiment, you can:
from benchmarl.hydra_config import reload_experiment_from_file
experiment = reload_experiment_from_file(checkpoint_file)
experiment.evaluate()
This will run an iteration of evaluation, logging it to the experiment loggers (and to json if benchmarl.experiment.ExperimentConfig.create_json=True).
There is a command line script which automates this:
python benchmarl/evaluate.py ../outputs/2024-09-09/20-39-31/mappo_balance_mlp__cd977b69_24_09_09-20_39_31/checkpoints/checkpoint_100.pt
Rendering
Rendering is performed by default during evaluation (benchmarl.experiment.ExperimentConfig.render=True).
If multiple evaluation episodes are requested (benchmarl.experiment.ExperimentConfig.evaluation_episodes > 1), then only the first one will be rendered.
Renderings will be made available in the loggers you chose (benchmarl.experiment.ExperimentConfig.loggers):
In Wandb, renderings are reported under eval/video.
In CSV, renderings are saved in the experiment folder under video.
Devices
It is possible to choose different devices for simulation, training, and buffer storage (in the off-policy case). These devices can be any torch.device and are set via benchmarl.experiment.ExperimentConfig.sampling_device, benchmarl.experiment.ExperimentConfig.train_device, and benchmarl.experiment.ExperimentConfig.buffer_device. buffer_device can also be set to "disk" to store buffers on disk.
Note that for vectorized simulators such as VMAS, choosing sampling_device="cuda" and train_device="cuda" will give important speed-ups, as both simulation and training will run in a batch on the GPU with no data being moved around.
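For example, following the hydra override syntax used throughout this section (a sketch):

python benchmarl/run.py algorithm=mappo task=vmas/balance experiment.sampling_device="cuda" experiment.train_device="cuda" experiment.buffer_device="cuda"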
Callbacks
Experiments optionally take a list of Callback objects, which have several methods that you can implement to see what's going on during training, such as on_batch_collected, on_train_end, and on_evaluation_end.
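A minimal sketch of a custom callback (assuming the Callback base class from benchmarl.experiment.callback and the callbacks argument of Experiment; the callback name and print statement are illustrative):

from benchmarl.experiment.callback import Callback

class MyLoggingCallback(Callback):
    # Illustrative callback that inspects each collected batch.
    def on_batch_collected(self, batch):
        print(f"Collected a batch with {batch.numel()} frames")

# Callbacks are passed to the experiment at construction time:
# experiment = Experiment(..., callbacks=[MyLoggingCallback()])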
Ensemble models and algorithms
It is possible to use different algorithms and models for different agent groups.
Ensemble algorithm
Ensemble algorithms take as input a dictionary mapping group names to algorithm configs:
from benchmarl.algorithms import EnsembleAlgorithmConfig, IsacConfig, MaddpgConfig
algorithm_config = EnsembleAlgorithmConfig(
    {"agent": MaddpgConfig.get_from_yaml(), "adversary": IsacConfig.get_from_yaml()}
)
Note
All algorithms need to be either on-policy or off-policy; it is not possible to mix the two paradigms.
Ensemble model
Ensemble models take as input a dictionary mapping group names to model configs:
from benchmarl.models import EnsembleModelConfig, GnnConfig, MlpConfig
model_config = EnsembleModelConfig(
    {"agent": MlpConfig.get_from_yaml(), "adversary": GnnConfig.get_from_yaml()}
)
Note
If you use ensemble models with sequence models, make sure the ensemble is the outer layer (you cannot make a sequence of ensembles, but you can make an ensemble of sequences).
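For example, a minimal sketch of an ensemble whose "agent" group uses a sequence model (this assumes SequenceModelConfig from benchmarl.models with its model_configs and intermediate_sizes arguments; the size is illustrative):

from benchmarl.models import EnsembleModelConfig, MlpConfig, SequenceModelConfig

# The ensemble is the outer layer; the sequence is nested inside it.
model_config = EnsembleModelConfig(
    {
        "agent": SequenceModelConfig(
            model_configs=[MlpConfig.get_from_yaml(), MlpConfig.get_from_yaml()],
            intermediate_sizes=[32],  # illustrative size between the two models
        ),
        "adversary": MlpConfig.get_from_yaml(),
    }
)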