Features

BenchMARL has several features:

  • A test CI with integration and training test routines that are run for all simulators and algorithms

  • Integration in the official TorchRL ecosystem for dedicated support

Logging

BenchMARL is compatible with the TorchRL loggers. A list of logger names can be provided in the experiment config. Available options include wandb, csv, mlflow, and tensorboard, as well as any other logger available in TorchRL. You can specify the loggers in the yaml config files or in the script arguments like so:

python benchmarl/run.py algorithm=mappo task=vmas/balance "experiment.loggers=[wandb]"
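
In Python, the equivalent is to set the logger list on the experiment config before building the experiment (a sketch using the default config loader):

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml()  # loads the default experiment yaml
experiment_config.loggers = ["wandb", "csv"]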

The wandb logger is fully compatible with experiment restoring and will automatically resume the run of the loaded experiment.

Checkpointing

Experiments can be checkpointed every experiment.checkpoint_interval collected frames. Experiments will use an output folder for logging and checkpointing, which can be specified in experiment.save_folder. If this is left unspecified, the default will be the hydra output folder (if using hydra) or, otherwise, the current directory where the script is launched. The output folder will contain a folder for each experiment, named with the corresponding experiment name; checkpoints will be stored in a "checkpoints" folder within each experiment folder.

python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.max_n_iters=3 experiment.on_policy_collected_frames_per_batch=100 experiment.checkpoint_interval=100
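
The same options can be set programmatically (a sketch; the save folder name here is hypothetical):

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.checkpoint_interval = 100  # in collected frames
experiment_config.save_folder = "my_results"  # hypothetical folder; defaults described above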


Reloading

There are multiple ways to load from a checkpoint:

You can pass the absolute checkpoint file name to experiment.restore_file. This allows you to change some parts of the config (e.g., task parameters to evaluate in a new setting).

python benchmarl/run.py task=vmas/balance algorithm=mappo experiment.max_n_iters=6 experiment.on_policy_collected_frames_per_batch=100 experiment.restore_file="/hydra/experiment/folder/checkpoint/checkpoint_300.pt"
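
Programmatically, the restore-with-overrides pattern would look roughly like this (a sketch; the modified configs are then used to build the Experiment as usual):

from benchmarl.experiment import ExperimentConfig

experiment_config = ExperimentConfig.get_from_yaml()
experiment_config.restore_file = "/hydra/experiment/folder/checkpoint/checkpoint_300.pt"
# task or algorithm configs can be modified here before constructing the Experiment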


If you do not need to change the config, you can also just resume from the checkpoint file with:

python benchmarl/resume.py ../outputs/2024-09-09/20-39-31/mappo_balance_mlp__cd977b69_24_09_09-20_39_31/checkpoints/checkpoint_100.pt

In Python, this is equivalent to:

from benchmarl.hydra_config import reload_experiment_from_file

# checkpoint_file is the path to the stored checkpoint, as in the command above
experiment = reload_experiment_from_file(checkpoint_file)
experiment.run()

Evaluating

Evaluation is automatically run throughout training and can be configured from ExperimentConfig. By default, evaluation will be run in different domain-randomised environments throughout training. If you want to always evaluate in exactly the same (seeded) environments, set benchmarl.experiment.ExperimentConfig.evaluation_static=True.
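
For example, as a hydra override (assuming the flag is exposed like the other experiment options shown above):

python benchmarl/run.py algorithm=mappo task=vmas/balance experiment.evaluation_static=True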

To evaluate a saved experiment, you can:

from benchmarl.hydra_config import reload_experiment_from_file
experiment = reload_experiment_from_file(checkpoint_file)
experiment.evaluate()

This will run an iteration of evaluation, logging it to the experiment loggers (and to a JSON file if benchmarl.experiment.ExperimentConfig.create_json=True).

There is a command-line script which automates this:

python benchmarl/evaluate.py ../outputs/2024-09-09/20-39-31/mappo_balance_mlp__cd977b69_24_09_09-20_39_31/checkpoints/checkpoint_100.pt

Rendering

Rendering is performed by default during evaluation (benchmarl.experiment.ExperimentConfig.render=True). If multiple evaluation episodes are requested (benchmarl.experiment.ExperimentConfig.evaluation_episodes > 1), then only the first one will be rendered.

Renderings will be made available in the loggers you chose (benchmarl.experiment.ExperimentConfig.loggers):

  • In Wandb, renderings are reported under eval/video

  • In CSV, renderings are saved in the experiment folder under video
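
Rendering can be tuned with the usual overrides, for example:

python benchmarl/run.py algorithm=mappo task=vmas/balance experiment.render=True experiment.evaluation_episodes=3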

Devices

It is possible to choose different devices for simulation, training, and buffer storage (in the off-policy case).

These devices can be any torch.device and are set via benchmarl.experiment.ExperimentConfig.sampling_device, benchmarl.experiment.ExperimentConfig.train_device, benchmarl.experiment.ExperimentConfig.buffer_device.

buffer_device can also be set to "disk" to store buffers on disk.

Note that for vectorized simulators such as VMAS, choosing sampling_device="cuda" and train_device="cuda" will give significant speed-ups, as both simulation and training will run in a batch on the GPU, with no data being moved around.
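
For example, to run everything on the GPU with an on-disk buffer (a sketch; any valid torch.device string works for the device options):

python benchmarl/run.py algorithm=mappo task=vmas/balance experiment.sampling_device=cuda experiment.train_device=cuda experiment.buffer_device=disk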

Callbacks

Experiments optionally take a list of Callback objects, which have several methods that you can implement to hook into training, such as on_batch_collected(), on_train_end(), and on_evaluation_end().

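A minimal sketch of a custom callback, assuming the Callback base class lives in benchmarl.experiment.callback and that Experiment accepts a callbacks list (as in the library's callback examples):

from benchmarl.experiment.callback import Callback

class PrintingCallback(Callback):
    def on_batch_collected(self, batch):
        # Called after each collection; batch is a TensorDict of collected frames
        print(f"Collected {batch.numel()} frames")

    def on_evaluation_end(self, rollouts):
        # Called after evaluation; rollouts is a list of evaluation episodes
        print(f"Evaluated {len(rollouts)} episodes")

The callback can then be passed at experiment construction time, e.g. Experiment(..., callbacks=[PrintingCallback()]).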

Ensemble models and algorithms

It is possible to use different algorithms and models for different agent groups.

Ensemble algorithm

Ensemble algorithms take as input a dictionary mapping group names to algorithm configs:

from benchmarl.algorithms import EnsembleAlgorithmConfig, IsacConfig, MaddpgConfig

algorithm_config = EnsembleAlgorithmConfig(
    {"agent": MaddpgConfig.get_from_yaml(), "adversary": IsacConfig.get_from_yaml()}
)

Note

All algorithms need to be either on-policy or off-policy; it is not possible to mix the two paradigms.

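As a sketch of how this plugs into a full experiment, assuming a task whose groups are named "agent" and "adversary" (the PettingZooTask.SIMPLE_TAG task used here is an assumption about the available task enums):

from benchmarl.algorithms import EnsembleAlgorithmConfig, IsacConfig, MaddpgConfig
from benchmarl.environments import PettingZooTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models import MlpConfig

experiment = Experiment(
    task=PettingZooTask.SIMPLE_TAG.get_from_yaml(),
    algorithm_config=EnsembleAlgorithmConfig(
        # MADDPG and ISAC are both off-policy, so they can be mixed
        {"agent": MaddpgConfig.get_from_yaml(), "adversary": IsacConfig.get_from_yaml()}
    ),
    model_config=MlpConfig.get_from_yaml(),
    seed=0,
    config=ExperimentConfig.get_from_yaml(),
)
experiment.run()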

Ensemble model

Ensemble models take as input a dictionary mapping group names to model configs:

from benchmarl.models import EnsembleModelConfig, GnnConfig, MlpConfig

model_config = EnsembleModelConfig(
    {"agent": MlpConfig.get_from_yaml(), "adversary": GnnConfig.get_from_yaml()}
)

Note

If you use ensemble models with sequence models, make sure the ensemble is the outer layer: you cannot make a sequence of ensembles, but you can make an ensemble of sequences, as sketched below.

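For instance, a group inside the ensemble can use a sequence model (a sketch; the SequenceModelConfig arguments follow the library's sequence-model pattern):

from benchmarl.models import EnsembleModelConfig, GnnConfig, MlpConfig, SequenceModelConfig

# An ensemble of sequences is allowed (the ensemble is the outer layer)
model_config = EnsembleModelConfig(
    {
        "agent": SequenceModelConfig(
            model_configs=[MlpConfig.get_from_yaml(), MlpConfig.get_from_yaml()],
            intermediate_sizes=[32],  # one size per junction between the two models
        ),
        "adversary": GnnConfig.get_from_yaml(),
    }
)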