Components

The goal of BenchMARL is to bring different MARL environments and algorithms under the same interfaces to enable fair and reproducible comparison and benchmarking. BenchMARL is a full-pipeline unified training library that lets users run any comparison they want across our algorithms and tasks in just one line of code. To achieve this, BenchMARL interconnects components from TorchRL, which provides an efficient and reliable backend.

The library has a default configuration for each of its components. While parts of this configuration are meant to be changed (for example, experiment configurations), other parts (such as tasks) should not be changed, to allow for reproducibility. To aid in this, each version of BenchMARL is paired with a default configuration.

Let’s now introduce each component in the library.

Experiment

An Experiment is a training run in which an Algorithm, a Task, and a Model are fixed. Experiments are configured by passing these values alongside a seed and the experiment hyperparameters. The experiment hyperparameters cover both on-policy and off-policy algorithms, discrete and continuous actions, and probabilistic and deterministic policies (as they are agnostic of the algorithm or task used). An experiment can be launched from the command line or from a script. See the [run](#run) section for more information.
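For instance, an experiment can be created and run from a Python script. The following is a minimal sketch following the library's documented usage, where each component ships with a default configuration loadable via `get_from_yaml()`:

```python
from benchmarl.algorithms import MappoConfig
from benchmarl.environments import VmasTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

# Fix the algorithm, task, and model, each loaded from its default configuration
experiment = Experiment(
    task=VmasTask.BALANCE.get_from_yaml(),          # task configuration
    algorithm_config=MappoConfig.get_from_yaml(),   # algorithm hyperparameters
    model_config=MlpConfig.get_from_yaml(),         # policy model
    critic_model_config=MlpConfig.get_from_yaml(),  # critic model (for actor-critic algorithms)
    seed=0,
    config=ExperimentConfig.get_from_yaml(),        # experiment hyperparameters
)
experiment.run()
```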

Benchmark

In the library, we call a benchmark a collection of experiments that can vary in task, algorithm, or model. A benchmark shares the same experiment configuration across all of its experiments. Benchmarks allow you to compare different MARL components in a standardized way. A benchmark can be launched from the command line or from a script. See the [run](#run) section for more information.
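A benchmark can be assembled in the same way, sweeping over algorithms, tasks, and seeds while sharing a single experiment configuration. A sketch following the library's documented usage:

```python
from benchmarl.algorithms import MappoConfig, MasacConfig, QmixConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import VmasTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

# 3 algorithms x 2 tasks x 2 seeds = 12 experiments,
# all sharing the same experiment configuration
benchmark = Benchmark(
    algorithm_configs=[
        MappoConfig.get_from_yaml(),
        QmixConfig.get_from_yaml(),
        MasacConfig.get_from_yaml(),
    ],
    tasks=[
        VmasTask.BALANCE.get_from_yaml(),
        VmasTask.SAMPLING.get_from_yaml(),
    ],
    seeds={0, 1},
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
)
benchmark.run_sequential()  # run the experiments one after another
```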

Algorithms

Algorithms are an ensemble of components (e.g., loss, replay buffer) that determine the training strategy. Here is a table with the algorithms currently implemented in BenchMARL; a short configuration sketch follows the table.

Algorithms in BenchMARL

| Algorithm | On/Off policy | Actor-critic | Full-observability in critic | Action compatibility | Probabilistic actor |
|-----------|---------------|--------------|------------------------------|-----------------------|---------------------|
| Mappo     | On            | Yes          | Yes                          | Continuous + Discrete | Yes                 |
| Ippo      | On            | Yes          | No                           | Continuous + Discrete | Yes                 |
| Maddpg    | Off           | Yes          | Yes                          | Continuous            | No                  |
| Iddpg     | Off           | Yes          | No                           | Continuous            | No                  |
| Masac     | Off           | Yes          | Yes                          | Continuous + Discrete | Yes                 |
| Isac      | Off           | Yes          | No                           | Continuous + Discrete | Yes                 |
| Qmix      | Off           | No           | NA                           | Discrete              | No                  |
| Vdn       | Off           | No           | NA                           | Discrete              | No                  |
| Iql       | Off           | No           | NA                           | Discrete              | No                  |
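Each algorithm is configured through a dataclass: defaults are loaded from the library's yaml files, and individual hyperparameters can then be overridden. The field names below (`clip_epsilon`, `share_param_critic`) are illustrative assumptions; check the algorithm's config for the exact attributes:

```python
from benchmarl.algorithms import MappoConfig

# Start from the default configuration shipped with the library...
algorithm_config = MappoConfig.get_from_yaml()

# ...then override hyperparameters on the dataclass.
# NOTE: field names here are assumptions for illustration; consult MappoConfig.
algorithm_config.clip_epsilon = 0.2
algorithm_config.share_param_critic = True
```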

Environments

Tasks are scenarios from a specific environment that constitute the MARL challenge to solve. They differ in many aspects; here is a table with the current environments in BenchMARL (a sketch of loading tasks programmatically follows the table):

Environments in BenchMARL

| Environment    | Tasks | Cooperation               | Global state | Reward function               | Action space          | Vectorized |
|----------------|-------|---------------------------|--------------|-------------------------------|-----------------------|------------|
| VmasTask       | 18    | Cooperative + Competitive | No           | Shared + Independent + Global | Continuous + Discrete | Yes        |
| Smacv2Task     | 15    | Cooperative               | Yes          | Global                        | Discrete              | No         |
| PettingZooTask | 10    | Cooperative + Competitive | Yes + No     | Shared + Independent          | Continuous + Discrete | No         |
| MeltingPotTask | 49    | Cooperative + Competitive | Yes          | Independent                   | Discrete              | No         |
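Each environment in the table is exposed as a task enumeration, so its tasks can be listed and loaded programmatically. A minimal sketch, assuming the enum-style task classes named in the table:

```python
from benchmarl.environments import VmasTask

# Enumerate the tasks available in an environment
for task in VmasTask:
    print(task.name)

# Load one task with its default configuration
task = VmasTask.BALANCE.get_from_yaml()
```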

Models

Models are neural networks used to process data. They can be used as actors (policies) or, when requested, as critics. We provide a set of base models (layers) and a SequenceModel to concatenate different layers. All models can be used with or without parameter sharing within an agent group. Here is a table of the models implemented in BenchMARL; a sketch of composing layers follows the table.

Models in BenchMARL

| Name | Decentralized | Centralized with local inputs | Centralized with global input |
|------|---------------|-------------------------------|-------------------------------|
| Mlp  | Yes           | Yes                           | Yes                           |
| Gnn  | Yes           | No                            | No                            |
| Cnn  | Yes           | Yes                           | Yes                           |
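Base layers can be chained into a sequence model. The sketch below composes two MLPs with a 5-dimensional intermediate representation, following the library's documented `SequenceModelConfig` usage (the sizes are arbitrary):

```python
from torch import nn

from benchmarl.models import MlpConfig, SequenceModelConfig

# Two MLP layers chained together; intermediate_sizes gives the
# output dimension of every layer except the last
model_config = SequenceModelConfig(
    model_configs=[
        MlpConfig(num_cells=[8], activation_class=nn.Tanh, layer_class=nn.Linear),
        MlpConfig(num_cells=[4], activation_class=nn.Tanh, layer_class=nn.Linear),
    ],
    intermediate_sizes=[5],
)
```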