Components

The goal of BenchMARL is to bring different MARL environments and algorithms under the same interfaces to enable fair and reproducible comparison and benchmarking. BenchMARL is a full-pipeline unified training library that lets users run any comparison they want across our algorithms and tasks in just one line of code. To achieve this, BenchMARL interconnects components from TorchRL, which provides an efficient and reliable backend.

The library has a default configuration for each of its components. While parts of this configuration are meant to be changed (for example, experiment configurations), other parts (such as tasks) should not be changed, to allow for reproducibility. To aid in this, each version of BenchMARL is paired with a default configuration.

Let’s now introduce each component in the library.

Experiment

An Experiment is a training run in which an Algorithm, a Task, and a Model are fixed. Experiments are configured by passing these values alongside a seed and the experiment hyperparameters. The experiment hyperparameters cover both on-policy and off-policy algorithms, discrete and continuous actions, and probabilistic and deterministic policies (as they are agnostic of the algorithm or task used). An experiment can be launched from the command line or from a script. See the [run](#run) section for more information.
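For instance, an experiment can be created and run from a Python script. The following is a minimal sketch following the library's documented usage, where each component ships with a default configuration loadable via `get_from_yaml()`:

```python
from benchmarl.algorithms import MappoConfig
from benchmarl.environments import VmasTask
from benchmarl.experiment import Experiment, ExperimentConfig
from benchmarl.models.mlp import MlpConfig

# Fix the algorithm, task, and model, each loaded from its default configuration
experiment = Experiment(
    task=VmasTask.BALANCE.get_from_yaml(),          # task configuration
    algorithm_config=MappoConfig.get_from_yaml(),   # algorithm hyperparameters
    model_config=MlpConfig.get_from_yaml(),         # policy model
    critic_model_config=MlpConfig.get_from_yaml(),  # critic model (for actor-critic algorithms)
    seed=0,
    config=ExperimentConfig.get_from_yaml(),        # experiment hyperparameters
)
experiment.run()
```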

Benchmark

In the library, we call a benchmark a collection of experiments that can vary in task, algorithm, or model. A benchmark shares the same experiment configuration across all of its experiments. Benchmarks allow you to compare different MARL components in a standardized way. A benchmark can be launched from the command line or from a script. See the [run](#run) section for more information.
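A benchmark can be assembled in the same way, sweeping over algorithms, tasks, and seeds while sharing a single experiment configuration. A sketch following the library's documented usage:

```python
from benchmarl.algorithms import MappoConfig, MasacConfig, QmixConfig
from benchmarl.benchmark import Benchmark
from benchmarl.environments import VmasTask
from benchmarl.experiment import ExperimentConfig
from benchmarl.models.mlp import MlpConfig

# 3 algorithms x 2 tasks x 2 seeds = 12 experiments,
# all sharing the same experiment configuration
benchmark = Benchmark(
    algorithm_configs=[
        MappoConfig.get_from_yaml(),
        QmixConfig.get_from_yaml(),
        MasacConfig.get_from_yaml(),
    ],
    tasks=[
        VmasTask.BALANCE.get_from_yaml(),
        VmasTask.SAMPLING.get_from_yaml(),
    ],
    seeds={0, 1},
    experiment_config=ExperimentConfig.get_from_yaml(),
    model_config=MlpConfig.get_from_yaml(),
    critic_model_config=MlpConfig.get_from_yaml(),
)
benchmark.run_sequential()  # run the experiments one after another
```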

Algorithms

Algorithms are an ensemble of components (e.g., loss, replay buffer) that determine the training strategy. Here is a table with the algorithms currently implemented in BenchMARL; a short configuration sketch follows the table.

Algorithms in BenchMARL

| Algorithm | On/Off policy | Actor-critic | Full-observability in critic | Action compatibility | Probabilistic actor |
|-----------|---------------|--------------|------------------------------|-----------------------|---------------------|
| Mappo     | On            | Yes          | Yes                          | Continuous + Discrete | Yes                 |
| Ippo      | On            | Yes          | No                           | Continuous + Discrete | Yes                 |
| Maddpg    | Off           | Yes          | Yes                          | Continuous            | No                  |
| Iddpg     | Off           | Yes          | No                           | Continuous            | No                  |
| Masac     | Off           | Yes          | Yes                          | Continuous + Discrete | Yes                 |
| Isac      | Off           | Yes          | No                           | Continuous + Discrete | Yes                 |
| Qmix      | Off           | No           | NA                           | Discrete              | No                  |
| Vdn       | Off           | No           | NA                           | Discrete              | No                  |
| Iql       | Off           | No           | NA                           | Discrete              | No                  |
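Each algorithm is configured through a dataclass: defaults are loaded from the library's yaml files, and individual hyperparameters can then be overridden. The field names below (`clip_epsilon`, `share_param_critic`) are illustrative assumptions; check the algorithm's config for the exact attributes:

```python
from benchmarl.algorithms import MappoConfig

# Start from the default configuration shipped with the library...
algorithm_config = MappoConfig.get_from_yaml()

# ...then override hyperparameters on the dataclass.
# NOTE: field names here are assumptions for illustration; consult MappoConfig.
algorithm_config.clip_epsilon = 0.2
algorithm_config.share_param_critic = True
```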

Environments

Tasks are scenarios from a specific environment that constitute the MARL challenge to solve. They differ in many aspects; here is a table with the current environments in BenchMARL (a sketch of loading tasks programmatically follows the table):

Environments in BenchMARL

| Environment    | Tasks | Cooperation               | Global state | Reward function               | Action space          | Vectorized |
|----------------|-------|---------------------------|--------------|-------------------------------|-----------------------|------------|
| VmasTask       | 18    | Cooperative + Competitive | No           | Shared + Independent + Global | Continuous + Discrete | Yes        |
| Smacv2Task     | 15    | Cooperative               | Yes          | Global                        | Discrete              | No         |
| PettingZooTask | 10    | Cooperative + Competitive | Yes + No     | Shared + Independent          | Continuous + Discrete | No         |
| MeltingPotTask | 49    | Cooperative + Competitive | Yes          | Independent                   | Discrete              | No         |
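Each environment in the table is exposed as a task enumeration, so its tasks can be listed and loaded programmatically. A minimal sketch, assuming the enum-style task classes named in the table:

```python
from benchmarl.environments import VmasTask

# Enumerate the tasks available in an environment
for task in VmasTask:
    print(task.name)

# Load one task with its default configuration
task = VmasTask.BALANCE.get_from_yaml()
```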

Models

Models are neural networks used to process data. They can be used as actors (policies) or, when requested, as critics. We provide a set of base models (layers) and a SequenceModel to concatenate different layers. All models can be used with or without parameter sharing within an agent group. Here is a table of the models implemented in BenchMARL; a sketch of composing layers follows the table.

Models in BenchMARL

| Name | Decentralized | Centralized with local inputs | Centralized with global input |
|------|---------------|-------------------------------|-------------------------------|
| Mlp  | Yes           | Yes                           | Yes                           |
| Gnn  | Yes           | No                            | No                            |
| Cnn  | Yes           | Yes                           | Yes                           |
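Base layers can be chained into a sequence model. The sketch below composes two MLPs with a 5-dimensional intermediate representation, following the library's documented `SequenceModelConfig` usage (the sizes are arbitrary):

```python
from torch import nn

from benchmarl.models import MlpConfig, SequenceModelConfig

# Two MLP layers chained together; intermediate_sizes gives the
# output dimension of every layer except the last
model_config = SequenceModelConfig(
    model_configs=[
        MlpConfig(num_cells=[8], activation_class=nn.Tanh, layer_class=nn.Linear),
        MlpConfig(num_cells=[4], activation_class=nn.Tanh, layer_class=nn.Linear),
    ],
    intermediate_sizes=[5],
)
```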