# Multi-Agent First Order Constrained Optimization in Policy Space

This repository accompanies the paper **[Multi-Agent First Order Constrained Optimization in Policy Space]**, which investigates safe multi-agent reinforcement learning (MARL). For multi-agent systems, achieving high performance while avoiding unsafe actions is an urgent problem for real-life applications, yet few solutions exist. Taking inspiration from the sequential policy update scheme introduced in [HATRPO](https://arxiv.org/abs/2109.11251) and the multi-agent trust region learning theory of [MACPO](https://www.sciencedirect.com/science/article/abs/pii/S0004370223000516), the state-of-the-art algorithm in safe MARL, we devise a new approach to incorporating safety constraints into safe MARL. The resulting algorithm, Multi-Agent First Order Constrained Optimization in Policy Space (MAFOCOPS), addresses the following question: given the current policy, how can each agent achieve the best constraint-satisfying policy update? Experimental results show that MAFOCOPS outperforms MACPO and is more efficient.
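To make the idea of a first-order, constraint-aware policy update more concrete, here is a minimal sketch of the per-batch loss and cost-multiplier update used by single-agent FOCOPS, on which the method's name is based. It is an illustration under stated assumptions, not this repository's implementation: the function names, argument names, and default hyperparameters are hypothetical, and the multi-agent version in the paper may include additional terms (for example, factors arising from the sequential update over agents).

```python
# Illustrative sketch only: single-agent FOCOPS-style loss, NOT this repo's code.
# All names and default values below are hypothetical.

def focops_style_loss(ratios, kls, reward_advs, cost_advs,
                      lam=1.5, nu=0.1, delta=0.02):
    """Mean FOCOPS-style loss over a batch given as plain Python lists.

    ratios:      pi_theta(a|s) / pi_old(a|s) for each sample
    kls:         per-state KL(pi_theta || pi_old)
    reward_advs: reward advantage estimates
    cost_advs:   cost advantage estimates
    lam:         temperature on the advantage term
    nu:          cost multiplier (Lagrange-style penalty weight)
    delta:       trust-region threshold on the per-state KL
    """
    total = 0.0
    for ratio, kl, adv_r, adv_c in zip(ratios, kls, reward_advs, cost_advs):
        if kl > delta:
            continue  # samples outside the trust region contribute nothing
        total += kl - (1.0 / lam) * ratio * (adv_r - nu * adv_c)
    return total / len(ratios)


def update_cost_multiplier(nu, cost_return, cost_limit, lr=0.01, nu_max=2.0):
    """Projected step on nu: grows when the cost return exceeds the limit."""
    new_nu = nu - lr * (cost_limit - cost_return)
    return min(max(new_nu, 0.0), nu_max)  # project onto [0, nu_max]
```

In this sketch, minimizing the loss keeps the new policy close to the old one while increasing reward advantage and penalizing cost advantage, and the multiplier `nu` rises only while the measured cost return is above the allowed limit.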


***
## Supported Environments

- Safe Multi-Agent MuJoCo

- Safe Multi-Agent Isaac Gym

The experiments are conducted on the two benchmarks proposed by MACPO. The specific configurations can be found in the README.md of each environment.
  
  

## Results

All experimental results are presented in our paper. The supplementary materials also include videos of Safe Multi-Agent MuJoCo runs that give an intuitive sense of how the algorithms perform.

## Acknowledgments

Our implementation builds on the following open-source repositories, and we are grateful to their authors: [Safe MA-MuJoCo](https://github.com/chauncygu/Multi-Agent-Constrained-Policy-Optimisation), [Safe MAIG](https://github.com/chauncygu/Safe-Multi-Agent-Isaac-Gym/tree/main).



