---
title: "Aquarium MARL Environment"
tags: [MARL, simulation, emergence, complex-systems, environment, predator-prey, reinforcement-learning, multi-agent]
excerpt: "Aquarium: Open-source MARL environment for predator-prey studies."
teaser: /figures/20_aquarium.png
venue: "ICAART 2024"
---

<FloatingImage
  src="/figures/20_aquarium.png"
alt="The Multi-Agent Reinforcement Learning Cycle. Plot showing how Agent receive individual rewards from the environment."
|
|
width={450}
|
|
height={350}
|
|
float="right"
|
|
caption="The Multi-Agent Reinforcement Learning Cycle: Agents receive individual rewards from the environment."
|
|
/>
|
|
|
|
The study of complex interactions using Multi-Agent Reinforcement Learning (MARL), particularly **predator-prey dynamics**, often requires specialized simulation environments.
|
|
To streamline research and avoid redundant development efforts, we introduce **Aquarium**: a versatile, open-source MARL environment specifically designed for investigating predator-prey scenarios and related **emergent behaviors**.
|
|
|
|
Key Features of Aquarium:
|
|
|
|
* **Framework Integration:** Built upon and seamlessly integrates with the popular **PettingZoo API**, allowing researchers to readily apply existing MARL algorithm implementations (e.g., from Stable-Baselines3, RLlib).
* **Physics-Based Movement:** Simulates agent movement on a two-dimensional, continuous plane with edge-wrapping boundaries, incorporating basic physics for more realistic interactions.
* **High Customizability:** Offers extensive configuration options for:
  * **Agent-Environment Interactions:** Observation spaces, action spaces, and reward functions can be tailored to specific research questions.
  * **Environmental Parameters:** Key dynamics like agent speeds, prey reproduction rates, predator starvation mechanisms, sensor ranges, and more are fully adjustable.
* **Visualization & Recording:** Includes a resource-efficient visualizer and supports video recording of simulation runs, facilitating qualitative analysis and understanding of agent behaviors.
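
To give a feel for the PettingZoo-style workflow, here is a minimal interaction sketch that steps the environment with a random policy. The import path, constructor name, and configuration keys are illustrative assumptions rather than Aquarium's exact API; only the surrounding `reset`/`agent_iter`/`last`/`step` loop is the standard PettingZoo pattern.

```python
# Hedged sketch: module path, constructor, and config keys are hypothetical;
# the reset / agent_iter / last / step loop is the standard PettingZoo AEC API.
from aquarium import aquarium_env  # hypothetical import path

env = aquarium_env.env(
    num_prey=10,          # hypothetical config keys standing in for the
    num_predators=1,      # adjustable parameters described above (speeds,
    prey_speed=1.0,       # reproduction, starvation, sensor ranges, ...)
    predator_speed=1.2,
    render_mode=None,
)

env.reset(seed=42)
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None  # PettingZoo expects None for agents that are done
    else:
        action = env.action_space(agent).sample()  # random placeholder policy
    env.step(action)
env.close()
```

Because the environment speaks the PettingZoo protocol, this loop (or its parallel-API counterpart) can be replaced by any compatible training harness.
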
<CenteredImage
  src="/figures/20_observation_vector.png"
  alt="Diagram detailing the construction of the observation vector for an agent"
  width={450}
  height={350}
  maxWidth="75%"
  caption="Construction details of the agent observation vector, illustrating the information available to each agent."
/>

To demonstrate its capabilities, we conducted preliminary studies using **Proximal Policy Optimization (PPO)** to train multiple prey agents to evade a predator within Aquarium.

<CenteredImage
  src="/figures/20_capture_statistics.png"
  alt="Graphs showing average captures or rewards per prey agent under different training regimes"
  width={450}
  height={350}
  maxWidth="75%"
  caption="Performance metrics (e.g., average captures/rewards) comparing different training strategies in Aquarium."
/>

Consistent with findings in the existing MARL literature, our results showed that training agents with **individual policies led to suboptimal performance**, whereas **parameter sharing** among prey agents significantly improved coordination, sample efficiency, and overall evasion success. <Cite bibtexKey="kolle2024aquarium" />
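
As an intuition for that result, the difference between the two setups boils down to how agent IDs map to policy objects: with individual learning each prey agent owns its own policy, while with parameter sharing every prey agent acts through (and updates) one shared policy. The `PreyPolicy` class and the action-selection helper below are simplified placeholders, not the training code used in the paper.

```python
import random

class PreyPolicy:
    """Placeholder policy; in the experiments this role is played by a PPO actor-critic."""

    def act(self, observation):
        # Illustrative 5-action discrete space (e.g., turn left/right, speed up/down, no-op).
        return random.randrange(5)

    def update(self, trajectory):
        pass  # a PPO-style gradient update would go here

prey_ids = [f"prey_{i}" for i in range(10)]

# Individual learning: one policy (and one set of weights) per prey agent.
individual_policies = {agent_id: PreyPolicy() for agent_id in prey_ids}

# Parameter sharing: every prey agent maps to the *same* policy object, so the
# experience of all prey flows into a single set of weights at each update.
shared_policy = PreyPolicy()
shared_policies = {agent_id: shared_policy for agent_id in prey_ids}

def select_actions(policies, observations):
    """Pick an action for each prey agent from whichever policy it is assigned."""
    return {agent_id: policies[agent_id].act(obs) for agent_id, obs in observations.items()}
```

Under parameter sharing, a single environment step yields experience from every prey agent for the same set of weights, which is one way to see where the improved sample efficiency comes from.
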