Mirror of https://github.com/illiumst/marl-factory-grid.git, synced 2025-07-08 02:21:36 +02:00
Updated pomdp_r comment + Added some additional comments + Restructured experiment calling + Added Readme and requirements.txt
README.md
@@ -1,133 +1,32 @@

# About EDYS

## Tackling emergent dysfunctions (EDYs) in cooperation with Fraunhofer-IKS

Collaborating with Fraunhofer-IKS, this project is dedicated to investigating Emergent Dysfunctions (EDYs) within
multi-agent environments. In multi-agent reinforcement learning (MARL), a population of agents learns by interacting
with each other in a shared environment, adapting their behavior based on the feedback they receive from the
environment and the actions of other agents.

In this context, emergent behavior describes spontaneous behaviors resulting from interactions among agents and
environmental stimuli, rather than explicit programming. This promotes natural, adaptable behavior, increases system
unpredictability for dynamic learning, enables diverse strategies, and encourages collective intelligence for complex
problem-solving. However, the complex dynamics of the environment also give rise to emergent dysfunctions: unexpected
issues arising from agent interactions. This research aims to enhance our understanding of EDYs and their impact on
multi-agent systems.
### Project Objectives

- Create an environment that provokes emergent dysfunctions.
  - This is achieved by creating a high level of background noise in the domain, where various entities perform
    diverse tasks, resulting in a deliberately chaotic dynamic.
  - The goal is to observe and analyze naturally occurring emergent dysfunctions within the complexity generated in
    this dynamic environment.
- Observational framework:
  - The project introduces an environment that is designed to capture dysfunctions as they naturally occur.
  - The environment allows for continuous monitoring of agent behaviors, actions, and interactions.
  - Tracking emergent dysfunctions in real time provides valuable data for analysis and understanding.
- Compatibility:
  - The framework allows learning entities from different manufacturers and projects with varying representations
    of actions and observations to interact seamlessly within the environment.
- Placeholders:
  - One can provide an agent with a placeholder observation that contains no information and offers no meaningful
    insights.
  - Later, when the environment expands and introduces additional entities available for observation, these new
    observations can be provided to the agent.
  - This allows for processes such as retraining on an already initialized policy and fine-tuning to enhance the
    agent's performance based on the enriched information (a short illustrative sketch follows this list).
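
In practice, such a placeholder can be as simple as a zero-filled vector of the agent's observation size. The snippet
below is a hypothetical illustration of the idea, not an API of this framework; `obs_dim` is an assumed name:

```python
import numpy as np

obs_dim = 4                           # assumed size of the agent's observation vector
placeholder_obs = np.zeros(obs_dim)   # carries no information until real entities become observable

# Once new entities can be observed, the placeholder slots are filled with their actual
# coordinates/features and the already initialized policy can be fine-tuned on them.
```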

# Emergence in Multi-Agent Systems: A Safety Perspective

## Setup

Install this environment using `pip install marl-factory-grid`. For more information, refer
to ['installation'](docs/source/installation.rst).
Refer to [quickstart](_quickstart) for specific scenarios.

1. Set up a virtualenv with Python 3.9 or higher. You can use venv or conda for this.
2. Run ```pip install -r requirements.txt``` to install the requirements.
3. In case there is no ```study_out/``` folder in the root directory, create one.

## Usage

## Rerunning the Experiments

The majority of environment objects, including entities, rules, and assets, can be loaded automatically.
Simply specify the requirements of your environment in a
[*yaml*-config file](marl_factory_grid/environment/configs/default_config.yaml).
The respective experiments from our paper can be reenacted in ```main.py```.
Just select the function representing the part of our experiments you want to rerun and
execute it via the ```__main__``` function.

If you only plan on using the environment without making any modifications, use ``quickstart_use``.
This creates a default config file and another one that lists all possible options of the environment.
It also generates an initial script in which an agent is executed in the specified environment.
For further details on utilizing the environment, refer to ['usage'](docs/source/usage.rst).
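
For orientation, the core of such a script boils down to the loop below. It mirrors the TSP runner added in this
commit; the config path and the choice of `emergent_phenomenon=False` are illustrative, and the snippet assumes the
repository root as the working directory:

```python
from pathlib import Path

from marl_factory_grid import Factory
from marl_factory_grid.algorithms.tsp.contortions import get_dirt_quadrant_tsp_agents

# Load the environment from a yaml config file.
factory = Factory(Path('./marl_factory_grid/environment/configs/tsp/dirt_quadrant.yaml'))
_ = factory.reset()
factory.render()

# Static TSP agents shipped with this commit; RL agents are driven via main.py instead.
agents = get_dirt_quadrant_tsp_agents(emergent_phenomenon=False, factory=factory)

done = False
while not done:
    actions = [agent.predict() for agent in agents]
    _, _, _, done, _ = factory.step(actions)
    factory.render()
```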
## Further Remarks

1. We use config files located in the ```marl_factory_grid/environment/configs``` and the
   ```marl_factory_grid/algorithms/rl``` folders to configure the environments and the RL
   algorithm for our experiments, respectively. You don't need to change anything to rerun the
   experiments, but we provided some additional comments in the configs for an overall better
   understanding of the functionalities.
2. Instead of collecting coins in the coin-quadrant environment, our original implementation
   works with the premise of cleaning piles of dirt; it is therefore named ```dirt_quadrant``` in the code.
   Note that this difference is only visual and does not change the underlying semantics of the environment.
3. The code for the cost contortion that prevents the emergent behavior of the TSP agents can
   be found in ```marl_factory_grid/algorithms/tsp/contortions.py```.
4. The functionalities that drive the emergence prevention mechanisms for the RL agents are mainly
   located in the utility functions ```get_ordered_dirt_piles (line 91)``` (for solving the emergence in the
   coin-quadrant environment) and ```distribute_indices (line 165)``` (mechanism for two_doors), which are part of
   ```marl_factory_grid/algorithms/rl/utils.py```.

Existing modules include a variety of functionalities within the environment:

- [Agents](marl_factory_grid/algorithms) implement either static strategies or learning algorithms based on the
  specific configuration.
- Their action set includes opening [door entities](marl_factory_grid/modules/doors/entitites.py), cleaning
  [dirt](marl_factory_grid/modules/clean_up/entitites.py), picking up [items](marl_factory_grid/modules/items/entitites.py)
  and delivering them to designated drop-off locations.
- Agents are equipped with a [battery](marl_factory_grid/modules/batteries/entitites.py) that gradually depletes over
  time if not charged at a chargepod.
- The [maintainer](marl_factory_grid/modules/maintenance/entities.py) aims to
  repair [machines](marl_factory_grid/modules/machines/entitites.py) that lose health over time.

## Customization

If you plan on modifying the environment, for example by adding entities or rules, use ``quickstart_modify``.
This creates a template module and a script that runs an agent, incorporating the generated module.
More information on how to modify the levels, entities, groups, rules and assets can be found
in [modifications](docs/source/modifications.rst).

### Levels

Varying levels are created by defining Walls, Floor or Doors in *.txt*-files (see [levels](marl_factory_grid/levels) for
examples).
Define which *level* to use in your *configfile* as:

```yaml
General:
  level_name: rooms  # 'double', 'large', 'simple', ...
```

... or create your own, maybe with the help of [asciiflow.com](https://asciiflow.com/#/).
Make sure to use `#` as [Walls](marl_factory_grid/environment/entity/wall.py), `-` as free (walkable) floor, and `D`
for [Doors](./modules/doors/entities.py).
Other entities (define your own) may bring their own `Symbols`.
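
For illustration, a tiny hypothetical level file using only these three symbols could look like this (two rooms
connected by a single door):

```
##########
#----#---#
#----D---#
#----#---#
##########
```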

### Entities

Entities are [Objects](marl_factory_grid/environment/entity/object.py) that can additionally be assigned a position.
Abstract entities are provided.

### Groups

[Groups](marl_factory_grid/environment/groups/objects.py) are entity sets that provide administrative access to all
group members.
All [Entities](marl_factory_grid/environment/entity/global_entities.py) are available at runtime as an EnvState property.
### Rules

[Rules](marl_factory_grid/environment/entity/object.py) define how the environment behaves on the micro-scale.
Each of the hooks (`on_init`, `pre_step`, `on_step`, `post_step`, `on_done`)
provides env-access to implement custom logic, calculate rewards, or gather information.

![Hooks](./images/Hooks_FIKS.png)

[Results](marl_factory_grid/environment/entity/object.py) provide a way to return `rule` evaluations such as rewards and
state reports back to the environment.
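
As a rough sketch of how such a rule might be written, the snippet below fills in the hooks listed above. The import
paths, the `Rule` base class, the `Result` constructor arguments, and the state keys are assumptions for illustration
only, not the exact API of this framework:

```python
# Hypothetical custom rule; import paths, signatures and state keys are illustrative assumptions.
from marl_factory_grid.environment.rules import Rule
from marl_factory_grid.utils.results import Result


class PenalizeOpenDoors(Rule):
    """Give every agent a small negative reward while any door stands open."""

    def on_init(self, state, lvl_map):
        # Runs once after the level has been built; store the penalty here.
        self.penalty = -0.1
        return []

    def on_step(self, state):
        # Runs every step with access to the environment state (env-access).
        open_doors = [door for door in state['Doors'] if not door.is_closed]
        if not open_doors:
            return []
        # Return one Result per agent so the environment can book the reward.
        return [Result(identifier='door_open_penalty', entity=agent, reward=self.penalty)
                for agent in state['Agent']]
```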

### Assets

Make sure to bring your own assets for each entity living in the gridworld, as the `Renderer` relies on them.
PNG files (transparent background) with a square aspect ratio should do the job, in general.

<img src="/marl_factory_grid/environment/assets/wall.png" width="5%">
<img src="/marl_factory_grid/environment/assets/agent/agent.png" width="5%">
main.py (new file)
@@ -0,0 +1,84 @@
from marl_factory_grid.algorithms.rl.RL_runner import rerun_dirt_quadrant_agent1_training, \
    rerun_two_rooms_agent1_training, rerun_two_rooms_agent2_training, dirt_quadrant_multi_agent_rl_eval, \
    two_rooms_multi_agent_rl_eval
from marl_factory_grid.algorithms.tsp.TSP_runner import dirt_quadrant_multi_agent_tsp_eval, \
    two_rooms_multi_agent_tsp_eval


###### Coin-quadrant environment ######
def coin_quadrant_single_agent_training():
    """ Rerun training of RL-agent in coins_quadrant (dirt_quadrant) environment.
        The trained model and additional training metrics are saved in the study_out folder. """
    rerun_dirt_quadrant_agent1_training()


def coin_quadrant_RL_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of RL-agents in coins_quadrant (dirt_quadrant)
        environment, with occurring emergent phenomenon. Evaluation takes trained models
        from study_out/run0 for both agents. """
    dirt_quadrant_multi_agent_rl_eval(emergent_phenomenon=True)


def coin_quadrant_RL_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of RL-agents in coins_quadrant (dirt_quadrant)
        environment, with emergence prevention mechanism. Evaluation takes trained models
        from study_out/run0 for both agents. """
    dirt_quadrant_multi_agent_rl_eval(emergent_phenomenon=False)


def coin_quadrant_TSP_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of TSP-agents in coins_quadrant (dirt_quadrant)
        environment, with occurring emergent phenomenon. """
    dirt_quadrant_multi_agent_tsp_eval(emergent_phenomenon=True)


def coin_quadrant_TSP_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of TSP-agents in coins_quadrant (dirt_quadrant)
        environment, with emergence prevention mechanism. """
    dirt_quadrant_multi_agent_tsp_eval(emergent_phenomenon=False)


###### Two-rooms environment ######

def two_rooms_agent1_training():
    """ Rerun training of left RL-agent in two_rooms environment.
        The trained model and additional training metrics are saved in the study_out folder. """
    rerun_two_rooms_agent1_training()


def two_rooms_agent2_training():
    """ Rerun training of right RL-agent in two_rooms environment.
        The trained model and additional training metrics are saved in the study_out folder. """
    rerun_two_rooms_agent2_training()


def two_rooms_RL_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of RL-agents in two_rooms environment, with
        occurring emergent phenomenon. Evaluation takes trained models
        from study_out/run1 for agent1 and study_out/run2 for agent2. """
    two_rooms_multi_agent_rl_eval(emergent_phenomenon=True)


def two_rooms_RL_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of RL-agents in two_rooms environment, with
        emergence prevention mechanism. Evaluation takes trained models
        from study_out/run1 for agent1 and study_out/run2 for agent2. """
    two_rooms_multi_agent_rl_eval(emergent_phenomenon=False)


def two_rooms_TSP_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of TSP-agents in two_rooms environment, with
        occurring emergent phenomenon. """
    two_rooms_multi_agent_tsp_eval(emergent_phenomenon=True)


def two_rooms_TSP_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of TSP-agents in two_rooms environment, with
        emergence prevention mechanism. """
    two_rooms_multi_agent_tsp_eval(emergent_phenomenon=False)


if __name__ == '__main__':
    # Select any of the above functions to rerun the respective part
    # from our evaluation section of the paper
    coin_quadrant_RL_multi_agent_eval_prevented()
marl_factory_grid/algorithms/rl/RL_runner.py
@@ -3,9 +3,10 @@ from marl_factory_grid.algorithms.rl.a2c_dirt import A2C
from marl_factory_grid.algorithms.utils import load_yaml_file


def dirt_quadrant_agent1_training():
    train_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_train_config.yaml')
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_eval_config.yaml')
####### Training routines ######
def rerun_dirt_quadrant_agent1_training():
    train_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_train_config.yaml')
    eval_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_eval_config.yaml')
    train_cfg = load_yaml_file(train_cfg_path)
    eval_cfg = load_yaml_file(eval_cfg_path)

@@ -17,8 +18,8 @@ def dirt_quadrant_agent1_training():


def two_rooms_training(max_steps, agent_name):
    train_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_train_config.yaml')
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_eval_config.yaml')
    train_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_train_config.yaml')
    eval_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_eval_config.yaml')
    train_cfg = load_yaml_file(train_cfg_path)
    eval_cfg = load_yaml_file(eval_cfg_path)

@@ -32,14 +33,15 @@ def two_rooms_training(max_steps, agent_name):
    agent.eval_loop(n_episodes=1)


def two_rooms_agent1_training():
def rerun_two_rooms_agent1_training():
    two_rooms_training(max_steps=190000, agent_name="agent1")


def two_rooms_agent2_training():
def rerun_two_rooms_agent2_training():
    two_rooms_training(max_steps=260000, agent_name="agent2")


####### Eval routines ########
def single_agent_eval(config_name, run_folder_name):
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/{config_name}_eval_config.yaml')
    train_cfg = eval_cfg = load_yaml_file(eval_cfg_path)

@@ -52,7 +54,7 @@ def single_agent_eval(config_name, run_folder_name):


def multi_agent_eval(config_name, runs, emergent_phenomenon=False):
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/multi_agent_configs/{config_name}' +
    eval_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/multi_agent_configs/{config_name}' +
                         f'_eval_config{"_emergent" if emergent_phenomenon else ""}.yaml')
    eval_cfg = load_yaml_file(eval_cfg_path)

@@ -63,13 +65,9 @@ def multi_agent_eval(config_name, runs, emergent_phenomenon=False):
        agent.eval_loop(1)


def dirt_quadrant_multi_agent_ctde_eval(emergent_phenomenon):
def dirt_quadrant_multi_agent_rl_eval(emergent_phenomenon):
    multi_agent_eval("dirt_quadrant", ["run0", "run0"], emergent_phenomenon)


def two_rooms_multi_agent_eval(emergent_phenomenon):
    multi_agent_eval("two_rooms", ["run1", "run2"], emergent_phenomenon)


if __name__ == '__main__':
    dirt_quadrant_agent1_training()
def two_rooms_multi_agent_rl_eval(emergent_phenomenon):
    multi_agent_eval("two_rooms", ["run1", "run2"], emergent_phenomenon)
marl_factory_grid/algorithms/rl/a2c_dirt.py
@@ -2,13 +2,14 @@ import os
import torch
from typing import Union, List
import numpy as np
from tqdm import tqdm

from marl_factory_grid.algorithms.rl.base_a2c import PolicyGradient, cumulate_discount
from marl_factory_grid.algorithms.rl.constants import Names
from marl_factory_grid.algorithms.rl.utils import transform_observations, _as_torch, door_is_close, \
from marl_factory_grid.algorithms.rl.utils import transform_observations, _as_torch, is_door_close, \
    get_dirt_piles_positions, update_target_pile, update_ordered_dirt_piles, get_all_cleaned_dirt_piles, \
    distribute_indices, set_agent_spawnpoint, get_ordered_dirt_piles, handle_finished_episode, save_configs, \
    save_agent_models, get_all_observations
    distribute_indices, set_agents_spawnpoints, get_ordered_dirt_piles, handle_finished_episode, save_configs, \
    save_agent_models, get_all_observations, get_agents_positions
from marl_factory_grid.algorithms.utils import add_env_props
from marl_factory_grid.utils.plotting.plot_single_runs import plot_action_maps, plot_reward_development, \
    create_info_maps

@@ -28,93 +29,88 @@ class A2C:
        self.n_agents = train_cfg[nms.ENV][nms.N_AGENTS]
        self.setup()
        self.reward_development = []
        self.action_probabilities = {agent_idx:[] for agent_idx in range(self.n_agents)}
        self.action_probabilities = {agent_idx: [] for agent_idx in range(self.n_agents)}

    def setup(self):
        dirt_piles_positions = [self.factory.state.entities[nms.DIRT_PILES][pile_idx].pos for pile_idx in
                                range(len(self.factory.state.entities[nms.DIRT_PILES]))]
        self.obs_dim = 2 + 2*len(dirt_piles_positions) if self.cfg[nms.ALGORITHM][nms.PILE_OBSERVABILITY] == nms.ALL else 4
        """ Initialize agents and create entry for run results according to configuration """
        self.obs_dim = 2 + 2 * len(get_dirt_piles_positions(self.factory)) if self.cfg[nms.ALGORITHM][
            nms.PILE_OBSERVABILITY] == nms.ALL else 4
        self.act_dim = 4  # The 4 movement directions
        self.agents = [PolicyGradient(self.factory, agent_id=i, obs_dim=self.obs_dim, act_dim=self.act_dim) for i in range(self.n_agents)]
        self.agents = [PolicyGradient(self.factory, agent_id=i, obs_dim=self.obs_dim, act_dim=self.act_dim) for i in
                       range(self.n_agents)]

        if self.cfg[nms.ENV][nms.SAVE_AND_LOG]:
            # Create results folder
            runs = os.listdir("../study_out/")
            runs = os.listdir("./study_out/")
            run_numbers = [int(run[3:]) for run in runs if run[:3] == "run"]
            next_run_number = max(run_numbers)+1 if run_numbers else 0
            self.results_path = f"../study_out/run{next_run_number}"
            next_run_number = max(run_numbers) + 1 if run_numbers else 0
            self.results_path = f"./study_out/run{next_run_number}"
            os.mkdir(self.results_path)
            # Save settings in results folder
            save_configs(self.results_path, self.cfg, self.factory.conf, self.eval_factory.conf)

    def set_cfg(self, eval=False):
        """ Set the mode of the current configuration """
        if eval:
            self.cfg = self.eval_cfg
        else:
            self.cfg = self.train_cfg

    def load_agents(self, runs_list):
        """ Initialize networks with parameters of already trained agents """
        for idx, run in enumerate(runs_list):
            run_path = f"../study_out/{run}"
            run_path = f"./study_out/{run}"
            self.agents[idx].pi.load_model_parameters(f"{run_path}/PolicyNet_model_parameters.pth")
            self.agents[idx].vf.load_model_parameters(f"{run_path}/ValueNet_model_parameters.pth")

    @torch.no_grad()
    def train_loop(self):
        """ Function for training agents """
        env = self.factory
        n_steps, max_steps = [self.cfg[nms.ALGORITHM][k] for k in [nms.N_STEPS, nms.MAX_STEPS]]
        global_steps, episode = 0, 0
        indices = distribute_indices(env, self.cfg, self.n_agents)
        dirt_piles_positions = get_dirt_piles_positions(env)
        used_actions = {i:0 for i in range(len(env.state.entities[nms.AGENT][0]._actions))}  # Assume both agents have the same actions
        target_pile = [partition[0] for partition in indices]  # pointer that points to the target pile for each agent. (point to same pile, point to different piles)
        cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]  # Have own dictionary for each agent
        target_pile = [partition[0] for partition in
                       indices]  # list of pointers that point to the current target pile for each agent
        cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

        pbar = tqdm(total=max_steps)
        while global_steps < max_steps:
            print(global_steps)
            obs = env.reset()
            _ = env.reset()
            if self.cfg[nms.ENV][nms.TRAIN_RENDER]:
                env.render()
            set_agent_spawnpoint(env, self.n_agents)
            set_agents_spawnpoints(env, self.n_agents)
            ordered_dirt_piles = get_ordered_dirt_piles(env, cleaned_dirt_piles, self.cfg, self.n_agents)
            # Reset current target pile at episode begin if all piles have to be cleaned in one episode
            if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] == nms.ALL:
                target_pile = [partition[0] for partition in indices]
                cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

            # Supply each agent with its local observation
            obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)
            done, rew_log = [False] * self.n_agents, 0

            print("Agents spawnpoints:", [env.state.moving_entites[agent_idx].pos for agent_idx in range(self.n_agents)])
            print("Agents target piles:", target_pile)
            print("Agents initial observation:", obs)
            print("Agents cleaned dirt piles:", cleaned_dirt_piles)
            done, rew_log = [False] * self.n_agents, 0

            while not all(done):
                # 0="North", 1="East", 2="South", 3="West", 4="Clean", 5="Noop"
                action = self.use_door_or_move(env, obs, cleaned_dirt_piles) \
                    if nms.DOORS in env.state.entities.keys() else self.get_actions(obs)
                used_actions[int(action[0])] += 1
                _, next_obs, reward, done, info = env.step(action)
                if done:
                    print("DoneAtMaxStepsReached:", len(self.agents[0]._episode))
                next_obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)

                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices, reward, done)
                # Handle case where agent is on field with dirt
                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices,
                                                reward, done)

                if n_steps != 0 and (global_steps + 1) % n_steps == 0:
                    print("max_steps reached")
                    done = True
                if n_steps != 0 and (global_steps + 1) % n_steps == 0: done = True

                done = [done] * self.n_agents if isinstance(done, bool) else done
                for ag_i, agent in enumerate(self.agents):
                    # For forced actions like door opening, we have to call the step function with this action, but
                    # since we are not allowed to exceed the dimensions range, we can't log the corresponding step info.
                    if action[ag_i] in range(self.act_dim):
                        # Add agent results into respective rollout buffers
                        agent._episode[-1] = (next_obs[ag_i], action[ag_i], reward[ag_i], agent._episode[-1][-1])

                if self.cfg[nms.ENV][nms.TRAIN_RENDER]:
                    env.render()
                # Visualize state update
                if self.cfg[nms.ENV][nms.TRAIN_RENDER]: env.render()

                obs = next_obs

@@ -123,97 +119,93 @@ class A2C:
                global_steps += 1
                rew_log += sum(reward)

                if global_steps >= max_steps:
                    break
                if global_steps >= max_steps: break

            print(f'reward at episode: {episode} = {rew_log}')
            self.reward_development.append(rew_log)
            episode += 1
            pbar.update(global_steps - pbar.n)

        plot_reward_development(self.reward_development, self.cfg, self.results_path)
        pbar.close()
        if self.cfg[nms.ENV][nms.SAVE_AND_LOG]:
            create_info_maps(env, used_actions, get_all_observations(env, self.cfg, self.n_agents),
            plot_reward_development(self.reward_development, self.results_path)
            create_info_maps(env, get_all_observations(env, self.cfg, self.n_agents),
                             get_dirt_piles_positions(env), self.results_path, self.agents, self.act_dim, self)
            save_agent_models(self.results_path, self.agents)
            plot_action_maps(env, [self], self.results_path)

    @torch.inference_mode(True)
    def eval_loop(self, n_episodes, render=False):
    def eval_loop(self, n_episodes):
        """ Function for performing inference """
        env = self.eval_factory
        self.set_cfg(eval=True)
        episode, results = 0, []
        dirt_piles_positions = get_dirt_piles_positions(env)
        indices = distribute_indices(env, self.cfg, self.n_agents)
        target_pile = [partition[0] for partition in indices]  # pointer that points to the target pile for each agent. (point to same pile/ point to different piles)
        target_pile = [partition[0] for partition in
                       indices]  # list of pointers that point to the current target pile for each agent
        if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] == nms.DISTRIBUTED:
            cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in range(self.n_agents)]
        else:
            cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]
            cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in
                                  range(self.n_agents)]
        else: cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

        while episode < n_episodes:
            obs = env.reset()
            set_agent_spawnpoint(env, self.n_agents)
            _ = env.reset()
            set_agents_spawnpoints(env, self.n_agents)
            if self.cfg[nms.ENV][nms.EVAL_RENDER]:
                # Don't render auxiliary piles
                if self.cfg[nms.ALGORITHM][nms.AUXILIARY_PILES]:
                    # Don't render auxiliary piles
                    auxiliary_piles = [pile for idx, pile in enumerate(env.state.entities[nms.DIRT_PILES]) if idx % 2 == 0]
                    auxiliary_piles = [pile for idx, pile in enumerate(env.state.entities[nms.DIRT_PILES]) if
                                       idx % 2 == 0]
                    for pile in auxiliary_piles:
                        pile.set_new_amount(0)
                env.render()
                env._renderer.fps = 5 # Slow down agent movement
                env._renderer.fps = 5  # Slow down agent movement

            # Reset current target pile at episode begin if all piles have to be cleaned in one episode
            if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] in [nms.ALL, nms.DISTRIBUTED, nms.SHARED]:
                target_pile = [partition[0] for partition in indices]
                if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] == nms.DISTRIBUTED:
                    cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in range(self.n_agents)]
                else:
                    cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]
                    cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in
                                          range(self.n_agents)]
                else: cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

            ordered_dirt_piles = get_ordered_dirt_piles(env, cleaned_dirt_piles, self.cfg, self.n_agents)

            # Supply each agent with its local observation
            obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)
            done, rew_log, eps_rew = [False] * self.n_agents, 0, torch.zeros(self.n_agents)

            while not all(done):
                action = self.use_door_or_move(env, obs, cleaned_dirt_piles, det=True) \
                    if nms.DOORS in env.state.entities.keys() else self.execute_policy(obs, env, cleaned_dirt_piles)  # zero exploration
                _, next_obs, reward, done, info = env.step(action)  # Note that this call seems to flip the lists in indices
                if done:
                    print("DoneAtMaxStepsReached:", len(self.agents[0]._episode))
                    if nms.DOORS in env.state.entities.keys() else self.execute_policy(obs, env,
                                                                                       cleaned_dirt_piles)  # zero exploration
                _, next_obs, reward, done, info = env.step(action)

                # Add small negative reward if agent has moved away from the target_pile
                # reward = self.reward_distance(env, obs, target_pile, reward)
                # Handle case where agent is on field with dirt
                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices,
                                                reward, done)

                # Check and handle if agent is on field with dirt
                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices, reward, done)

                # Get transformed next_obs that might have been updated because of self.handle_dirt.
                # For eval, where pile_all_done is "all", it's mandatory that the potential change of the target pile
                # in the observation, caused by self.handle_dirt, is already considered when the next action is calculated.
                # Get transformed next_obs that might have been updated because of handle_dirt
                next_obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)

                done = [done] * self.n_agents if isinstance(done, bool) else done

                if self.cfg[nms.ENV][nms.EVAL_RENDER]:
                    env.render()
                if self.cfg[nms.ENV][nms.EVAL_RENDER]: env.render()

                obs = next_obs

            episode += 1


    ########## Helper functions ########

    def get_actions(self, observations) -> ListOrTensor:
        # Given an observation, get actions for both agents
        """ Given local observations, get actions for both agents """
        actions = [agent.step(_as_torch(observations[ag_i]).view(-1).to(torch.float32)) for ag_i, agent in
                   enumerate(self.agents)]
        return actions

    def execute_policy(self, observations, env, cleaned_dirt_piles) -> ListOrTensor:
        # Use deterministic policy for inference
        """ Execute agent policies deterministically for inference """
        actions = [agent.policy(_as_torch(observations[ag_i]).view(-1).to(torch.float32)) for ag_i, agent in
                   enumerate(self.agents)]
        for agent_idx in range(self.n_agents):

@@ -224,10 +216,11 @@ class A2C:
        return actions

    def use_door_or_move(self, env, obs, cleaned_dirt_piles, det=False):
        """ Function that handles automatic actions like door opening and forced Noop """
        action = []
        for agent_idx, agent in enumerate(self.agents):
            agent_obs = _as_torch((obs)[agent_idx]).view(-1).to(torch.float32)
            # If agent already reached its target
            # Use Noop operation if agent already reached its target. (Only relevant for two-rooms setting)
            if all(cleaned_dirt_piles[agent_idx].values()):
                action.append(next(action_i for action_i, a in enumerate(env.state[nms.AGENT][agent_idx].actions) if
                                   a.name == nms.NOOP))

@@ -235,37 +228,33 @@ class A2C:
                # Include agent experience entry manually
                agent._episode.append((None, None, None, agent.vf(agent_obs)))
            else:
                if door := door_is_close(env, agent_idx):
                if door := is_door_close(env, agent_idx):
                    if door.is_closed:
                        action.append(next(
                            action_i for action_i, a in enumerate(env.state[nms.AGENT][agent_idx].actions) if
                            a.name == nms.USE_DOOR))
                        # Don't include action in agent experience
                    else:
                        if det:
                            action.append(int(agent.pi(agent_obs, det=True)[0]))
                        else:
                            action.append(int(agent.step(agent_obs)))
                        if det: action.append(int(agent.pi(agent_obs, det=True)[0]))
                        else: action.append(int(agent.step(agent_obs)))
                else:
                    if det:
                        action.append(int(agent.pi(agent_obs, det=True)[0]))
                    else:
                        action.append(int(agent.step(agent_obs)))
                    if det: action.append(int(agent.pi(agent_obs, det=True)[0]))
                    else: action.append(int(agent.step(agent_obs)))
        return action

    def handle_dirt(self, env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices, reward, done):
        # Check if agent moved on field with dirt. If that is the case collect dirt automatically
        agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(self.n_agents)]
        """ Check if agent moved on field with dirt. If that is the case collect dirt automatically """
        agents_positions = get_agents_positions(env, self.n_agents)
        dirt_piles_positions = get_dirt_piles_positions(env)
        if any([True for pos in agent_positions if pos in dirt_piles_positions]):
        if any([True for pos in agents_positions if pos in dirt_piles_positions]):
            # Only simulate collecting the dirt
            for idx, pos in enumerate(agent_positions):
            for idx, pos in enumerate(agents_positions):
                if pos in cleaned_dirt_piles[idx].keys() and not cleaned_dirt_piles[idx][pos]:

                    # If dirt piles should be cleaned in a specific order
                    if ordered_dirt_piles[idx]:
                        if pos == ordered_dirt_piles[idx][target_pile[idx]]:
                            reward[idx] += 50  # 1
                            reward[idx] += 50
                            cleaned_dirt_piles[idx][pos] = True
                            # Set pointer to next dirt pile
                            update_target_pile(env, idx, target_pile, indices, self.cfg)

@@ -278,7 +267,7 @@ class A2C:
                            for pos in dirt_piles_positions:
                                cleaned_dirt_piles[idx][pos] = False
                    else:
                        reward[idx] += 50  # 1
                        reward[idx] += 50
                        cleaned_dirt_piles[idx][pos] = True

                        # Indicate that renderer can hide dirt pile

@@ -294,4 +283,3 @@ class A2C:
            done = True

        return reward, done
marl_factory_grid/algorithms/rl/utils.py
@@ -10,6 +10,7 @@ from marl_factory_grid.algorithms.rl.constants import Names
nms = Names

def _as_torch(x):
    """ Helper function to convert different list types to a torch tensor """
    if isinstance(x, np.ndarray):
        return torch.from_numpy(x)
    elif isinstance(x, List):

@@ -20,15 +21,16 @@ def _as_torch(x):


def transform_observations(env, ordered_dirt_piles, target_pile, cfg, n_agents):
    """ Requires that agent has observations -DirtPiles and -Self """
    agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]
    """ Function that extracts local observations from global state
        Requires that agents have observations -DirtPiles and -Self (cf. environment configs) """
    agents_positions = get_agents_positions(env, n_agents)
    pile_observability_is_all = cfg[nms.ALGORITHM][nms.PILE_OBSERVABILITY] == nms.ALL
    if pile_observability_is_all:
        trans_obs = [torch.zeros(2+2*len(ordered_dirt_piles[0])) for _ in range(len(agent_positions))]
        trans_obs = [torch.zeros(2+2*len(ordered_dirt_piles[0])) for _ in range(len(agents_positions))]
    else:
        # Only show current target pile
        trans_obs = [torch.zeros(4) for _ in range(len(agent_positions))]
    for i, pos in enumerate(agent_positions):
        trans_obs = [torch.zeros(4) for _ in range(len(agents_positions))]
    for i, pos in enumerate(agents_positions):
        agent_x, agent_y = pos[0], pos[1]
        trans_obs[i][0] = agent_x
        trans_obs[i][1] = agent_y

@@ -45,6 +47,7 @@ def transform_observations(env, ordered_dirt_piles, target_pile, cfg, n_agents):


def get_all_observations(env, cfg, n_agents):
    """ Helper function that returns all possible agent observations """
    dirt_piles_positions = [env.state.entities[nms.DIRT_PILES][pile_idx].pos for pile_idx in
                            range(len(env.state.entities[nms.DIRT_PILES]))]
    if cfg[nms.ALGORITHM][nms.PILE_OBSERVABILITY] == nms.ALL:

@@ -76,41 +79,48 @@ def get_all_observations(env, cfg, n_agents):


def get_dirt_piles_positions(env):
    """ Get positions of dirt piles on the map """
    return [env.state.entities[nms.DIRT_PILES][pile_idx].pos for pile_idx in range(len(env.state.entities[nms.DIRT_PILES]))]


def get_agents_positions(env, n_agents):
    """ Get positions of agents on the map """
    return [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]


def get_ordered_dirt_piles(env, cleaned_dirt_piles, cfg, n_agents):
    """ Each agent can have its individual pile order """
    """ This function determines in which order the agents should clean the dirt piles
        Each agent can have its individual pile order """
    ordered_dirt_piles = [[] for _ in range(n_agents)]
    dirt_pile_positions = get_dirt_piles_positions(env)
    agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]
    dirt_piles_positions = get_dirt_piles_positions(env)
    agents_positions = get_agents_positions(env, n_agents)
    for agent_idx in range(n_agents):
        if cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.FIXED, nms.AGENTS]:
            ordered_dirt_piles[agent_idx] = dirt_pile_positions
            ordered_dirt_piles[agent_idx] = dirt_piles_positions
        elif cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.SMART, nms.DYNAMIC]:
            # Calculate distances for remaining unvisited dirt piles
            remaining_target_piles = [pos for pos, value in cleaned_dirt_piles[agent_idx].items() if not value]
            pile_distances = {pos:0 for pos in remaining_target_piles}
            agent_pos = agent_positions[agent_idx]
            agent_pos = agents_positions[agent_idx]
            for pos in remaining_target_piles:
                pile_distances[pos] = np.abs(agent_pos[0] - pos[0]) + np.abs(agent_pos[1] - pos[1])

            if cfg[nms.ALGORITHM][nms.PILE_ORDER] == nms.SMART:
                # Check if there is an agent in line with any of the remaining dirt piles
                # Check if there is an agent on the direct path to any of the remaining dirt piles
                for pile_pos in remaining_target_piles:
                    for other_pos in agent_positions:
                    for other_pos in agents_positions:
                        if other_pos != agent_pos:
                            if agent_pos[0] == other_pos[0] == pile_pos[0] or agent_pos[1] == other_pos[1] == pile_pos[1]:
                                # Get the line between the agent and the goal
                                # Get the line between the agent and the target
                                path = bresenham(agent_pos[0], agent_pos[1], pile_pos[0], pile_pos[1])

                                # Check if the entity lies on the path between the agent and the goal
                                # Check if the entity lies on the path between the agent and the target
                                if other_pos in path:
                                    pile_distances[pile_pos] += np.abs(agent_pos[0] - other_pos[0]) + np.abs(agent_pos[1] - other_pos[1])

            sorted_pile_distances = dict(sorted(pile_distances.items(), key=lambda item: item[1]))
            # Insert already visited dirt piles
            ordered_dirt_piles[agent_idx] = [pos for pos in dirt_pile_positions if pos not in remaining_target_piles]
            ordered_dirt_piles[agent_idx] = [pos for pos in dirt_piles_positions if pos not in remaining_target_piles]
            # Fill up with sorted positions
            for pos in sorted_pile_distances.keys():
                ordered_dirt_piles[agent_idx].append(pos)

@@ -145,6 +155,7 @@ def bresenham(x0, y0, x1, y1):


def update_ordered_dirt_piles(agent_idx, cleaned_dirt_piles, ordered_dirt_piles, env, cfg, n_agents):
    """ Update the order of the remaining dirt piles """
    # Only update ordered_dirt_pile for agent that reached its target pile
    updated_ordered_dirt_piles = get_ordered_dirt_piles(env, cleaned_dirt_piles, cfg, n_agents)
    for i in range(len(ordered_dirt_piles[agent_idx])):

@@ -152,8 +163,10 @@ def update_ordered_dirt_piles(agent_idx, cleaned_dirt_piles, ordered_dirt_piles,


def distribute_indices(env, cfg, n_agents):
    """ Distribute dirt piles evenly among the agents """
    indices = []
    n_dirt_piles = len(get_dirt_piles_positions(env))
    agents_positions = get_agents_positions(env, n_agents)
    if n_dirt_piles == 1 or cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.FIXED, nms.DYNAMIC, nms.SMART]:
        indices = [[0] for _ in range(n_agents)]
    else:

@@ -171,12 +184,11 @@ def distribute_indices(env, cfg, n_agents):
    # -> Starting with index 0 even piles are auxiliary piles, odd piles are primary piles
    if cfg[nms.ALGORITHM][nms.AUXILIARY_PILES] and nms.DOORS in env.state.entities.keys():
        door_positions = [door.pos for door in env.state.entities[nms.DOORS]]
        agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]
        distances = {door_pos:[] for door_pos in door_positions}

        # Calculate distance of every agent to every door
        for door_pos in door_positions:
            for agent_pos in agent_positions:
            for agent_pos in agents_positions:
                distances[door_pos].append(np.abs(door_pos[0] - agent_pos[0]) + np.abs(door_pos[1] - agent_pos[1]))

        def duplicate_indices(lst, item):

@@ -213,6 +225,7 @@ def distribute_indices(env, cfg, n_agents):


def update_target_pile(env, agent_idx, target_pile, indices, cfg):
    """ Get the next target pile for a given agent """
    if cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.FIXED, nms.DYNAMIC, nms.SMART]:
        if target_pile[agent_idx] + 1 < len(get_dirt_piles_positions(env)):
            target_pile[agent_idx] += 1

@@ -223,7 +236,8 @@ def update_target_pile(env, agent_idx, target_pile, indices, cfg):
            target_pile[agent_idx] += 1


def door_is_close(env, agent_idx):
def is_door_close(env, agent_idx):
    """ Checks whether the agent is close to a door """
    neighbourhood = [y for x in env.state.entities.neighboring_positions(env.state[nms.AGENT][agent_idx].pos)
                     for y in env.state.entities.pos_dict[x] if nms.DOOR in y.name]
    if neighbourhood:

@@ -231,6 +245,7 @@ def door_is_close(env, agent_idx):


def get_all_cleaned_dirt_piles(dirt_piles_positions, cleaned_dirt_piles, n_agents):
    """ Returns all dirt piles cleaned by any agent """
    meta_cleaned_dirt_piles = {pos: False for pos in dirt_piles_positions}
    for agent_idx in range(n_agents):
        for (pos, cleaned) in cleaned_dirt_piles[agent_idx].items():

@@ -240,6 +255,7 @@ def get_all_cleaned_dirt_piles(dirt_piles_positions, cleaned_dirt_piles, n_agent


def handle_finished_episode(obs, agents, cfg):
    """ Finish up episode, calculate advantages and perform policy net and value net updates """
    with torch.inference_mode(False):
        for ag_i, agent in enumerate(agents):
            # Get states, actions, rewards and values from rollout buffer

@@ -268,6 +284,7 @@ def handle_finished_episode(obs, agents, cfg):


def split_into_chunks(data_tuple, cfg):
    """ Chunks episode data into approximately equal sized chunks to prevent system memory failure from overload """
    result = [data_tuple]
    chunk_size = cfg[nms.ALGORITHM][nms.CHUNK_EPISODE]
    if chunk_size > 0:

@@ -286,7 +303,8 @@ def split_into_chunks(data_tuple, cfg):
    return result


def set_agent_spawnpoint(env, n_agents):
def set_agents_spawnpoints(env, n_agents):
    """ Tell environment where the agents should spawn in the next episode """
    for agent_idx in range(n_agents):
        agent_name = list(env.state.agents_conf.keys())[agent_idx]
        current_pos_pointer = env.state.agents_conf[agent_name][nms.POS_POINTER]

@@ -299,6 +317,7 @@ def set_agent_spawnpoint(env, n_agents):


def save_configs(results_path, cfg, factory_conf, eval_factory_conf):
    """ Save configurations for logging purposes """
    with open(f"{results_path}/MARL_config.txt", "w") as txt_file:
        txt_file.write(str(cfg))
    with open(f"{results_path}/train_env_config.txt", "w") as txt_file:

@@ -308,6 +327,7 @@ def save_configs(results_path, cfg, factory_conf, eval_factory_conf):


def save_agent_models(results_path, agents):
    """ Save model parameters after training """
    for idx, agent in enumerate(agents):
        agent.pi.save_model_parameters(results_path)
        agent.vf.save_model_parameters(results_path)
marl_factory_grid/algorithms/tsp/TSP_runner.py (new file)
@@ -0,0 +1,61 @@
import os
from pathlib import Path

from tqdm import trange

from marl_factory_grid import Factory
from marl_factory_grid.algorithms.tsp.contortions import get_dirt_quadrant_tsp_agents, get_two_rooms_tsp_agents


def dirt_quadrant_multi_agent_tsp_eval(emergent_phenomenon):
    run_tsp_setting("dirt_quadrant", emergent_phenomenon)


def two_rooms_multi_agent_tsp_eval(emergent_phenomenon):
    run_tsp_setting("two_rooms", emergent_phenomenon)


def run_tsp_setting(config_name, emergent_phenomenon, n_episodes=1):
    # Render at each step?
    render = True

    # Path to config File
    path = Path(f'./marl_factory_grid/environment/configs/tsp/{config_name}.yaml')

    # Create results folder
    runs = os.listdir("./study_out/")
    run_numbers = [int(run[7:]) for run in runs if run[:7] == "tsp_run"]
    next_run_number = max(run_numbers) + 1 if run_numbers else 0
    results_path = f"./study_out/tsp_run{next_run_number}"
    os.mkdir(results_path)

    # Env Init
    factory = Factory(path)

    with open(f"{results_path}/env_config.txt", "w") as txt_file:
        txt_file.write(str(factory.conf))

    for episode in trange(n_episodes):
        _ = factory.reset()
        done = False
        if render:
            factory.render()
            factory._renderer.fps = 5
        if config_name == "dirt_quadrant":
            agents = get_dirt_quadrant_tsp_agents(emergent_phenomenon, factory)
        elif config_name == "two_rooms":
            agents = get_two_rooms_tsp_agents(emergent_phenomenon, factory)
        else:
            print("Config name does not exist. Abort...")
            break
        while not done:
            a = [x.predict() for x in agents]
            # Have this condition, to terminate as soon as all dirt piles are collected. This ensures that the implementation
            # of the TSP agent is equivalent to that of the RL agent
            if 'DirtPiles' in list(factory.state.entities.keys()) and factory.state.entities['DirtPiles'].global_amount == 0.0:
                break
            obs_type, _, _, done, info = factory.step(a)
            if render:
                factory.render()
            if done:
                break
marl_factory_grid/algorithms/tsp/contortions.py
@@ -1,12 +1,6 @@
import os
from pathlib import Path

import numpy as np
from tqdm import trange

from marl_factory_grid.algorithms.tsp.TSP_dirt_agent import TSPDirtAgent
from marl_factory_grid.algorithms.tsp.TSP_target_agent import TSPTargetAgent
from marl_factory_grid.environment.factory import Factory


def get_dirt_quadrant_tsp_agents(emergent_phenomenon, factory):

@@ -58,62 +52,4 @@ def get_two_rooms_tsp_agents(emergent_phenomenon, factory):
    for agent in agents:
        for u, v, weight in agent._position_graph.edges(data='weight'):
            agent._position_graph[u][v]['weight'] = edge_costs[f"{u}-{v}"]
    return agents


def run_tsp_setting(config_name, emergent_phenomenon):
    # Render at each step?
    render = True

    # Path to config File
    path = Path(f'../marl_factory_grid/environment/configs/tsp/{config_name}.yaml')

    # Create results folder
    runs = os.listdir("../study_out/")
    run_numbers = [int(run[7:]) for run in runs if run[:7] == "tsp_run"]
    next_run_number = max(run_numbers) + 1 if run_numbers else 0
    results_path = f"../study_out/tsp_run{next_run_number}"
    os.mkdir(results_path)

    # Env Init
    factory = Factory(path)

    with open(f"{results_path}/env_config.txt", "w") as txt_file:
        txt_file.write(str(factory.conf))

    for episode in trange(1):
        _ = factory.reset()
        done = False
        if render:
            factory.render()
            factory._renderer.fps = 5
        if config_name == "dirt_quadrant":
            agents = get_dirt_quadrant_tsp_agents(emergent_phenomenon, factory)
        elif config_name == "two_rooms":
            agents = get_two_rooms_tsp_agents(emergent_phenomenon, factory)
        else:
            print("Config name does not exist. Abort...")
            break
        while not done:
            a = [x.predict() for x in agents]
            # Have this condition, to terminate as soon as all dirt piles are collected. This ensures that the implementation
            # of the TSP agent is equivalent to that of the RL agent
            if 'DirtPiles' in list(factory.state.entities.keys()) and factory.state.entities['DirtPiles'].global_amount == 0.0:
                break
            obs_type, _, _, done, info = factory.step(a)
            if render:
                factory.render()
            if done:
                break


def dirt_quadrant_multi_agent_tsp(emergent_phenomenon):
    run_tsp_setting("dirt_quadrant", emergent_phenomenon)


def two_rooms_multi_agent_tsp(emergent_phenomenon):
    run_tsp_setting("two_rooms", emergent_phenomenon)


if __name__ == '__main__':
    dirt_quadrant_multi_agent_tsp(False)
    return agents
marl_factory_grid/algorithms/utils.py
@@ -58,7 +58,7 @@ def load_yaml_file(path: Path):

def add_env_props(cfg):
    # Path to config File
    env_path = Path(f'../marl_factory_grid/environment/configs/{cfg["env"]["env_name"]}.yaml')
    env_path = Path(f'./marl_factory_grid/environment/configs/{cfg["env"]["env_name"]}.yaml')

    # Env Init
    factory = Factory(env_path)
@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests
marl_factory_grid/utils/plotting/plot_single_runs.py
@@ -204,15 +204,14 @@ direction_mapping = {
}


def plot_reward_development(reward_development, cfg, results_path):
def plot_reward_development(reward_development, results_path):
    smoothed_data = np.convolve(reward_development, np.ones(10) / 10, mode='valid')
    plt.plot(smoothed_data)
    plt.ylim([-10, max(smoothed_data) + 20])
    plt.title('Smoothed Reward Development')
    plt.xlabel('Episode')
    plt.ylabel('Reward')
    if cfg["env"]["save_and_log"]:
        plt.savefig(f"{results_path}/smoothed_reward_development.png")
    plt.savefig(f"{results_path}/smoothed_reward_development.png")
    plt.show()


@@ -275,7 +274,7 @@ def plot_reached_flags_per_step():
    plt.show()


def create_info_maps(env, used_actions, all_valid_observations, dirt_piles_positions, results_path, agents, act_dim,
def create_info_maps(env, all_valid_observations, dirt_piles_positions, results_path, agents, act_dim,
                     a2c_instance):
    # Create value map
    with open(f"{results_path}/info_maps.txt", "w") as txt_file:

@@ -323,6 +322,5 @@ def create_info_maps(env, used_actions, all_valid_observations, dirt_piles_posit
                row += "[" + ', '.join(f"{x:7.4f}" for x in pmap[d, r]) + "]"
            txt_file.write(row + "]")
        txt_file.write("\n")
    txt_file.write(f"Used actions: {used_actions}\n")

    return action_probabilities
@@ -327,7 +327,7 @@ class Renderer:
        self.screen.blit(prob_text, prob_text_rect)

        pygame.display.flip()
        self.save_screen("multi_action_graph", result_path)
        self.save_screen("multi_action_graph", "." + result_path)

    def save_screen(self, filename, result_path):
        """
requirements.txt (new file)
@@ -0,0 +1,13 @@
numpy
pygame>=2.0
numba>=0.56
gymnasium>=0.26
seaborn
pandas
PyYAML
networkx
torch
tqdm
packaging
pillow
scipy
setup.py (deleted file)
@@ -1,39 +0,0 @@
from setuptools import setup, find_packages
from pathlib import Path
this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text()


setup(name='Marl-Factory-Grid',
      version='0.2.3',
      description='A framework to research MARL agents in various setings.',
      author='Steffen Illium',
      author_email='steffen.illium@ifi.lmu.de',
      url='https://github.com/illiumst/marl-factory-grid/import',
      license='MIT',
      keywords=[
          'artificial intelligence',
          'pytorch',
          'multiagent reinforcement learning',
          'simulation',
          'emergence',
          'gymnasium',
          'environment',
          'deepdiff',
          'natsort',

      ],
      classifiers=[
          'Development Status :: 4 - Beta',
          'Intended Audience :: Developers',
          'Topic :: Scientific/Engineering :: Artificial Intelligence',
          'License :: OSI Approved :: MIT License',
          'Programming Language :: Python :: 3.11',
      ],
      long_description=long_description,
      long_description_content_type='text/markdown',
      packages=find_packages(exclude=['examples']),
      include_package_data=True,
      install_requires=['numpy', 'pygame>=2.0', 'numba>=0.56', 'gymnasium>=0.26', 'seaborn', 'pandas',
                        'pyyaml', 'networkx', 'torch', 'tqdm']
      )