Mirror of https://github.com/illiumst/marl-factory-grid.git, synced 2025-07-08 02:21:36 +02:00
Updated pomdp_r comment + Added some additional comments + Restructured experiment calling + Added Readme and requirements.txt
README.md
@@ -1,133 +1,32 @@

# About EDYS

## Tackling emergent dysfunctions (EDYs) in cooperation with Fraunhofer-IKS

Collaborating with Fraunhofer-IKS, this project is dedicated to investigating Emergent Dysfunctions (EDYs) within
multi-agent environments. In multi-agent reinforcement learning (MARL), a population of agents learns by interacting
with each other in a shared environment, adapting their behavior based on the feedback they receive from the
environment and the actions of other agents.

In this context, emergent behavior describes spontaneous behaviors resulting from interactions among agents and
environmental stimuli, rather than explicit programming. This promotes natural, adaptable behavior, increases system
unpredictability for dynamic learning, enables diverse strategies, and encourages collective intelligence for complex
problem-solving. However, the complex dynamics of the environment also give rise to emergent dysfunctions: unexpected
issues arising from agent interactions. This research aims to enhance our understanding of EDYs and their impact on
multi-agent systems.
### Project Objectives

- Create an environment that provokes emergent dysfunctions.
  - This is achieved by creating a high level of background noise in the domain, where various entities perform
    diverse tasks, resulting in a deliberately chaotic dynamic.
  - The goal is to observe and analyze naturally occurring emergent dysfunctions within the complexity generated in
    this dynamic environment.
- Observational framework:
  - The project introduces an environment that is designed to capture dysfunctions as they naturally occur.
  - The environment allows for continuous monitoring of agent behaviors, actions, and interactions.
  - Tracking emergent dysfunctions in real time provides valuable data for analysis and understanding.
- Compatibility:
  - The framework allows learning entities from different manufacturers and projects with varying representations
    of actions and observations to interact seamlessly within the environment.
- Placeholders:
  - One can provide an agent with a placeholder observation that contains no information and offers no meaningful
    insights.
  - Later, when the environment expands and introduces additional entities available for observation, these new
    observations can be provided to the agent.
  - This allows for processes such as retraining on an already initialized policy and fine-tuning to enhance the
    agent's performance based on the enriched information (a short illustrative sketch follows this list).
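
In practice, such a placeholder can be as simple as a zero-filled vector of the agent's observation size. The snippet
below is a hypothetical illustration of the idea, not an API of this framework; `obs_dim` is an assumed name:

```python
import numpy as np

obs_dim = 4                           # assumed size of the agent's observation vector
placeholder_obs = np.zeros(obs_dim)   # carries no information until real entities become observable

# Once new entities can be observed, the placeholder slots are filled with their actual
# coordinates/features and the already initialized policy can be fine-tuned on them.
```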

# Emergence in Multi-Agent Systems: A Safety Perspective

## Setup

Install this environment using `pip install marl-factory-grid`. For more information, refer
to ['installation'](docs/source/installation.rst).
Refer to [quickstart](_quickstart) for specific scenarios.

1. Set up a virtualenv with Python 3.9 or higher. You can use venv or conda for this.
2. Run ```pip install -r requirements.txt``` to install the requirements.
3. In case there is no ```study_out/``` folder in the root directory, create one.

## Usage

## Rerunning the Experiments

The majority of environment objects, including entities, rules, and assets, can be loaded automatically.
Simply specify the requirements of your environment in a
[*yaml*-config file](marl_factory_grid/environment/configs/default_config.yaml).
The respective experiments from our paper can be reenacted in ```main.py```.
Just select the function representing the part of our experiments you want to rerun and
execute it via the ```__main__``` function.

If you only plan on using the environment without making any modifications, use ``quickstart_use``.
This creates a default config file and another one that lists all possible options of the environment.
It also generates an initial script in which an agent is executed in the specified environment.
For further details on utilizing the environment, refer to ['usage'](docs/source/usage.rst).
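
For orientation, the core of such a script boils down to the loop below. It mirrors the TSP runner added in this
commit; the config path and the choice of `emergent_phenomenon=False` are illustrative, and the snippet assumes the
repository root as the working directory:

```python
from pathlib import Path

from marl_factory_grid import Factory
from marl_factory_grid.algorithms.tsp.contortions import get_dirt_quadrant_tsp_agents

# Load the environment from a yaml config file.
factory = Factory(Path('./marl_factory_grid/environment/configs/tsp/dirt_quadrant.yaml'))
_ = factory.reset()
factory.render()

# Static TSP agents shipped with this commit; RL agents are driven via main.py instead.
agents = get_dirt_quadrant_tsp_agents(emergent_phenomenon=False, factory=factory)

done = False
while not done:
    actions = [agent.predict() for agent in agents]
    _, _, _, done, _ = factory.step(actions)
    factory.render()
```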
## Further Remarks

1. We use config files located in the ```marl_factory_grid/environment/configs``` and the
   ```marl_factory_grid/algorithms/rl``` folders to configure the environments and the RL
   algorithm for our experiments, respectively. You don't need to change anything to rerun the
   experiments, but we provided some additional comments in the configs for an overall better
   understanding of the functionalities.
2. Instead of collecting coins in the coin-quadrant environment, our original implementation
   works with the premise of cleaning piles of dirt; it is therefore named ```dirt_quadrant``` in the code.
   Note that this difference is only visual and does not change the underlying semantics of the environment.
3. The code for the cost contortion that prevents the emergent behavior of the TSP agents can
   be found in ```marl_factory_grid/algorithms/tsp/contortions.py```.
4. The functionalities that drive the emergence prevention mechanisms for the RL agents are mainly
   located in the utility functions ```get_ordered_dirt_piles (line 91)``` (for solving the emergence in the
   coin-quadrant environment) and ```distribute_indices (line 165)``` (mechanism for two_doors), which are part of
   ```marl_factory_grid/algorithms/rl/utils.py```.

Existing modules include a variety of functionalities within the environment:

- [Agents](marl_factory_grid/algorithms) implement either static strategies or learning algorithms based on the
  specific configuration.
- Their action set includes opening [door entities](marl_factory_grid/modules/doors/entitites.py), cleaning
  [dirt](marl_factory_grid/modules/clean_up/entitites.py), picking up [items](marl_factory_grid/modules/items/entitites.py)
  and delivering them to designated drop-off locations.
- Agents are equipped with a [battery](marl_factory_grid/modules/batteries/entitites.py) that gradually depletes over
  time if not charged at a chargepod.
- The [maintainer](marl_factory_grid/modules/maintenance/entities.py) aims to
  repair [machines](marl_factory_grid/modules/machines/entitites.py) that lose health over time.

## Customization

If you plan on modifying the environment, for example by adding entities or rules, use ``quickstart_modify``.
This creates a template module and a script that runs an agent, incorporating the generated module.
More information on how to modify the levels, entities, groups, rules and assets can be found
in [modifications](docs/source/modifications.rst).

### Levels

Varying levels are created by defining Walls, Floor or Doors in *.txt*-files (see [levels](marl_factory_grid/levels) for
examples).
Define which *level* to use in your *configfile* as:

```yaml
General:
  level_name: rooms  # 'double', 'large', 'simple', ...
```

... or create your own, maybe with the help of [asciiflow.com](https://asciiflow.com/#/).
Make sure to use `#` as [Walls](marl_factory_grid/environment/entity/wall.py), `-` as free (walkable) floor, and `D`
for [Doors](./modules/doors/entities.py).
Other entities (define your own) may bring their own `Symbols`.
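
For illustration, a tiny hypothetical level file using only these three symbols could look like this (two rooms
connected by a single door):

```
##########
#----#---#
#----D---#
#----#---#
##########
```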

### Entities

Entities are [Objects](marl_factory_grid/environment/entity/object.py) that can additionally be assigned a position.
Abstract entities are provided.

### Groups

[Groups](marl_factory_grid/environment/groups/objects.py) are entity sets that provide administrative access to all
group members.
All [Entities](marl_factory_grid/environment/entity/global_entities.py) are available at runtime as an EnvState property.
### Rules

[Rules](marl_factory_grid/environment/entity/object.py) define how the environment behaves on the micro-scale.
Each of the hooks (`on_init`, `pre_step`, `on_step`, `post_step`, `on_done`)
provides env-access to implement custom logic, calculate rewards, or gather information.

![Hooks](./images/Hooks_FIKS.png)

[Results](marl_factory_grid/environment/entity/object.py) provide a way to return `rule` evaluations such as rewards and
state reports back to the environment.
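
As a rough sketch of how such a rule might be written, the snippet below fills in the hooks listed above. The import
paths, the `Rule` base class, the `Result` constructor arguments, and the state keys are assumptions for illustration
only, not the exact API of this framework:

```python
# Hypothetical custom rule; import paths, signatures and state keys are illustrative assumptions.
from marl_factory_grid.environment.rules import Rule
from marl_factory_grid.utils.results import Result


class PenalizeOpenDoors(Rule):
    """Give every agent a small negative reward while any door stands open."""

    def on_init(self, state, lvl_map):
        # Runs once after the level has been built; store the penalty here.
        self.penalty = -0.1
        return []

    def on_step(self, state):
        # Runs every step with access to the environment state (env-access).
        open_doors = [door for door in state['Doors'] if not door.is_closed]
        if not open_doors:
            return []
        # Return one Result per agent so the environment can book the reward.
        return [Result(identifier='door_open_penalty', entity=agent, reward=self.penalty)
                for agent in state['Agent']]
```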

### Assets

Make sure to bring your own assets for each entity living in the gridworld, as the `Renderer` relies on them.
PNG files (transparent background) with a square aspect ratio should do the job, in general.

<img src="/marl_factory_grid/environment/assets/wall.png" width="5%">
<img src="/marl_factory_grid/environment/assets/agent/agent.png" width="5%">
main.py (new file)
@@ -0,0 +1,84 @@
from marl_factory_grid.algorithms.rl.RL_runner import rerun_dirt_quadrant_agent1_training, \
    rerun_two_rooms_agent1_training, rerun_two_rooms_agent2_training, dirt_quadrant_multi_agent_rl_eval, \
    two_rooms_multi_agent_rl_eval
from marl_factory_grid.algorithms.tsp.TSP_runner import dirt_quadrant_multi_agent_tsp_eval, \
    two_rooms_multi_agent_tsp_eval


###### Coin-quadrant environment ######
def coin_quadrant_single_agent_training():
    """ Rerun training of RL-agent in coins_quadrant (dirt_quadrant) environment.
        The trained model and additional training metrics are saved in the study_out folder. """
    rerun_dirt_quadrant_agent1_training()


def coin_quadrant_RL_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of RL-agents in coins_quadrant (dirt_quadrant)
        environment, with occurring emergent phenomenon. Evaluation takes trained models
        from study_out/run0 for both agents. """
    dirt_quadrant_multi_agent_rl_eval(emergent_phenomenon=True)


def coin_quadrant_RL_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of RL-agents in coins_quadrant (dirt_quadrant)
        environment, with emergence prevention mechanism. Evaluation takes trained models
        from study_out/run0 for both agents. """
    dirt_quadrant_multi_agent_rl_eval(emergent_phenomenon=False)


def coin_quadrant_TSP_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of TSP-agents in coins_quadrant (dirt_quadrant)
        environment, with occurring emergent phenomenon. """
    dirt_quadrant_multi_agent_tsp_eval(emergent_phenomenon=True)


def coin_quadrant_TSP_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of TSP-agents in coins_quadrant (dirt_quadrant)
        environment, with emergence prevention mechanism. """
    dirt_quadrant_multi_agent_tsp_eval(emergent_phenomenon=False)


###### Two-rooms environment ######

def two_rooms_agent1_training():
    """ Rerun training of left RL-agent in two_rooms environment.
        The trained model and additional training metrics are saved in the study_out folder. """
    rerun_two_rooms_agent1_training()


def two_rooms_agent2_training():
    """ Rerun training of right RL-agent in two_rooms environment.
        The trained model and additional training metrics are saved in the study_out folder. """
    rerun_two_rooms_agent2_training()


def two_rooms_RL_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of RL-agents in two_rooms environment, with
        occurring emergent phenomenon. Evaluation takes trained models
        from study_out/run1 for agent1 and study_out/run2 for agent2. """
    two_rooms_multi_agent_rl_eval(emergent_phenomenon=True)


def two_rooms_RL_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of RL-agents in two_rooms environment, with
        emergence prevention mechanism. Evaluation takes trained models
        from study_out/run1 for agent1 and study_out/run2 for agent2. """
    two_rooms_multi_agent_rl_eval(emergent_phenomenon=False)


def two_rooms_TSP_multi_agent_eval_emergent():
    """ Rerun multi-agent evaluation of TSP-agents in two_rooms environment, with
        occurring emergent phenomenon. """
    two_rooms_multi_agent_tsp_eval(emergent_phenomenon=True)


def two_rooms_TSP_multi_agent_eval_prevented():
    """ Rerun multi-agent evaluation of TSP-agents in two_rooms environment, with
        emergence prevention mechanism. """
    two_rooms_multi_agent_tsp_eval(emergent_phenomenon=False)


if __name__ == '__main__':
    # Select any of the above functions to rerun the respective part
    # from our evaluation section of the paper
    coin_quadrant_RL_multi_agent_eval_prevented()
marl_factory_grid/algorithms/rl/RL_runner.py
@@ -3,9 +3,10 @@ from marl_factory_grid.algorithms.rl.a2c_dirt import A2C
from marl_factory_grid.algorithms.utils import load_yaml_file


def dirt_quadrant_agent1_training():
    train_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_train_config.yaml')
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_eval_config.yaml')
####### Training routines ######
def rerun_dirt_quadrant_agent1_training():
    train_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_train_config.yaml')
    eval_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/dirt_quadrant_eval_config.yaml')
    train_cfg = load_yaml_file(train_cfg_path)
    eval_cfg = load_yaml_file(eval_cfg_path)

@@ -17,8 +18,8 @@ def dirt_quadrant_agent1_training():


def two_rooms_training(max_steps, agent_name):
    train_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_train_config.yaml')
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_eval_config.yaml')
    train_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_train_config.yaml')
    eval_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/single_agent_configs/two_rooms_eval_config.yaml')
    train_cfg = load_yaml_file(train_cfg_path)
    eval_cfg = load_yaml_file(eval_cfg_path)

@@ -32,14 +33,15 @@ def two_rooms_training(max_steps, agent_name):
    agent.eval_loop(n_episodes=1)


def two_rooms_agent1_training():
def rerun_two_rooms_agent1_training():
    two_rooms_training(max_steps=190000, agent_name="agent1")


def two_rooms_agent2_training():
def rerun_two_rooms_agent2_training():
    two_rooms_training(max_steps=260000, agent_name="agent2")


####### Eval routines ########
def single_agent_eval(config_name, run_folder_name):
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/single_agent_configs/{config_name}_eval_config.yaml')
    train_cfg = eval_cfg = load_yaml_file(eval_cfg_path)

@@ -52,7 +54,7 @@ def single_agent_eval(config_name, run_folder_name):


def multi_agent_eval(config_name, runs, emergent_phenomenon=False):
    eval_cfg_path = Path(f'../marl_factory_grid/algorithms/rl/multi_agent_configs/{config_name}' +
    eval_cfg_path = Path(f'./marl_factory_grid/algorithms/rl/multi_agent_configs/{config_name}' +
                         f'_eval_config{"_emergent" if emergent_phenomenon else ""}.yaml')
    eval_cfg = load_yaml_file(eval_cfg_path)

@@ -63,13 +65,9 @@ def multi_agent_eval(config_name, runs, emergent_phenomenon=False):
        agent.eval_loop(1)


def dirt_quadrant_multi_agent_ctde_eval(emergent_phenomenon):
def dirt_quadrant_multi_agent_rl_eval(emergent_phenomenon):
    multi_agent_eval("dirt_quadrant", ["run0", "run0"], emergent_phenomenon)


def two_rooms_multi_agent_eval(emergent_phenomenon):
    multi_agent_eval("two_rooms", ["run1", "run2"], emergent_phenomenon)


if __name__ == '__main__':
    dirt_quadrant_agent1_training()
def two_rooms_multi_agent_rl_eval(emergent_phenomenon):
    multi_agent_eval("two_rooms", ["run1", "run2"], emergent_phenomenon)
marl_factory_grid/algorithms/rl/a2c_dirt.py
@@ -2,13 +2,14 @@ import os
import torch
from typing import Union, List
import numpy as np
from tqdm import tqdm

from marl_factory_grid.algorithms.rl.base_a2c import PolicyGradient, cumulate_discount
from marl_factory_grid.algorithms.rl.constants import Names
from marl_factory_grid.algorithms.rl.utils import transform_observations, _as_torch, door_is_close, \
from marl_factory_grid.algorithms.rl.utils import transform_observations, _as_torch, is_door_close, \
    get_dirt_piles_positions, update_target_pile, update_ordered_dirt_piles, get_all_cleaned_dirt_piles, \
    distribute_indices, set_agent_spawnpoint, get_ordered_dirt_piles, handle_finished_episode, save_configs, \
    save_agent_models, get_all_observations
    distribute_indices, set_agents_spawnpoints, get_ordered_dirt_piles, handle_finished_episode, save_configs, \
    save_agent_models, get_all_observations, get_agents_positions
from marl_factory_grid.algorithms.utils import add_env_props
from marl_factory_grid.utils.plotting.plot_single_runs import plot_action_maps, plot_reward_development, \
    create_info_maps

@@ -28,93 +29,88 @@ class A2C:
        self.n_agents = train_cfg[nms.ENV][nms.N_AGENTS]
        self.setup()
        self.reward_development = []
        self.action_probabilities = {agent_idx:[] for agent_idx in range(self.n_agents)}
        self.action_probabilities = {agent_idx: [] for agent_idx in range(self.n_agents)}

    def setup(self):
        dirt_piles_positions = [self.factory.state.entities[nms.DIRT_PILES][pile_idx].pos for pile_idx in
                                range(len(self.factory.state.entities[nms.DIRT_PILES]))]
        self.obs_dim = 2 + 2*len(dirt_piles_positions) if self.cfg[nms.ALGORITHM][nms.PILE_OBSERVABILITY] == nms.ALL else 4
        """ Initialize agents and create entry for run results according to configuration """
        self.obs_dim = 2 + 2 * len(get_dirt_piles_positions(self.factory)) if self.cfg[nms.ALGORITHM][
            nms.PILE_OBSERVABILITY] == nms.ALL else 4
        self.act_dim = 4  # The 4 movement directions
        self.agents = [PolicyGradient(self.factory, agent_id=i, obs_dim=self.obs_dim, act_dim=self.act_dim) for i in range(self.n_agents)]
        self.agents = [PolicyGradient(self.factory, agent_id=i, obs_dim=self.obs_dim, act_dim=self.act_dim) for i in
                       range(self.n_agents)]

        if self.cfg[nms.ENV][nms.SAVE_AND_LOG]:
            # Create results folder
            runs = os.listdir("../study_out/")
            runs = os.listdir("./study_out/")
            run_numbers = [int(run[3:]) for run in runs if run[:3] == "run"]
            next_run_number = max(run_numbers)+1 if run_numbers else 0
            self.results_path = f"../study_out/run{next_run_number}"
            next_run_number = max(run_numbers) + 1 if run_numbers else 0
            self.results_path = f"./study_out/run{next_run_number}"
            os.mkdir(self.results_path)
            # Save settings in results folder
            save_configs(self.results_path, self.cfg, self.factory.conf, self.eval_factory.conf)

    def set_cfg(self, eval=False):
        """ Set the mode of the current configuration """
        if eval:
            self.cfg = self.eval_cfg
        else:
            self.cfg = self.train_cfg

    def load_agents(self, runs_list):
        """ Initialize networks with parameters of already trained agents """
        for idx, run in enumerate(runs_list):
            run_path = f"../study_out/{run}"
            run_path = f"./study_out/{run}"
            self.agents[idx].pi.load_model_parameters(f"{run_path}/PolicyNet_model_parameters.pth")
            self.agents[idx].vf.load_model_parameters(f"{run_path}/ValueNet_model_parameters.pth")

    @torch.no_grad()
    def train_loop(self):
        """ Function for training agents """
        env = self.factory
        n_steps, max_steps = [self.cfg[nms.ALGORITHM][k] for k in [nms.N_STEPS, nms.MAX_STEPS]]
        global_steps, episode = 0, 0
        indices = distribute_indices(env, self.cfg, self.n_agents)
        dirt_piles_positions = get_dirt_piles_positions(env)
        used_actions = {i:0 for i in range(len(env.state.entities[nms.AGENT][0]._actions))}  # Assume both agents have the same actions
        target_pile = [partition[0] for partition in indices]  # pointer that points to the target pile for each agent. (point to same pile, point to different piles)
        cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]  # Have own dictionary for each agent
        target_pile = [partition[0] for partition in
                       indices]  # list of pointers that point to the current target pile for each agent
        cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

        pbar = tqdm(total=max_steps)
        while global_steps < max_steps:
            print(global_steps)
            obs = env.reset()
            _ = env.reset()
            if self.cfg[nms.ENV][nms.TRAIN_RENDER]:
                env.render()
            set_agent_spawnpoint(env, self.n_agents)
            set_agents_spawnpoints(env, self.n_agents)
            ordered_dirt_piles = get_ordered_dirt_piles(env, cleaned_dirt_piles, self.cfg, self.n_agents)
            # Reset current target pile at episode begin if all piles have to be cleaned in one episode
            if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] == nms.ALL:
                target_pile = [partition[0] for partition in indices]
                cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

            # Supply each agent with its local observation
            obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)
            done, rew_log = [False] * self.n_agents, 0

            print("Agents spawnpoints:", [env.state.moving_entites[agent_idx].pos for agent_idx in range(self.n_agents)])
            print("Agents target piles:", target_pile)
            print("Agents initial observation:", obs)
            print("Agents cleaned dirt piles:", cleaned_dirt_piles)
            done, rew_log = [False] * self.n_agents, 0

            while not all(done):
                # 0="North", 1="East", 2="South", 3="West", 4="Clean", 5="Noop"
                action = self.use_door_or_move(env, obs, cleaned_dirt_piles) \
                    if nms.DOORS in env.state.entities.keys() else self.get_actions(obs)
                used_actions[int(action[0])] += 1
                _, next_obs, reward, done, info = env.step(action)
                if done:
                    print("DoneAtMaxStepsReached:", len(self.agents[0]._episode))
                next_obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)

                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices, reward, done)
                # Handle case where agent is on field with dirt
                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices,
                                                reward, done)

                if n_steps != 0 and (global_steps + 1) % n_steps == 0:
                    print("max_steps reached")
                    done = True
                if n_steps != 0 and (global_steps + 1) % n_steps == 0: done = True

                done = [done] * self.n_agents if isinstance(done, bool) else done
                for ag_i, agent in enumerate(self.agents):
                    # For forced actions like door opening, we have to call the step function with this action, but
                    # since we are not allowed to exceed the dimensions range, we can't log the corresponding step info.
                    if action[ag_i] in range(self.act_dim):
                        # Add agent results into respective rollout buffers
                        agent._episode[-1] = (next_obs[ag_i], action[ag_i], reward[ag_i], agent._episode[-1][-1])

                if self.cfg[nms.ENV][nms.TRAIN_RENDER]:
                    env.render()
                # Visualize state update
                if self.cfg[nms.ENV][nms.TRAIN_RENDER]: env.render()

                obs = next_obs

@@ -123,97 +119,93 @@ class A2C:
                global_steps += 1
                rew_log += sum(reward)

                if global_steps >= max_steps:
                    break
                if global_steps >= max_steps: break

            print(f'reward at episode: {episode} = {rew_log}')
            self.reward_development.append(rew_log)
            episode += 1
            pbar.update(global_steps - pbar.n)

        plot_reward_development(self.reward_development, self.cfg, self.results_path)
        pbar.close()
        if self.cfg[nms.ENV][nms.SAVE_AND_LOG]:
            create_info_maps(env, used_actions, get_all_observations(env, self.cfg, self.n_agents),
            plot_reward_development(self.reward_development, self.results_path)
            create_info_maps(env, get_all_observations(env, self.cfg, self.n_agents),
                             get_dirt_piles_positions(env), self.results_path, self.agents, self.act_dim, self)
            save_agent_models(self.results_path, self.agents)
            plot_action_maps(env, [self], self.results_path)

    @torch.inference_mode(True)
    def eval_loop(self, n_episodes, render=False):
    def eval_loop(self, n_episodes):
        """ Function for performing inference """
        env = self.eval_factory
        self.set_cfg(eval=True)
        episode, results = 0, []
        dirt_piles_positions = get_dirt_piles_positions(env)
        indices = distribute_indices(env, self.cfg, self.n_agents)
        target_pile = [partition[0] for partition in indices]  # pointer that points to the target pile for each agent. (point to same pile/ point to different piles)
        target_pile = [partition[0] for partition in
                       indices]  # list of pointers that point to the current target pile for each agent
        if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] == nms.DISTRIBUTED:
            cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in range(self.n_agents)]
        else:
            cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]
            cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in
                                  range(self.n_agents)]
        else: cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

        while episode < n_episodes:
            obs = env.reset()
            set_agent_spawnpoint(env, self.n_agents)
            _ = env.reset()
            set_agents_spawnpoints(env, self.n_agents)
            if self.cfg[nms.ENV][nms.EVAL_RENDER]:
                # Don't render auxiliary piles
                if self.cfg[nms.ALGORITHM][nms.AUXILIARY_PILES]:
                    # Don't render auxiliary piles
                    auxiliary_piles = [pile for idx, pile in enumerate(env.state.entities[nms.DIRT_PILES]) if idx % 2 == 0]
                    auxiliary_piles = [pile for idx, pile in enumerate(env.state.entities[nms.DIRT_PILES]) if
                                       idx % 2 == 0]
                    for pile in auxiliary_piles:
                        pile.set_new_amount(0)
                env.render()
                env._renderer.fps = 5 # Slow down agent movement
                env._renderer.fps = 5  # Slow down agent movement

            # Reset current target pile at episode begin if all piles have to be cleaned in one episode
            if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] in [nms.ALL, nms.DISTRIBUTED, nms.SHARED]:
                target_pile = [partition[0] for partition in indices]
                if self.cfg[nms.ALGORITHM][nms.PILE_ALL_DONE] == nms.DISTRIBUTED:
                    cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in range(self.n_agents)]
                else:
                    cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]
                    cleaned_dirt_piles = [{dirt_piles_positions[idx]: False for idx in indices[i]} for i in
                                          range(self.n_agents)]
                else: cleaned_dirt_piles = [{pos: False for pos in dirt_piles_positions} for _ in range(self.n_agents)]

            ordered_dirt_piles = get_ordered_dirt_piles(env, cleaned_dirt_piles, self.cfg, self.n_agents)

            # Supply each agent with its local observation
            obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)
            done, rew_log, eps_rew = [False] * self.n_agents, 0, torch.zeros(self.n_agents)

            while not all(done):
                action = self.use_door_or_move(env, obs, cleaned_dirt_piles, det=True) \
                    if nms.DOORS in env.state.entities.keys() else self.execute_policy(obs, env, cleaned_dirt_piles)  # zero exploration
                _, next_obs, reward, done, info = env.step(action)  # Note that this call seems to flip the lists in indices
                if done:
                    print("DoneAtMaxStepsReached:", len(self.agents[0]._episode))
                    if nms.DOORS in env.state.entities.keys() else self.execute_policy(obs, env,
                                                                                       cleaned_dirt_piles)  # zero exploration
                _, next_obs, reward, done, info = env.step(action)

                # Add small negative reward if agent has moved away from the target_pile
                # reward = self.reward_distance(env, obs, target_pile, reward)
                # Handle case where agent is on field with dirt
                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices,
                                                reward, done)

                # Check and handle if agent is on field with dirt
                reward, done = self.handle_dirt(env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices, reward, done)

                # Get transformed next_obs that might have been updated because of self.handle_dirt.
                # For eval, where pile_all_done is "all", it's mandatory that the potential change of the target pile
                # in the observation, caused by self.handle_dirt, is already considered when the next action is calculated.
                # Get transformed next_obs that might have been updated because of handle_dirt
                next_obs = transform_observations(env, ordered_dirt_piles, target_pile, self.cfg, self.n_agents)

                done = [done] * self.n_agents if isinstance(done, bool) else done

                if self.cfg[nms.ENV][nms.EVAL_RENDER]:
                    env.render()
                if self.cfg[nms.ENV][nms.EVAL_RENDER]: env.render()

                obs = next_obs

            episode += 1


    ########## Helper functions ########

    def get_actions(self, observations) -> ListOrTensor:
        # Given an observation, get actions for both agents
        """ Given local observations, get actions for both agents """
        actions = [agent.step(_as_torch(observations[ag_i]).view(-1).to(torch.float32)) for ag_i, agent in
                   enumerate(self.agents)]
        return actions

    def execute_policy(self, observations, env, cleaned_dirt_piles) -> ListOrTensor:
        # Use deterministic policy for inference
        """ Execute agent policies deterministically for inference """
        actions = [agent.policy(_as_torch(observations[ag_i]).view(-1).to(torch.float32)) for ag_i, agent in
                   enumerate(self.agents)]
        for agent_idx in range(self.n_agents):

@@ -224,10 +216,11 @@ class A2C:
        return actions

    def use_door_or_move(self, env, obs, cleaned_dirt_piles, det=False):
        """ Function that handles automatic actions like door opening and forced Noop """
        action = []
        for agent_idx, agent in enumerate(self.agents):
            agent_obs = _as_torch((obs)[agent_idx]).view(-1).to(torch.float32)
            # If agent already reached its target
            # Use Noop operation if agent already reached its target. (Only relevant for two-rooms setting)
            if all(cleaned_dirt_piles[agent_idx].values()):
                action.append(next(action_i for action_i, a in enumerate(env.state[nms.AGENT][agent_idx].actions) if
                                   a.name == nms.NOOP))

@@ -235,37 +228,33 @@ class A2C:
                # Include agent experience entry manually
                agent._episode.append((None, None, None, agent.vf(agent_obs)))
            else:
                if door := door_is_close(env, agent_idx):
                if door := is_door_close(env, agent_idx):
                    if door.is_closed:
                        action.append(next(
                            action_i for action_i, a in enumerate(env.state[nms.AGENT][agent_idx].actions) if
                            a.name == nms.USE_DOOR))
                        # Don't include action in agent experience
                    else:
                        if det:
                            action.append(int(agent.pi(agent_obs, det=True)[0]))
                        else:
                            action.append(int(agent.step(agent_obs)))
                        if det: action.append(int(agent.pi(agent_obs, det=True)[0]))
                        else: action.append(int(agent.step(agent_obs)))
                else:
                    if det:
                        action.append(int(agent.pi(agent_obs, det=True)[0]))
                    else:
                        action.append(int(agent.step(agent_obs)))
                    if det: action.append(int(agent.pi(agent_obs, det=True)[0]))
                    else: action.append(int(agent.step(agent_obs)))
        return action

    def handle_dirt(self, env, cleaned_dirt_piles, ordered_dirt_piles, target_pile, indices, reward, done):
        # Check if agent moved on field with dirt. If that is the case collect dirt automatically
        agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(self.n_agents)]
        """ Check if agent moved on field with dirt. If that is the case collect dirt automatically """
        agents_positions = get_agents_positions(env, self.n_agents)
        dirt_piles_positions = get_dirt_piles_positions(env)
        if any([True for pos in agent_positions if pos in dirt_piles_positions]):
        if any([True for pos in agents_positions if pos in dirt_piles_positions]):
            # Only simulate collecting the dirt
            for idx, pos in enumerate(agent_positions):
            for idx, pos in enumerate(agents_positions):
                if pos in cleaned_dirt_piles[idx].keys() and not cleaned_dirt_piles[idx][pos]:

                    # If dirt piles should be cleaned in a specific order
                    if ordered_dirt_piles[idx]:
                        if pos == ordered_dirt_piles[idx][target_pile[idx]]:
                            reward[idx] += 50  # 1
                            reward[idx] += 50
                            cleaned_dirt_piles[idx][pos] = True
                            # Set pointer to next dirt pile
                            update_target_pile(env, idx, target_pile, indices, self.cfg)

@@ -278,7 +267,7 @@ class A2C:
                            for pos in dirt_piles_positions:
                                cleaned_dirt_piles[idx][pos] = False
                    else:
                        reward[idx] += 50  # 1
                        reward[idx] += 50
                        cleaned_dirt_piles[idx][pos] = True

                        # Indicate that renderer can hide dirt pile

@@ -294,4 +283,3 @@ class A2C:
            done = True

        return reward, done
marl_factory_grid/algorithms/rl/utils.py
@@ -10,6 +10,7 @@ from marl_factory_grid.algorithms.rl.constants import Names
nms = Names

def _as_torch(x):
    """ Helper function to convert different list types to a torch tensor """
    if isinstance(x, np.ndarray):
        return torch.from_numpy(x)
    elif isinstance(x, List):

@@ -20,15 +21,16 @@ def _as_torch(x):


def transform_observations(env, ordered_dirt_piles, target_pile, cfg, n_agents):
    """ Requires that agent has observations -DirtPiles and -Self """
    agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]
    """ Function that extracts local observations from global state
        Requires that agents have observations -DirtPiles and -Self (cf. environment configs) """
    agents_positions = get_agents_positions(env, n_agents)
    pile_observability_is_all = cfg[nms.ALGORITHM][nms.PILE_OBSERVABILITY] == nms.ALL
    if pile_observability_is_all:
        trans_obs = [torch.zeros(2+2*len(ordered_dirt_piles[0])) for _ in range(len(agent_positions))]
        trans_obs = [torch.zeros(2+2*len(ordered_dirt_piles[0])) for _ in range(len(agents_positions))]
    else:
        # Only show current target pile
        trans_obs = [torch.zeros(4) for _ in range(len(agent_positions))]
    for i, pos in enumerate(agent_positions):
        trans_obs = [torch.zeros(4) for _ in range(len(agents_positions))]
    for i, pos in enumerate(agents_positions):
        agent_x, agent_y = pos[0], pos[1]
        trans_obs[i][0] = agent_x
        trans_obs[i][1] = agent_y

@@ -45,6 +47,7 @@ def transform_observations(env, ordered_dirt_piles, target_pile, cfg, n_agents):


def get_all_observations(env, cfg, n_agents):
    """ Helper function that returns all possible agent observations """
    dirt_piles_positions = [env.state.entities[nms.DIRT_PILES][pile_idx].pos for pile_idx in
                            range(len(env.state.entities[nms.DIRT_PILES]))]
    if cfg[nms.ALGORITHM][nms.PILE_OBSERVABILITY] == nms.ALL:

@@ -76,41 +79,48 @@ def get_all_observations(env, cfg, n_agents):


def get_dirt_piles_positions(env):
    """ Get positions of dirt piles on the map """
    return [env.state.entities[nms.DIRT_PILES][pile_idx].pos for pile_idx in range(len(env.state.entities[nms.DIRT_PILES]))]


def get_agents_positions(env, n_agents):
    """ Get positions of agents on the map """
    return [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]


def get_ordered_dirt_piles(env, cleaned_dirt_piles, cfg, n_agents):
    """ Each agent can have its individual pile order """
    """ This function determines in which order the agents should clean the dirt piles
        Each agent can have its individual pile order """
    ordered_dirt_piles = [[] for _ in range(n_agents)]
    dirt_pile_positions = get_dirt_piles_positions(env)
    agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]
    dirt_piles_positions = get_dirt_piles_positions(env)
    agents_positions = get_agents_positions(env, n_agents)
    for agent_idx in range(n_agents):
        if cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.FIXED, nms.AGENTS]:
            ordered_dirt_piles[agent_idx] = dirt_pile_positions
            ordered_dirt_piles[agent_idx] = dirt_piles_positions
        elif cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.SMART, nms.DYNAMIC]:
            # Calculate distances for remaining unvisited dirt piles
            remaining_target_piles = [pos for pos, value in cleaned_dirt_piles[agent_idx].items() if not value]
            pile_distances = {pos:0 for pos in remaining_target_piles}
            agent_pos = agent_positions[agent_idx]
            agent_pos = agents_positions[agent_idx]
            for pos in remaining_target_piles:
                pile_distances[pos] = np.abs(agent_pos[0] - pos[0]) + np.abs(agent_pos[1] - pos[1])

            if cfg[nms.ALGORITHM][nms.PILE_ORDER] == nms.SMART:
                # Check if there is an agent in line with any of the remaining dirt piles
                # Check if there is an agent on the direct path to any of the remaining dirt piles
                for pile_pos in remaining_target_piles:
                    for other_pos in agent_positions:
                    for other_pos in agents_positions:
                        if other_pos != agent_pos:
                            if agent_pos[0] == other_pos[0] == pile_pos[0] or agent_pos[1] == other_pos[1] == pile_pos[1]:
                                # Get the line between the agent and the goal
                                # Get the line between the agent and the target
                                path = bresenham(agent_pos[0], agent_pos[1], pile_pos[0], pile_pos[1])

                                # Check if the entity lies on the path between the agent and the goal
                                # Check if the entity lies on the path between the agent and the target
                                if other_pos in path:
                                    pile_distances[pile_pos] += np.abs(agent_pos[0] - other_pos[0]) + np.abs(agent_pos[1] - other_pos[1])

            sorted_pile_distances = dict(sorted(pile_distances.items(), key=lambda item: item[1]))
            # Insert already visited dirt piles
            ordered_dirt_piles[agent_idx] = [pos for pos in dirt_pile_positions if pos not in remaining_target_piles]
            ordered_dirt_piles[agent_idx] = [pos for pos in dirt_piles_positions if pos not in remaining_target_piles]
            # Fill up with sorted positions
            for pos in sorted_pile_distances.keys():
                ordered_dirt_piles[agent_idx].append(pos)

@@ -145,6 +155,7 @@ def bresenham(x0, y0, x1, y1):


def update_ordered_dirt_piles(agent_idx, cleaned_dirt_piles, ordered_dirt_piles, env, cfg, n_agents):
    """ Update the order of the remaining dirt piles """
    # Only update ordered_dirt_pile for agent that reached its target pile
    updated_ordered_dirt_piles = get_ordered_dirt_piles(env, cleaned_dirt_piles, cfg, n_agents)
    for i in range(len(ordered_dirt_piles[agent_idx])):

@@ -152,8 +163,10 @@ def update_ordered_dirt_piles(agent_idx, cleaned_dirt_piles, ordered_dirt_piles,


def distribute_indices(env, cfg, n_agents):
    """ Distribute dirt piles evenly among the agents """
    indices = []
    n_dirt_piles = len(get_dirt_piles_positions(env))
    agents_positions = get_agents_positions(env, n_agents)
    if n_dirt_piles == 1 or cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.FIXED, nms.DYNAMIC, nms.SMART]:
        indices = [[0] for _ in range(n_agents)]
    else:

@@ -171,12 +184,11 @@ def distribute_indices(env, cfg, n_agents):
    # -> Starting with index 0 even piles are auxiliary piles, odd piles are primary piles
    if cfg[nms.ALGORITHM][nms.AUXILIARY_PILES] and nms.DOORS in env.state.entities.keys():
        door_positions = [door.pos for door in env.state.entities[nms.DOORS]]
        agent_positions = [env.state.moving_entites[agent_idx].pos for agent_idx in range(n_agents)]
        distances = {door_pos:[] for door_pos in door_positions}

        # Calculate distance of every agent to every door
        for door_pos in door_positions:
            for agent_pos in agent_positions:
            for agent_pos in agents_positions:
                distances[door_pos].append(np.abs(door_pos[0] - agent_pos[0]) + np.abs(door_pos[1] - agent_pos[1]))

        def duplicate_indices(lst, item):

@@ -213,6 +225,7 @@ def distribute_indices(env, cfg, n_agents):


def update_target_pile(env, agent_idx, target_pile, indices, cfg):
    """ Get the next target pile for a given agent """
    if cfg[nms.ALGORITHM][nms.PILE_ORDER] in [nms.FIXED, nms.DYNAMIC, nms.SMART]:
        if target_pile[agent_idx] + 1 < len(get_dirt_piles_positions(env)):
            target_pile[agent_idx] += 1

@@ -223,7 +236,8 @@ def update_target_pile(env, agent_idx, target_pile, indices, cfg):
            target_pile[agent_idx] += 1


def door_is_close(env, agent_idx):
def is_door_close(env, agent_idx):
    """ Checks whether the agent is close to a door """
    neighbourhood = [y for x in env.state.entities.neighboring_positions(env.state[nms.AGENT][agent_idx].pos)
                     for y in env.state.entities.pos_dict[x] if nms.DOOR in y.name]
    if neighbourhood:

@@ -231,6 +245,7 @@ def door_is_close(env, agent_idx):


def get_all_cleaned_dirt_piles(dirt_piles_positions, cleaned_dirt_piles, n_agents):
    """ Returns all dirt piles cleaned by any agent """
    meta_cleaned_dirt_piles = {pos: False for pos in dirt_piles_positions}
    for agent_idx in range(n_agents):
        for (pos, cleaned) in cleaned_dirt_piles[agent_idx].items():

@@ -240,6 +255,7 @@ def get_all_cleaned_dirt_piles(dirt_piles_positions, cleaned_dirt_piles, n_agent


def handle_finished_episode(obs, agents, cfg):
    """ Finish up episode, calculate advantages and perform policy net and value net updates """
    with torch.inference_mode(False):
        for ag_i, agent in enumerate(agents):
            # Get states, actions, rewards and values from rollout buffer

@@ -268,6 +284,7 @@ def handle_finished_episode(obs, agents, cfg):


def split_into_chunks(data_tuple, cfg):
    """ Chunks episode data into approximately equal sized chunks to prevent system memory failure from overload """
    result = [data_tuple]
    chunk_size = cfg[nms.ALGORITHM][nms.CHUNK_EPISODE]
    if chunk_size > 0:

@@ -286,7 +303,8 @@ def split_into_chunks(data_tuple, cfg):
    return result


def set_agent_spawnpoint(env, n_agents):
def set_agents_spawnpoints(env, n_agents):
    """ Tell environment where the agents should spawn in the next episode """
    for agent_idx in range(n_agents):
        agent_name = list(env.state.agents_conf.keys())[agent_idx]
        current_pos_pointer = env.state.agents_conf[agent_name][nms.POS_POINTER]

@@ -299,6 +317,7 @@ def set_agent_spawnpoint(env, n_agents):


def save_configs(results_path, cfg, factory_conf, eval_factory_conf):
    """ Save configurations for logging purposes """
    with open(f"{results_path}/MARL_config.txt", "w") as txt_file:
        txt_file.write(str(cfg))
    with open(f"{results_path}/train_env_config.txt", "w") as txt_file:

@@ -308,6 +327,7 @@ def save_configs(results_path, cfg, factory_conf, eval_factory_conf):


def save_agent_models(results_path, agents):
    """ Save model parameters after training """
    for idx, agent in enumerate(agents):
        agent.pi.save_model_parameters(results_path)
        agent.vf.save_model_parameters(results_path)
marl_factory_grid/algorithms/tsp/TSP_runner.py (new file)
@@ -0,0 +1,61 @@
import os
from pathlib import Path

from tqdm import trange

from marl_factory_grid import Factory
from marl_factory_grid.algorithms.tsp.contortions import get_dirt_quadrant_tsp_agents, get_two_rooms_tsp_agents


def dirt_quadrant_multi_agent_tsp_eval(emergent_phenomenon):
    run_tsp_setting("dirt_quadrant", emergent_phenomenon)


def two_rooms_multi_agent_tsp_eval(emergent_phenomenon):
    run_tsp_setting("two_rooms", emergent_phenomenon)


def run_tsp_setting(config_name, emergent_phenomenon, n_episodes=1):
    # Render at each step?
    render = True

    # Path to config File
    path = Path(f'./marl_factory_grid/environment/configs/tsp/{config_name}.yaml')

    # Create results folder
    runs = os.listdir("./study_out/")
    run_numbers = [int(run[7:]) for run in runs if run[:7] == "tsp_run"]
    next_run_number = max(run_numbers) + 1 if run_numbers else 0
    results_path = f"./study_out/tsp_run{next_run_number}"
    os.mkdir(results_path)

    # Env Init
    factory = Factory(path)

    with open(f"{results_path}/env_config.txt", "w") as txt_file:
        txt_file.write(str(factory.conf))

    for episode in trange(n_episodes):
        _ = factory.reset()
        done = False
        if render:
            factory.render()
            factory._renderer.fps = 5
        if config_name == "dirt_quadrant":
            agents = get_dirt_quadrant_tsp_agents(emergent_phenomenon, factory)
        elif config_name == "two_rooms":
            agents = get_two_rooms_tsp_agents(emergent_phenomenon, factory)
        else:
            print("Config name does not exist. Abort...")
            break
        while not done:
            a = [x.predict() for x in agents]
            # Have this condition, to terminate as soon as all dirt piles are collected. This ensures that the implementation
            # of the TSP agent is equivalent to that of the RL agent
            if 'DirtPiles' in list(factory.state.entities.keys()) and factory.state.entities['DirtPiles'].global_amount == 0.0:
                break
            obs_type, _, _, done, info = factory.step(a)
            if render:
                factory.render()
            if done:
                break
marl_factory_grid/algorithms/tsp/contortions.py
@@ -1,12 +1,6 @@
import os
from pathlib import Path

import numpy as np
from tqdm import trange

from marl_factory_grid.algorithms.tsp.TSP_dirt_agent import TSPDirtAgent
from marl_factory_grid.algorithms.tsp.TSP_target_agent import TSPTargetAgent
from marl_factory_grid.environment.factory import Factory


def get_dirt_quadrant_tsp_agents(emergent_phenomenon, factory):

@@ -58,62 +52,4 @@ def get_two_rooms_tsp_agents(emergent_phenomenon, factory):
    for agent in agents:
        for u, v, weight in agent._position_graph.edges(data='weight'):
            agent._position_graph[u][v]['weight'] = edge_costs[f"{u}-{v}"]
    return agents


def run_tsp_setting(config_name, emergent_phenomenon):
    # Render at each step?
    render = True

    # Path to config File
    path = Path(f'../marl_factory_grid/environment/configs/tsp/{config_name}.yaml')

    # Create results folder
    runs = os.listdir("../study_out/")
    run_numbers = [int(run[7:]) for run in runs if run[:7] == "tsp_run"]
    next_run_number = max(run_numbers) + 1 if run_numbers else 0
    results_path = f"../study_out/tsp_run{next_run_number}"
    os.mkdir(results_path)

    # Env Init
    factory = Factory(path)

    with open(f"{results_path}/env_config.txt", "w") as txt_file:
        txt_file.write(str(factory.conf))

    for episode in trange(1):
        _ = factory.reset()
        done = False
        if render:
            factory.render()
            factory._renderer.fps = 5
        if config_name == "dirt_quadrant":
            agents = get_dirt_quadrant_tsp_agents(emergent_phenomenon, factory)
        elif config_name == "two_rooms":
            agents = get_two_rooms_tsp_agents(emergent_phenomenon, factory)
        else:
            print("Config name does not exist. Abort...")
            break
        while not done:
            a = [x.predict() for x in agents]
            # Have this condition, to terminate as soon as all dirt piles are collected. This ensures that the implementation
            # of the TSP agent is equivalent to that of the RL agent
            if 'DirtPiles' in list(factory.state.entities.keys()) and factory.state.entities['DirtPiles'].global_amount == 0.0:
                break
            obs_type, _, _, done, info = factory.step(a)
            if render:
                factory.render()
            if done:
                break


def dirt_quadrant_multi_agent_tsp(emergent_phenomenon):
    run_tsp_setting("dirt_quadrant", emergent_phenomenon)


def two_rooms_multi_agent_tsp(emergent_phenomenon):
    run_tsp_setting("two_rooms", emergent_phenomenon)


if __name__ == '__main__':
    dirt_quadrant_multi_agent_tsp(False)
    return agents
marl_factory_grid/algorithms/utils.py
@@ -58,7 +58,7 @@ def load_yaml_file(path: Path):

def add_env_props(cfg):
    # Path to config File
    env_path = Path(f'../marl_factory_grid/environment/configs/{cfg["env"]["env_name"]}.yaml')
    env_path = Path(f'./marl_factory_grid/environment/configs/{cfg["env"]["env_name"]}.yaml')

    # Env Init
    factory = Factory(env_path)
@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: quadrant
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests

@@ -6,7 +6,7 @@ General:
  # The level.txt file to load from marl_factory_grid/levels
  level_name: two_rooms
  # View Radius
  pomdp_r: 0 # 0 = full observability
  pomdp_r: 0 # Use custom partial observability setting
  # Print all messages and events
  verbose: false
  # Run tests
marl_factory_grid/utils/plotting/plot_single_runs.py
@@ -204,15 +204,14 @@ direction_mapping = {
}


def plot_reward_development(reward_development, cfg, results_path):
def plot_reward_development(reward_development, results_path):
    smoothed_data = np.convolve(reward_development, np.ones(10) / 10, mode='valid')
    plt.plot(smoothed_data)
    plt.ylim([-10, max(smoothed_data) + 20])
    plt.title('Smoothed Reward Development')
    plt.xlabel('Episode')
    plt.ylabel('Reward')
    if cfg["env"]["save_and_log"]:
        plt.savefig(f"{results_path}/smoothed_reward_development.png")
    plt.savefig(f"{results_path}/smoothed_reward_development.png")
    plt.show()


@@ -275,7 +274,7 @@ def plot_reached_flags_per_step():
    plt.show()


def create_info_maps(env, used_actions, all_valid_observations, dirt_piles_positions, results_path, agents, act_dim,
def create_info_maps(env, all_valid_observations, dirt_piles_positions, results_path, agents, act_dim,
                     a2c_instance):
    # Create value map
    with open(f"{results_path}/info_maps.txt", "w") as txt_file:

@@ -323,6 +322,5 @@ def create_info_maps(env, used_actions, all_valid_observations, dirt_piles_posit
                row += "[" + ', '.join(f"{x:7.4f}" for x in pmap[d, r]) + "]"
            txt_file.write(row + "]")
        txt_file.write("\n")
    txt_file.write(f"Used actions: {used_actions}\n")

    return action_probabilities
@@ -327,7 +327,7 @@ class Renderer:
        self.screen.blit(prob_text, prob_text_rect)

        pygame.display.flip()
        self.save_screen("multi_action_graph", result_path)
        self.save_screen("multi_action_graph", "." + result_path)

    def save_screen(self, filename, result_path):
        """
requirements.txt (new file)
@@ -0,0 +1,13 @@
numpy
pygame>=2.0
numba>=0.56
gymnasium>=0.26
seaborn
pandas
PyYAML
networkx
torch
tqdm
packaging
pillow
scipy
setup.py (deleted file)
@@ -1,39 +0,0 @@
from setuptools import setup, find_packages
from pathlib import Path
this_directory = Path(__file__).parent
long_description = (this_directory / "README.md").read_text()


setup(name='Marl-Factory-Grid',
      version='0.2.3',
      description='A framework to research MARL agents in various setings.',
      author='Steffen Illium',
      author_email='steffen.illium@ifi.lmu.de',
      url='https://github.com/illiumst/marl-factory-grid/import',
      license='MIT',
      keywords=[
          'artificial intelligence',
          'pytorch',
          'multiagent reinforcement learning',
          'simulation',
          'emergence',
          'gymnasium',
          'environment',
          'deepdiff',
          'natsort',

      ],
      classifiers=[
          'Development Status :: 4 - Beta',
          'Intended Audience :: Developers',
          'Topic :: Scientific/Engineering :: Artificial Intelligence',
          'License :: OSI Approved :: MIT License',
          'Programming Language :: Python :: 3.11',
      ],
      long_description=long_description,
      long_description_content_type='text/markdown',
      packages=find_packages(exclude=['examples']),
      include_package_data=True,
      install_requires=['numpy', 'pygame>=2.0', 'numba>=0.56', 'gymnasium>=0.26', 'seaborn', 'pandas',
                        'pyyaml', 'networkx', 'torch', 'tqdm']
      )