mirror of
https://github.com/illiumst/marl-factory-grid.git
synced 2025-05-22 14:56:43 +02:00
Merge remote-tracking branch 'origin/documentation' into documentation
This commit is contained in:
commit
ca01dc6d3b
228
README.md
@ -1,176 +1,68 @@
|
||||
# EDYS
|
||||
# About EDYS
|
||||
|
||||
### Tackling emergent dysfunctions (EDYs) in cooperation with Fraunhofer-IKS.
|
||||
|
||||
Collaborating with Fraunhofer-IKS, this project is dedicated to investigating Emergent Dysfunctions (EDYs)
|
||||
within multi-agent environments.
|
||||
|
||||
### Project Objectives:
|
||||
|
||||
- Create an environment that provokes emerging dysfunctions.
|
||||
|
||||
- This is achieved by creating a high level of background noise in the domain, where various entities perform diverse tasks,
|
||||
resulting in a deliberately chaotic dynamic.
|
||||
- The goal is to observe and analyze naturally occurring emergent dysfunctions within the complexity generated in this dynamic environment.
|
||||
|
||||
|
||||
- Observational Framework:
|
||||
|
||||
- The project introduces an environment that is designed to capture dysfunctions as they naturally occur.
|
||||
- The environment allows for continuous monitoring of agent behaviors, actions, and interactions.
|
||||
- Tracking emergent dysfunctions in real-time provides valuable data for analysis and understanding.
|
||||
|
||||
|
||||
- Compatibility
|
||||
- The Framework allows learning entities from different manufacturers and projects with varying representations
|
||||
of actions and observations to interact seamlessly within the environment.
|
||||
|
||||
|
||||
- Placeholders
|
||||
|
||||
- One can provide an agent with a placeholder observation that contains no information and offers no meaningful insights.
|
||||
- Later, when the environment expands and introduces additional entities available for observation, these new observations can be provided to the agent.
|
||||
- This allows for processes such as retraining on an already initialized policy and fine-tuning to enhance the agent's performance based on the enriched information.
|
||||
|
||||
Tackling emergent dysfunctions (EDYs) in cooperation with Fraunhofer-IKS
|
||||
|
||||
## Setup
|
||||
Install this environment using `pip install marl-factory-grid`.
|
||||
Install this environment using `pip install marl-factory-grid`. For more information click [here](docs/source/installation.rst).
|
||||
Refer to [quickstart](_quickstart) for specific scenarios.
|
||||
|
||||
## First Steps
|
||||
## Usage
|
||||
|
||||
### Quickstart
|
||||
Most of the env. objects (entities, rules and assets) can be loaded automatically.
|
||||
Just define what your environment needs in a *yaml*-configfile like:
|
||||
The majority of environment objects, including entities, rules, and assets, can be loaded automatically.
|
||||
Simply specify the requirements of your environment in a [*yaml*-configfile](marl_factory_grid/configs/default_config.yaml).
|
||||
|
||||
<details><summary>Example ConfigFile</summary>
|
||||
|
||||
    # Default Configuration File

    General:
      # RNG-seed to sample the same "random" numbers every time, to make the different runs comparable.
      env_seed: 69
      # Individual vs global rewards
      individual_rewards: true
      # The level.txt file to load from marl_factory_grid/levels
      level_name: large
      # View Radius; 0 = full observability
      pomdp_r: 3
      # Print all messages and events
      verbose: false
      # Run tests
      tests: false
|
||||
|
||||
    # Agents section defines the characteristics of different agents in the environment.

    # An Agent requires a list of actions and observations.
    # Possible actions: Noop, Charge, Clean, DestAction, DoorUse, ItemAction, MachineAction, Move8, Move4, North, NorthEast, ...
    # Possible observations: All, Combined, GlobalPosition, Battery, ChargePods, DirtPiles, Destinations, Doors, Items, Inventory, DropOffLocations, Maintainers, ...
    # You can use 'clone' as the agent name to have multiple instances with either a list of names or an int specifying the number of clones.
    Agents:
      Wolfgang:
        Actions:
          - Noop
          - Charge
          - Clean
          - DestAction
          - DoorUse
          - ItemAction
          - Move8
        Observations:
          - Combined:
              - Other
              - Walls
          - GlobalPosition
          - Battery
          - ChargePods
          - DirtPiles
          - Destinations
          - Doors
          - Items
          - Inventory
          - DropOffLocations
          - Maintainers
|
||||
|
||||
    # Entities section defines the initial parameters and behaviors of different entities in the environment.
    # Entities all spawn using coords_or_quantity, a number of entities or coordinates to place them.
    Entities:
      # Batteries: Entities representing power sources for agents.
      Batteries:
        initial_charge: 0.8
        per_action_costs: 0.02

      # ChargePods: Entities representing charging stations for Batteries.
      ChargePods:
        coords_or_quantity: 2

      # Destinations: Entities representing target locations for agents.
      # - spawn_mode: GROUPED or SINGLE. Determines how destinations are spawned.
      Destinations:
        coords_or_quantity: 1
        spawn_mode: GROUPED

      # DirtPiles: Entities representing piles of dirt.
      # - initial_amount: Initial amount of dirt in each pile.
      # - clean_amount: Amount of dirt cleaned in each cleaning action.
      # - dirt_spawn_r_var: Random variation in dirt spawn amounts.
      # - max_global_amount: Maximum total amount of dirt allowed in the environment.
      # - max_local_amount: Maximum amount of dirt allowed in one position.
      DirtPiles:
        coords_or_quantity: 10
        initial_amount: 2
        clean_amount: 1
        dirt_spawn_r_var: 0.1
        max_global_amount: 20
        max_local_amount: 5

      # Doors are spawned using the level map.
      Doors:

      # DropOffLocations: Entities representing locations where agents can drop off items.
      # - max_dropoff_storage_size: Maximum storage capacity at each drop-off location.
      DropOffLocations:
        coords_or_quantity: 1
        max_dropoff_storage_size: 0

      # GlobalPositions.
      GlobalPositions: { }

      # Inventories: Entities representing inventories for agents.
      Inventories: { }

      # Items: Entities representing items in the environment.
      Items:
        coords_or_quantity: 5

      # Machines: Entities representing machines in the environment.
      Machines:
        coords_or_quantity: 2

      # Maintainers: Entities representing maintainers that aim to maintain machines.
      Maintainers:
        coords_or_quantity: 1

      # Zones: Entities representing zones in the environment.
      Zones: { }
|
||||
|
||||
|
||||
    # Rules section specifies the rules governing the dynamics of the environment.
    Rules:
      # Environment Dynamics
      # When stepping over a dirt pile, entities carry a ratio of the dirt to their next position
      EntitiesSmearDirtOnMove:
        smear_ratio: 0.2
      # Doors automatically close after a certain number of time steps
      DoorAutoClose:
        close_frequency: 10
      # Maintainers move at every time step
      MoveMaintainers:

      # Respawn Stuff
      # Define how dirt should respawn after the initial spawn
      RespawnDirt:
        respawn_freq: 15
      # Define how items should respawn after the initial spawn
      RespawnItems:
        respawn_freq: 15

      # Utilities
      # This rule defines the collision mechanic, introduces a related DoneCondition and lets you specify rewards.
      # Can be omitted/ignored if you do not want to take care of collisions at all.
      WatchCollisions:
        done_at_collisions: false

      # Done Conditions
      # Define the conditions for the environment to stop: either success or fail conditions.
      # The environment stops when an agent reaches a destination
      DoneAtDestinationReach:
      # The environment stops when all dirt is cleaned
      DoneOnAllDirtCleaned:
      # The environment stops when a battery is discharged
      DoneAtBatteryDischarge:
      # The environment stops when a maintainer reports a collision
      DoneAtMaintainerCollision:
      # The environment stops after max steps
      DoneAtMaxStepsReached:
        max_steps: 500
|
||||
If you only plan on using the environment without making any modifications, use ``quickstart_use``.
|
||||
This creates a default config-file and another one that lists all possible options of the environment.
|
||||
Also, it generates an initial script where an agent is executed in the specified environment.
|
||||
For further details on utilizing the environment, refer to the documentation [here](docs/source/usage.rst).
|
||||
|
||||
</details>
|
||||
Existing modules include a variety of functionalities within the environment:
|
||||
- [Agents](marl_factory_grid/algorithms) implement either static strategies or learning algorithms based on the specific configuration.
|
||||
- Their action set includes opening [doors](marl_factory_grid/modules/doors/entitites.py), cleaning
|
||||
[dirt](marl_factory_grid/modules/clean_up/entitites.py), picking up [items](marl_factory_grid/modules/items/entitites.py) and
|
||||
delivering them to designated drop-off locations.
|
||||
- Agents are equipped with a [battery](marl_factory_grid/modules/batteries/entitites.py) that gradually depletes over time if not charged at a chargepod.
|
||||
- The [maintainer](marl_factory_grid/modules/maintenance/entities.py) aims to repair [machines](marl_factory_grid/modules/machines/entitites.py) that lose health over time.
|
||||
|
||||
Have a look in [\quickstart](./quickstart) for further configuration examples.
|
||||
## Customization
|
||||
|
||||
### Make it your own
|
||||
If you plan on modifying the environment, for example by adding entities or rules, use ``quickstart_modify``.
|
||||
This creates a template module and a script that runs an agent, incorporating the generated module.
|
||||
More information on how to modify the levels, entities, groups, rules and assets can be found [here](docs/source/modifications.rst).
|
||||
|
||||
#### Levels
|
||||
Varying levels are created by defining Walls, Floor or Doors in *.txt*-files (see [./environment/levels](./environment/levels) for examples).
|
||||
### Levels
|
||||
Varying levels are created by defining Walls, Floor or Doors in *.txt*-files (see [levels](marl_factory_grid/levels) for examples).
|
||||
Define which *level* to use in your *configfile* as:
|
||||
```yaml
|
||||
General:
|
||||
@ -180,16 +72,16 @@ General:
|
||||
Make sure to use `#` as [Walls](marl_factory_grid/environment/entity/wall.py), `-` as free (walkable) floor, and `D` for [Doors](./modules/doors/entities.py).
Other Entities (define your own) may bring their own `Symbols`.
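
To make the symbol convention concrete, here is a minimal sketch of how a level string could be scanned for walls, floor and doors. It is not the package's own level parser: `parse_level` and `EXAMPLE_LEVEL` are hypothetical names introduced only for illustration, and the only facts taken from the text above are the three symbols `#`, `-` and `D`.

```python
from typing import Dict, List, Tuple

# Symbols as described above; further symbols would come from custom entities.
WALL, FLOOR, DOOR = "#", "-", "D"


def parse_level(text: str) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (row, col) positions of walls, floor tiles and doors from a level string."""
    names = {WALL: "walls", FLOOR: "floor", DOOR: "doors"}
    positions: Dict[str, List[Tuple[int, int]]] = {"walls": [], "floor": [], "doors": []}
    for row, line in enumerate(text.strip().splitlines()):
        for col, symbol in enumerate(line.strip()):
            if symbol in names:  # unknown symbols are skipped in this sketch
                positions[names[symbol]].append((row, col))
    return positions


# A tiny hypothetical level: two floor segments connected by a door.
EXAMPLE_LEVEL = """
#######
#--D--#
#######
"""

level = parse_level(EXAMPLE_LEVEL)
print(len(level["walls"]), level["floor"], level["doors"])
```

Symbols that are not recognised are simply ignored here; a real loader would presumably hand them to the entity that defines them.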
|
||||
|
||||
#### Entities
### Entities
Entities are [Objects](marl_factory_grid/environment/entity/object.py) that can additionally be assigned a position.
|
||||
Abstract Entities are provided.
|
||||
|
||||
#### Groups
|
||||
### Groups
|
||||
[Groups](marl_factory_grid/environment/groups/objects.py) are entity Sets that provide administrative access to all group members.
|
||||
All [Entities](marl_factory_grid/environment/entity/global_entities.py) are available at runtime as EnvState property.
|
||||
|
||||
|
||||
#### Rules
|
||||
### Rules
|
||||
[Rules](marl_factory_grid/environment/entity/object.py) define how the environment behaves on a micro scale.
Each of the hooks (`on_init`, `pre_step`, `on_step`, `post_step`, `on_done`)
provides env-access to implement custom logic, calculate rewards, or gather information.
|
||||
@ -198,7 +90,7 @@ provide env-access to implement customn logic, calculate rewards, or gather info
|
||||
|
||||
[Results](marl_factory_grid/environment/entity/object.py) provide a way to return `rule` evaluations such as rewards and state reports
|
||||
back to the environment.
|
||||
#### Assets
|
||||
### Assets
|
||||
Make sure to bring your own assets for each Entity living in the Gridworld as the `Renderer` relies on it.
|
||||
PNG-files (transparent background) of square aspect-ratio should do the job, in general.
|
||||
|
||||
@ -207,5 +99,3 @@ PNG-files (transparent background) of square aspect-ratio should do the job, in
|
||||
<html      html>
|
||||
<img src="/marl_factory_grid/environment/assets/agent/agent.png" width="5%">
|
||||
|
||||
|
||||
|
||||
|
@ -1,5 +1,70 @@
|
||||
How to modify the environment or write modules
|
||||
===============================================
|
||||
|
||||
Modifying levels
|
||||
----------------
|
||||
Varying levels are created by defining Walls, Floor or Doors in *.txt*-files (see `levels`_ for examples).
|
||||
Define which *level* to use in your *config file* as:
|
||||
|
||||
.. _levels: marl_factory_grid/levels
|
||||
|
||||
>>> General:
        level_name: rooms  # 'simple', 'narrow_corridor', 'eight_puzzle',...
|
||||
|
||||
... or create your own. Maybe with the help of `asciiflow.com <https://asciiflow.com/#/>`_.
|
||||
Make sure to use `#` as `Walls`_ , `-` as free (walkable) floor and `D` for `Doors`_.
|
||||
Other Entities (define your own) may bring their own `Symbols`.
|
||||
|
||||
.. _Walls: marl_factory_grid/environment/entity/wall.py
|
||||
.. _Doors: modules/doors/entities.py
|
||||
|
||||
|
||||
Modifying Entities
------------------
|
||||
Entities are `Objects`_ that can additionally be assigned a position.
|
||||
Abstract Entities are provided.
|
||||
|
||||
If you wish to introduce new entities to the environment, just create a new module that implements the entity class. If
necessary, provide additional classes such as custom actions or rewards and load the entity into the environment using
the config file.
|
||||
|
||||
.. _Objects: marl_factory_grid/environment/entity/object.py
|
||||
|
||||
Modifying Groups
|
||||
----------------
|
||||
`Groups`_ are entity Sets that provide administrative access to all group members.
|
||||
All `Entity Collections`_ are available at runtime as a property of the env state.
|
||||
If you add an entity, you probably also want a collection of that entity.
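
The pairing of an entity with its collection can be sketched as follows. This is an illustration only and does not use the package's base classes: `Crate`, `Crates` and `by_pos` are hypothetical names, and the real entity and group classes may expose a different interface.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Crate:
    """Hypothetical custom entity: an object that additionally has a position."""
    name: str
    pos: Tuple[int, int]


class Crates:
    """Hypothetical collection giving access to all Crate instances at once."""

    def __init__(self) -> None:
        self._members: List[Crate] = []

    def add(self, crate: Crate) -> None:
        self._members.append(crate)

    def __iter__(self) -> Iterator[Crate]:
        return iter(self._members)

    def __len__(self) -> int:
        return len(self._members)

    def by_pos(self, pos: Tuple[int, int]) -> List[Crate]:
        """Return every member located at the given grid position."""
        return [crate for crate in self._members if crate.pos == pos]


crates = Crates()
crates.add(Crate("crate_0", (1, 2)))
crates.add(Crate("crate_1", (3, 4)))
names_at_1_2 = [crate.name for crate in crates.by_pos((1, 2))]
print(len(crates), names_at_1_2)
```

Keeping lookups such as `by_pos` on the collection rather than on single entities is one way to provide the group-level access described above.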
|
||||
|
||||
.. _Groups: marl_factory_grid/environment/groups/objects.py
|
||||
.. _Entity Collections: marl_factory_grid/environment/entity/global_entities.py
|
||||
|
||||
Modifying Rules
|
||||
----------------
|
||||
`Rules`_ define how the environment behaves on a micro scale.
Each of the hooks (`on_init`, `pre_step`, `on_step`, `post_step`, `on_done`) provides env-access to implement custom
logic, calculate rewards, or gather information.
|
||||
|
||||
If you wish to introduce new rules to the environment, make sure they implement the Rule class and override its hooks
to implement your own rule logic.
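
The hook pattern can be sketched as below. This is a sketch under stated assumptions, not the package's actual API: the `Rule` class here is a stand-in written for this example, the hook signatures (a single `state` dict, a list of strings as return value) are assumed, and `run_episode` only mimics the order in which the five hooks are listed above.

```python
from typing import Any, Dict, List


class Rule:
    """Stand-in base class (an assumption): each hook does nothing by default."""

    def on_init(self, state: Dict[str, Any]) -> List[str]:
        return []

    def pre_step(self, state: Dict[str, Any]) -> List[str]:
        return []

    def on_step(self, state: Dict[str, Any]) -> List[str]:
        return []

    def post_step(self, state: Dict[str, Any]) -> List[str]:
        return []

    def on_done(self, state: Dict[str, Any]) -> List[str]:
        return []


class CountSteps(Rule):
    """Example rule: count the steps and report the total when the episode ends."""

    def on_init(self, state: Dict[str, Any]) -> List[str]:
        state["steps"] = 0
        return ["counter initialised"]

    def on_step(self, state: Dict[str, Any]) -> List[str]:
        state["steps"] += 1
        return []

    def on_done(self, state: Dict[str, Any]) -> List[str]:
        return [f"episode finished after {state['steps']} steps"]


def run_episode(rules: List[Rule], n_steps: int) -> List[str]:
    """Call the hooks in the listed order and collect everything they return."""
    state: Dict[str, Any] = {}
    reports: List[str] = []
    for rule in rules:
        reports += rule.on_init(state)
    for _ in range(n_steps):
        for hook in ("pre_step", "on_step", "post_step"):
            for rule in rules:
                reports += getattr(rule, hook)(state)
    for rule in rules:
        reports += rule.on_done(state)
    return reports


reports = run_episode([CountSteps()], n_steps=3)
print(reports)
```

Only the hooks a rule actually needs are overridden; the others fall back to the no-op defaults of the base class.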
|
||||
|
||||
.. _Rules: marl_factory_grid/environment/entity/object.py
|
||||
|
||||
.. image:: ./images/Hooks_FIKS.png
|
||||
:alt: Hooks Image
|
||||
|
||||
Modifying Results
|
||||
-----------------
|
||||
`Results`_ provide a way to return `rule` evaluations such as rewards and state reports back to the environment.
|
||||
|
||||
.. _Results: marl_factory_grid/utils/results.py
|
||||
|
||||
Modifying Assets
|
||||
----------------
|
||||
Make sure to bring your own assets for each Entity living in the Gridworld as the `Renderer` relies on it.
|
||||
PNG-files (transparent background) of square aspect-ratio should do the job, in general.
|
||||
|
||||
.. image:: ./marl_factory_grid/environment/assets/wall.png
|
||||
:alt: Wall Image
|
||||
.. image:: ./marl_factory_grid/environment/assets/agent/agent.png
|
||||
:alt: Agent Image
|
||||
|
@ -1,3 +1,65 @@
|
||||
How to use the environment with your agents
|
||||
Using the environment with your agents
|
||||
===========================================
|
||||
|
||||
Environment objects, including agents, entities and rules, that are specified in a *yaml*-configfile will be loaded automatically.
|
||||
Using ``quickstart_use`` creates a default config-file and another one that lists all possible options of the environment.
|
||||
Also, it generates an initial script where an agent is executed in the environment specified by the config-file.
|
||||
|
||||
After initializing the environment using the specified configuration file, the script enters a reinforcement learning loop.
|
||||
The loop consists of episodes, where each episode involves resetting the environment, executing actions, and receiving feedback.
|
||||
|
||||
Here's a breakdown of the key components in the provided script. Feel free to customize it based on your specific requirements:
|
||||
|
||||
1. **Initialization:**
|
||||
|
||||
>>> path = Path('marl_factory_grid/configs/default_config.yaml')
    factory = Factory(path)
    factory = EnvMonitor(factory)
    factory = EnvRecorder(factory)
|
||||
|
||||
- The `path` variable points to the location of your configuration file. Ensure it corresponds to the correct path.
|
||||
- `Factory` initializes the environment based on the provided configuration.
|
||||
- `EnvMonitor` and `EnvRecorder` are optional components. They add monitoring and recording functionalities to the environment, respectively.
|
||||
|
||||
2. **Reinforcement Learning Loop:**
|
||||
|
||||
>>> for episode in trange(10):
        _ = factory.reset()
        done = False
        if render:
            factory.render()
        action_spaces = factory.action_space
        agents = []
|
||||
|
||||
- The loop iterates over a specified number of episodes (in this case, 10).
|
||||
- `factory.reset()` resets the environment for a new episode.
|
||||
- `factory.render()` is used for visualization if rendering is enabled.
|
||||
- `action_spaces` stores the action spaces available for the agents.
|
||||
- `agents` will store agent-specific information during the episode.
|
||||
|
||||
3. **Taking Actions:**
|
||||
|
||||
>>> while not done:
        a = [randint(0, x.n - 1) for x in action_spaces]
        obs_type, _, reward, done, info = factory.step(a)
        if render:
            factory.render()
|
||||
|
||||
- Within each episode, the loop continues until the environment signals completion (`done`).
|
||||
- `a` represents a list of random actions for each agent based on their action space.
|
||||
- `factory.step(a)` executes the actions, returning observation types, rewards, completion status, and additional information.
|
||||
|
||||
4. **Handling Episode Completion:**
|
||||
|
||||
>>> if done:
        print(f'Episode {episode} done...')
|
||||
|
||||
- After each episode, a message is printed indicating its completion.
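
The four steps above can be put together in one runnable sketch. Because the real environment needs the package and a config file, this example swaps in `StubFactory`, a hypothetical stand-in that only imitates the parts of the interface used in the walkthrough (`reset`, `step` with a five-element return value, and `action_space` as a list of spaces with an `n` attribute). Rendering, `EnvMonitor` and `EnvRecorder` are left out.

```python
from random import Random
from typing import List


class ActionSpace:
    """Minimal discrete action space exposing `n`, as the loop above expects."""

    def __init__(self, n: int) -> None:
        self.n = n


class StubFactory:
    """Hypothetical stand-in for the environment; each episode ends after `max_steps`."""

    def __init__(self, n_agents: int = 2, n_actions: int = 5, max_steps: int = 4) -> None:
        self.action_space = [ActionSpace(n_actions) for _ in range(n_agents)]
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> None:
        self.steps = 0

    def step(self, actions: List[int]):
        self.steps += 1
        done = self.steps >= self.max_steps
        # Same five-element return shape as in the walkthrough above.
        return None, None, 0.0, done, {}


def run(factory: StubFactory, n_episodes: int, seed: int = 0) -> List[int]:
    """Run random-action episodes and return the number of steps taken in each."""
    rng = Random(seed)
    steps_per_episode = []
    for episode in range(n_episodes):
        factory.reset()
        done, steps = False, 0
        while not done:
            # One random action index per agent, drawn from that agent's action space.
            actions = [rng.randint(0, space.n - 1) for space in factory.action_space]
            _, _, reward, done, info = factory.step(actions)
            steps += 1
        steps_per_episode.append(steps)
        print(f'Episode {episode} done...')
    return steps_per_episode


steps_per_episode = run(StubFactory(), n_episodes=3)
```

Replacing `StubFactory()` with the wrapped `factory` object from the initialization step should leave the loop itself unchanged, provided the real environment returns the same five values from `step`.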
|
||||
|
||||
|
||||
Evaluating the run
|
||||
------------------
|
||||
|
||||
If monitoring and recording are enabled, the environment states will be traced and recorded automatically.
|
||||
|
||||
Plotting: At the moment, a plot of the evaluation score across the episodes is generated automatically.
|
||||
|