Basic Usage
===========

Environment objects, including agents, entities, and rules, that are specified in a *yaml* config file are loaded automatically.
Using ``quickstart_use`` creates a default config file and a second one that lists all possible options of the environment.
It also generates an initial script in which an agent is executed in the environment specified by the config file.

After initializing the environment from the specified configuration file, the script enters a reinforcement learning loop.
The loop consists of episodes; in each episode the environment is reset, actions are executed, and feedback is received.

Here is a breakdown of the key components of the generated script. Feel free to customize it to your requirements:

1. **Initialization:**

>>> from pathlib import Path
>>> path = Path('marl_factory_grid/configs/default_config.yaml')
>>> factory = Factory(path)
>>> factory = EnvMonitor(factory)
>>> factory = EnvRecorder(factory)

- The ``path`` variable points to your configuration file. Make sure the path is correct for your setup.
- ``Factory`` initializes the environment based on the provided configuration.
- ``EnvMonitor`` and ``EnvRecorder`` are optional wrappers. They add monitoring and recording to the environment, respectively.

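The repeated reassignment of ``factory`` is a wrapper pattern, which can be illustrated without the library itself. The following is a minimal sketch, not the actual ``EnvMonitor`` or ``EnvRecorder`` implementation: ``ToyEnv`` and ``RewardLogger`` are hypothetical stand-ins that only show how a wrapper can forward calls to the wrapped environment while logging on the side.

```python
class ToyEnv:
    """Hypothetical stand-in for an environment; not part of marl_factory_grid."""

    def __init__(self, episode_length=3):
        self.episode_length = episode_length
        self.t = 0

    def reset(self):
        self.t = 0
        return "initial observation"

    def step(self, actions):
        self.t += 1
        reward = 1.0
        done = self.t >= self.episode_length
        return "observation", reward, done


class RewardLogger:
    """Hypothetical wrapper: forwards every call and logs rewards on the side."""

    def __init__(self, env):
        self.env = env
        self.rewards = []

    def step(self, actions):
        result = self.env.step(actions)
        self.rewards.append(result[1])  # log the reward, pass the result through unchanged
        return result

    def __getattr__(self, name):
        # Anything not overridden here (e.g. reset) is delegated to the wrapped env.
        return getattr(self.env, name)


env = RewardLogger(ToyEnv())  # same reassignment pattern as factory = EnvMonitor(factory)
env.reset()
done = False
while not done:
    _, _, done = env.step([0])
print(env.rewards)
```

Because the wrapper exposes the same methods as the object it wraps, the rest of a script does not need to know whether monitoring or recording is active.
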
2. **Reinforcement Learning Loop:**

>>> from tqdm import trange
>>> for episode in trange(10):
...     _ = factory.reset()
...     done = False
...     if render:
...         factory.render()
...     action_spaces = factory.action_space
...     agents = []

- The loop iterates over a fixed number of episodes (in this case, 10).
- ``factory.reset()`` resets the environment for a new episode.
- ``factory.render()`` visualizes the environment if rendering is enabled; ``render`` is a boolean flag set earlier in the script.
- ``action_spaces`` stores the action spaces of the agents, one per agent.
- ``agents`` will store agent-specific information during the episode.

3. **Taking Actions:**

>>> from random import randint
>>> while not done:
...     a = [randint(0, x.n - 1) for x in action_spaces]
...     obs_type, _, reward, done, info = factory.step(a)
...     if render:
...         factory.render()

- Within each episode, the loop continues until the environment signals completion (``done``).
- ``a`` is a list with one random action per agent, drawn from that agent's action space.
- ``factory.step(a)`` executes the actions and returns five values: the observation type, a value the script discards (``_``), the reward, the completion flag ``done``, and additional ``info``.

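The sampling line can be tried in isolation. This sketch assumes only that each action space exposes an integer attribute ``n``, as the snippet above does; ``ToySpace`` is a hypothetical stand-in for whatever ``factory.action_space`` actually contains.

```python
from random import randint, seed


class ToySpace:
    """Hypothetical discrete action space with n actions, numbered 0 .. n-1."""

    def __init__(self, n):
        self.n = n


action_spaces = [ToySpace(4), ToySpace(9)]  # assumed: one space per agent

seed(0)  # only to make repeated runs of this sketch reproducible
# random.randint includes both endpoints, hence the upper bound x.n - 1.
a = [randint(0, x.n - 1) for x in action_spaces]

# Every sampled action is a valid index into its agent's action space.
assert all(0 <= action < space.n for action, space in zip(a, action_spaces))
print(a)
```

Note that the ``- 1`` is specific to ``random.randint``; a sampler with an exclusive upper bound would take ``x.n`` instead.
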
4. **Handling Episode Completion:**

>>> if done:
...     print(f'Episode {episode} done...')

- Once ``done`` is set, a message is printed to indicate that the episode has completed.

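Put together, the four steps form a single nested loop. The code below is a sketch under stated assumptions: ``StubFactory`` and ``StubSpace`` are hypothetical stand-ins that imitate only the interface used in the walkthrough (``reset``, ``action_space``, and ``step`` returning five values), so the control flow can be run without a configuration file. Rendering and ``trange`` are left out.

```python
from random import randint


class StubSpace:
    """Hypothetical action space exposing only the attribute n."""

    def __init__(self, n):
        self.n = n


class StubFactory:
    """Hypothetical environment imitating only the calls used in the walkthrough."""

    def __init__(self, n_agents=2, max_steps=5):
        self.action_space = [StubSpace(4) for _ in range(n_agents)]
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return None

    def step(self, actions):
        self.steps += 1
        done = self.steps >= self.max_steps
        # Same arity as in the walkthrough: obs_type, <unused>, reward, done, info
        return "obs", None, 0.0, done, {}


factory = StubFactory()
finished = []  # episodes that reached done

for episode in range(3):
    _ = factory.reset()
    done = False
    action_spaces = factory.action_space
    while not done:
        a = [randint(0, x.n - 1) for x in action_spaces]
        obs_type, _, reward, done, info = factory.step(a)
        if done:
            print(f'Episode {episode} done...')
            finished.append(episode)
```

Replacing the random action list with the output of a policy is the natural first customization; the surrounding loop stays the same.
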
Evaluating the run
------------------

If monitoring and recording are enabled, the environment states are traced and recorded automatically.
The ``EnvMonitor`` class acts as a wrapper for Gym environments, monitoring and logging key information during interactions,
while the ``EnvRecorder`` class records state summaries during interactions in the environment.
At the end of each run, a plot of the step reward is generated. The step reward is the cumulative sum of the rewards obtained by all agents throughout the episode.
Furthermore, a comparative plot of the achieved score (step reward) over several runs with different seeds or parameter settings can be generated using the methods provided in ``plotting/plot_compare_runs.py``.
For a more comprehensive evaluation, we recommend using the `Weights and Biases (W&B) <https://wandb.ai/site>`_ framework with the dataframes generated by the monitor and recorder. These can be found in the run path specified in your script. W&B provides an API for logging and visualizing model training metrics, enabling analysis with predefined or custom metrics.

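To make the notion of step reward concrete, the cumulative sum described above can be computed by hand. The per-step rewards below are made-up illustration values, not output of the environment, and the list-of-lists layout is an assumption rather than the format of the monitor's dataframes.

```python
from itertools import accumulate

# Hypothetical rewards: one inner list per step, one entry per agent.
rewards_per_step = [
    [0.0, -1.0],
    [1.0, 0.0],
    [0.0, 2.0],
]

# Sum over agents at every step, then accumulate over the episode.
summed_over_agents = [sum(step) for step in rewards_per_step]
step_reward = list(accumulate(summed_over_agents))

print(step_reward)  # [-1.0, 0.0, 2.0]
```

The last entry of ``step_reward`` is the total reward collected by all agents in the episode, which is the quantity a comparison across seeds or parameter settings would look at.
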
Indices and tables
------------------

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`