general overhaul, better images, better texts

2024-02-05 23:16:26 +01:00
parent fd1d34a85a
commit da72fdcf7f
82 changed files with 149 additions and 188 deletions


@@ -7,11 +7,7 @@ header:
teaser: assets/figures/6_ood_pipeline.jpg
---
![PEOC Pipeline](\assets\figures\6_ood_pipeline.jpg){:style="display:block; margin-left:auto; margin-right:auto"}
One critical prerequisite for deploying reinforcement learning systems in the real world is the ability to reliably detect situations the agent was not trained on. Such situations can pose safety risks when wrong predictions lead to the execution of harmful actions. In this work, we propose PEOC, a new policy-entropy-based out-of-distribution classifier that reliably detects unencountered states in deep reinforcement learning. It uses the entropy of an agent's policy as the classification score of a one-class classifier. We evaluate our approach using a procedural environment generator. Results show that PEOC is highly competitive with state-of-the-art one-class classification algorithms on the evaluated environments. Furthermore, we present a structured process for benchmarking out-of-distribution classification in reinforcement learning.
{% cite sedlmeier2020peoc %}
![PEOC Performance](\assets\figures\6_ood_performance.jpg){:style="display:block; margin-left:auto; margin-right:auto"}
![PEOC Performance](\assets\figures\6_ood_performance.jpg){:style="display:block; width:45%" .align-right}This work proposes PEOC, a policy entropy-based classifier for detecting unencountered states in deep reinforcement learning. Using the agent's policy entropy as a classification score, PEOC reliably identifies out-of-distribution situations, which is crucial for ensuring safety in real-world applications. Evaluated against state-of-the-art one-class classifiers in procedurally generated environments, PEOC demonstrates competitive performance.
Additionally, a structured benchmarking process for out-of-distribution classification in reinforcement learning is presented, offering a comprehensive approach to evaluating such systems' reliability and effectiveness. {% cite sedlmeier2020policy %}
![PEOC Pipeline](\assets\figures\6_ood_pipeline.jpg){:style="display:block; width:90%" .align-center}
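
The core idea described above can be illustrated with a minimal sketch: treat the entropy of the agent's policy at a state as a one-class classification score and flag states whose entropy exceeds a threshold fitted on in-distribution states. This is an illustrative sketch only, not the PEOC implementation from the paper; the function names, the percentile-based threshold, and the example probabilities are assumptions.

```python
# Minimal sketch (not the authors' implementation) of the PEOC idea:
# use the entropy of the agent's policy as an out-of-distribution score.
# The threshold quantile and the example action probabilities are assumptions.
import numpy as np

def policy_entropy(action_probs: np.ndarray) -> float:
    """Shannon entropy of a discrete action distribution pi(.|s)."""
    p = np.clip(action_probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def fit_threshold(in_dist_probs: list, quantile: float = 0.95) -> float:
    """Pick a score threshold from policy entropies seen on in-distribution states."""
    scores = [policy_entropy(p) for p in in_dist_probs]
    return float(np.quantile(scores, quantile))

def is_out_of_distribution(action_probs: np.ndarray, threshold: float) -> bool:
    """Flag a state as OOD when the policy is unusually uncertain (high entropy)."""
    return policy_entropy(action_probs) > threshold

# Usage with made-up policy outputs:
train_probs = [np.array([0.90, 0.05, 0.05]), np.array([0.80, 0.10, 0.10])]
tau = fit_threshold(train_probs)
print(is_out_of_distribution(np.array([0.34, 0.33, 0.33]), tau))  # near-uniform policy -> likely OOD
```

The threshold-fitting step stands in for the one-class classification stage of the pipeline; in practice any one-class classifier can be fitted on the entropy scores collected from in-distribution rollouts.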