Abstract
Artificial Intelligence (AI) and Machine Learning (ML) have become focal points for the Department of Defense (DoD), as evidenced by President Trump announcing a $500 billion investment in AI-related infrastructure. A critical aspect of this heightened investment in AI is the development of fully autonomous unmanned aerial systems (UAS). However, the complexity of the operational environments in which these UAS operate, coupled with the evolving prioritization of subgoals, presents significant hurdles for traditional control algorithms. Reinforcement Learning (RL) offers a solution that enables warfighters to dynamically devise actionable control strategies to achieve mission success. However, RL can struggle with long-horizon tasks and with reward-function design, forcing developers into long trial-and-error cycles. A further concern arises at deployment: when the agent enters states outside its training distribution, its behavior becomes indeterminate.
In previous work, we examined how distributional and ensemble methods allow reinforcement learning agents to quantify both epistemic and aleatoric uncertainty. In this work, we shift our focus to how uncertainty can be used during training to improve exploration and simplify reward-function design. Our initial results demonstrate up to a 20% reduction in developer iteration time, translating to more stable autonomy algorithms delivered at the speed of relevance for the warfighter. Additionally, by normalizing the uncertainty estimate, it can be used during deployment to detect out-of-distribution behavior and trigger an appropriate contingency, enabling autonomy suitable for real-world environments. Ultimately, this paper demonstrates how to exploit uncertainty while training and deploying predictable RL agents for the warfighter.