Artificial Intelligence (AI) and Machine Learning (ML) have become focal points for the Department of Defense (DoD), as evidenced by the President's Fiscal Year 2024 request for over $1.8 billion, a 50% increase over the previous year's allocation. A critical aspect of this heightened investment in AI is the development of fully autonomous unmanned aerial systems (UAS). However, the complexity of the operational environments in which these UAS operate, coupled with the evolving prioritization of subgoals, presents significant hurdles for traditional control algorithms. Reinforcement Learning (RL) offers a solution that will enable warfighters to dynamically devise actionable control strategies to achieve mission success. Despite its potential, however, typical RL has a limited ability to understand risk, which has raised concerns about its stability and safety in high-risk environments. Improving RL agents' understanding of both intrinsic environmental uncertainty and the uncertainty arising from limitations in their own knowledge can increase safety and trust in UAS automation.
We have begun examining how combining distributional and ensemble methods can improve RL agents' understanding of uncertainty. By learning the distribution of environment rewards, we can apply risk-aware metrics to improve safety and stability. With ensemble methods, we can isolate the uncertainty that stems from limited exploration and knowledge, giving agents an estimate of their situational awareness. We can improve an agent's awareness by pushing it to explore the areas it is most uncertain about, and we can prevent agents, by simple throttling, from taking actions when their uncertainty is too high. Our paper will compare the combined methods against a soft actor-critic (SAC) baseline in a commercial off-the-shelf (COTS) game engine. Ultimately, the paper will demonstrate that these modifications yield safer agents for continuous control that can visualize their uncertainty while operating a UAS.
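The throttling idea described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's implementation: it uses untrained, randomly initialized linear Q-approximators to stand in for a trained ensemble, and the threshold, fallback action, and class names (`QEnsemble`, `throttled_action`) are assumptions for this example. The key mechanism shown is using disagreement (standard deviation) across ensemble members as an estimate of epistemic uncertainty, then refusing the proposed action when that estimate exceeds a limit.

```python
# Sketch: ensemble-disagreement uncertainty gate for action throttling.
# Illustrative only -- the architecture, threshold, and fallback policy
# are assumptions, not the configuration used in the paper.
import numpy as np

rng = np.random.default_rng(0)

class QEnsemble:
    """K independently initialized linear Q-approximators.

    Disagreement (standard deviation) across members serves as a proxy
    for epistemic uncertainty: it is high in regions the ensemble has
    explored little and low where the members have converged.
    """
    def __init__(self, n_members, state_dim, action_dim):
        # Each member gets its own random weight vector over (state, action).
        self.weights = [rng.normal(size=(state_dim + action_dim,))
                        for _ in range(n_members)]

    def q_values(self, state, action):
        # Evaluate every member on the same state-action pair.
        x = np.concatenate([state, action])
        return np.array([w @ x for w in self.weights])

def throttled_action(ensemble, state, proposed_action, fallback_action,
                     max_uncertainty):
    """Return the proposed action only when epistemic uncertainty is low;
    otherwise fall back to a safe default (simple throttling)."""
    q = ensemble.q_values(state, proposed_action)
    uncertainty = q.std()  # ensemble disagreement
    if uncertainty > max_uncertainty:
        return fallback_action, uncertainty
    return proposed_action, uncertainty
```

In a full agent, the same per-state disagreement signal could also drive exploration (visit high-uncertainty regions more often) and be rendered as an overlay for the operator, which is the visualization role it plays in our experiments.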
Keywords
ADAPTABILITY; AIR AND MISSILE DEFENSE; MACHINE LEARNING; RISK ASSESSMENT
Additional Keywords
Reinforcement Learning