Artificial Intelligence (AI) based systems show great promise for supporting military decision making and complex planning, as AI systems can consider massive option spaces that far exceed human capabilities. Game-playing AI systems, particularly those employing deep reinforcement learning (DRL), show superhuman levels of performance, with the capacity to unearth innovative strategies by exploring numerous Courses of Action (COAs) in complex scenarios. Opportunity exists for such AI systems to assist human planners, especially for Joint All Domain Operations (JADO), to construct more high-quality plans, explore strengths and weaknesses more deeply, and consider more alternatives in finite planning time. JADO planning must simultaneously consider multiple domains (e.g., Air, Land, Sea, Undersea) and interacting phenomena (e.g., maneuver, cyber), which massively expands the size of action spaces beyond those employed by state-of-the-art (SoA) game-playing AI. Real decision support systems must also provide deep foundations for trust, interpretability, and expression of commander intent, properties lacking in typical SoA black-box deep learning systems.
Addressing these challenges, this paper reports on a new DRL approach, called Neural Program Policies (NPPs), which constructs trainable COAs by composing a deep neural network with a structured domain-specific program, vastly reducing state and action spaces to a smaller, more meaningful, and tractably learnable subset. We first describe the domain-specific language (DSL) that abstracts the actions and observations employed by a deep reinforcement learner. We then describe our OVERMIND framework for cross-simulator agent learning and self-play (e.g., via StarCraft2 and military simulators). We conclude with performance measurements for 1) the ability to generate COAs, and 2) NPP COA characteristics, including action space reduction (>1000X vs. SoA DRL algorithms), performance prediction, and reduction of simulation runs required for training. We also discuss how the approach supports a human-AI teamed paradigm to increase the number and quality of COAs considered.
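To make the action space argument concrete, the following is a minimal sketch, not the NPP implementation. It assumes a toy setting in which a primitive action is one (unit, primitive, target cell) triple, and a structured program fixes the high-level plan while leaving a few named choices open for a policy to fill in. All names (`Choice`, `PROGRAM_CHOICES`, `raw_action_count`, and so on) and all numbers are hypothetical, and the uniform sampler merely stands in for a trained network.

```python
from dataclasses import dataclass
from math import prod
import random

# Every name and number below is a hypothetical illustration,
# not taken from the NPP or OVERMIND implementation.

@dataclass(frozen=True)
class Choice:
    """A decision point left open in a structured program."""
    name: str
    options: tuple

# Toy program sketch: the plan structure is fixed, three choices stay trainable.
PROGRAM_CHOICES = (
    Choice("axis_of_advance", ("north", "center", "south")),
    Choice("force_ratio", (0.25, 0.5, 0.75)),
    Choice("timing", ("early", "late")),
)

def raw_action_count(num_units, num_primitives, grid_w, grid_h):
    """Count the (unit, primitive, target cell) triples available at one step."""
    return num_units * num_primitives * grid_w * grid_h

def program_action_count(choices):
    """Count the distinct joint settings of the program's open choices."""
    return prod(len(c.options) for c in choices)

def sample_parameters(choices, rng):
    """Stand-in for a learned policy: pick one option per choice at random."""
    return {c.name: rng.choice(c.options) for c in choices}

raw = raw_action_count(num_units=50, num_primitives=10, grid_w=64, grid_h=64)
abstract = program_action_count(PROGRAM_CHOICES)
reduction = raw / abstract
params = sample_parameters(PROGRAM_CHOICES, random.Random(0))

print(f"raw actions per step: {raw}")
print(f"program-level choices: {abstract}")
print(f"reduction factor: {reduction:.0f}x")
print(f"sampled COA parameters: {params}")
```

Under these assumed sizes the learner selects among 18 program-level settings instead of roughly two million primitive actions per step. The ratio depends entirely on the chosen numbers and says nothing about the measurements reported in this paper.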
Keywords
AI; Deep Learning
Additional Keywords
Course of Action Generation, Military Decision Support, Domain Specific Language