Abstract
Combat operations require fast-paced decision-making, where outcomes depend on innate ability, training, and the impacts of stressors such as workload and fatigue. As the data richness of the operational environment and its time criticality increase, biological constraints on decision-making could limit performance, necessitating the use of automated decision support.
The advancement of Artificial Intelligence (AI) has opened avenues for enhancing these decision-support systems. Large Language Models (LLMs) have demonstrated exceptional capabilities in understanding and generating human-like text, enabling the creation of complex recommendations for a wide range of strategic and tactical scenarios.
While powerful, the technology's impacts on users must be evaluated: in particular, whether outputs are credible and trusted, whether suggestions are acted upon and lead to objective performance improvements, and whether negative impacts such as distraction or frustration arise.
To investigate these issues, we integrated a proof-of-concept LLM-based course-of-action analysis tool with a constructive simulation platform. The tool responded to the real-time state of a 'protection of a sensitive site' scenario, suggesting the next step in response to a range of threats and enabling users to query the suggestions using natural language.
This paper outlines the findings from a human-participant study evaluating the technology in simple and complex scenarios against a baseline of manual decision-making without AI support. We consider the impacts in terms of objective performance, subjective experience, AI usage and trust, and attention, measured using eye tracking.
While the AI reacted to the simulation in real time, provided credible suggestions, and responded effectively to queries, these capabilities did not necessarily translate into performance improvements. In particular, if not effectively integrated within a simulation, LLMs can increase perceived workload and cause a significant level of distraction. Key issues with hallucination, disregard for prompted scenario rules, and opportunities for user misuse are also highlighted.