In 2022, the global data collection and labeling market was valued at $2.13 billion, and it is forecast to reach $12.75 billion by 2030. This growth is driven in part by advancements in Artificial Intelligence (AI) and Machine Learning (ML), which provide insights to organizations such as the Department of Defense (DoD). When proving out new AI/ML concepts in a DoD environment, a major challenge is the scarcity of relevant data at an unclassified level. Although DoD platforms continuously collect and transmit large amounts of data, many layers of bureaucracy stand between that data and research and development (R&D) organizations, which may have only limited budgets and short time periods in which to obtain stakeholder buy-in. Even if these bureaucratic processes were removed and relevant data were immediately available, the data would most likely be unorganized and incorrectly formatted. Furthermore, different supervised learning models may have different input data requirements, so data may need to be manually labeled and formatted for each chosen model, causing delays and cost increases in the development cycle. How can DoD-focused R&D organizations improve this process and accelerate innovation while continuing to abide by data governance policies and procedures?
One solution is to use simulation tools to generate synthetic datasets quickly for training ML models, enabling rapid prototyping and validation. This paper describes the development process for an automatically labeled synthetic imagery dataset used to train two capabilities: a semantic segmentation ensemble that provides change detection, and an instance segmentation model that identifies objects at the pixel level. The paper also describes the training process for the models, explains how the trained models performed on real drone imagery, and presents conclusions about the applicability of the work to the warfighter.
Keywords
AI, DATA, MACHINE LEARNING, SYNTHETIC ENVIRONMENT