The objective of Computer Vision (CV) research is to equip computers to sense the environment, understand the sensed data, take appropriate actions, and learn from this experience to improve future performance (Sebe, Cohen, Garg, & Huang, 2005). Machine Learning (ML) is the primary driver of CV and requires hundreds of thousands of data points. Collecting real-world data for ML is both expensive and difficult, and real-world data often lacks the diversity required to train ML models. As a result, access to sufficient amounts of data is the biggest challenge facing CV applications. Beyond access to data, annotating real-world data is resource-intensive, expensive, and error-prone. Training on real-world data is also challenging because datasets are often biased and insufficient, which affects our ability to validate ML models. Synthetic data is a viable solution to these data problems.
This paper discusses how a combination of real-world and synthetic data is used to train a CV model to detect Unexploded Ordnances (UXOs) on an airfield. Specifically, this paper will illustrate the approach SoarTech and Unity took to generate synthetic data at scale and train multiple Convolutional Neural Networks (CNNs) for UXO detection. It will present data showing the CNNs’ performance improvements using a “real-world only” dataset vs. a dataset combining 25% real-world and 75% synthetic data. It will also discuss the use of Domain Randomization to vary lighting (time of day, brightness), building and tree arrangement, concrete asphalt and paint stripping, rubble, camera parameters, and backgrounds. Lastly, this paper will discuss techniques used to avoid overfitting to one kind of data.
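The two data-side ideas above, Domain Randomization and a 25% real-world / 75% synthetic training mix, can be illustrated with a minimal Python sketch. It is not the SoarTech and Unity pipeline: the names `SceneParams`, `sample_scene_params`, and `mix_datasets`, the choice of parameters, and every numeric range are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    """Hypothetical per-scene settings that a renderer could consume."""
    time_of_day_hours: float   # lighting: time of day
    brightness: float          # lighting: relative brightness multiplier
    camera_height_m: float     # camera parameter
    camera_pitch_deg: float    # camera parameter
    background_id: int         # index into a set of background assets
    rubble_density: float      # 0.0 (no rubble) to 1.0 (dense rubble)


def sample_scene_params(rng: random.Random) -> SceneParams:
    """Draw one randomized scene configuration.

    All ranges below are assumptions for illustration, not values from the paper.
    """
    return SceneParams(
        time_of_day_hours=rng.uniform(6.0, 18.0),
        brightness=rng.uniform(0.5, 1.5),
        camera_height_m=rng.uniform(5.0, 50.0),
        camera_pitch_deg=rng.uniform(-90.0, -30.0),
        background_id=rng.randrange(10),
        rubble_density=rng.uniform(0.0, 1.0),
    )


def mix_datasets(real, synthetic, real_fraction, total, rng):
    """Build a shuffled training list with a fixed real/synthetic ratio."""
    n_real = round(total * real_fraction)
    n_synthetic = total - n_real
    if n_real > len(real) or n_synthetic > len(synthetic):
        raise ValueError("not enough samples to satisfy the requested mix")
    # Sample without replacement from each pool, then interleave by shuffling.
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synthetic)
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    rng = random.Random(0)
    scenes = [sample_scene_params(rng) for _ in range(3)]
    real = [("real", i) for i in range(200)]
    synthetic = [("synthetic", i) for i in range(1000)]
    train = mix_datasets(real, synthetic, real_fraction=0.25, total=400, rng=rng)
    n_real = sum(1 for source, _ in train if source == "real")
    print(len(scenes), len(train), n_real)
```

Sampling each parameter independently per scene is the simplest form of Domain Randomization; a production generator would pass such parameters to the rendering engine and would likely randomize many more scene properties.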