Abstract
Given the billions of dollars invested by the DoD in artificial intelligence (AI) solutions for the warfighter, it is imperative to develop models for AI systems that allow agents to perform reliably and consistently. When a human switches context, for example from electro-optical (EO) images to synthetic aperture radar (SAR) images, they can look at the image and make sense of it without prior time invested in reviewing SAR imagery. This is a zero-shot approach to learning. Typical AI systems, by contrast, require additional training stages to switch contexts successfully between EO, infrared (IR), and SAR images; without additional representative data, their accuracy and confidence levels decrease.
This paper examines how to train an intelligent ML system to evaluate a new situation and make sense of it using zero-shot learning approaches to Automatic Target Recognition (ATR). The use of latent space with real-time transformer-based approaches in computer vision is examined for developing zero-shot ATR algorithms. The latent-space approach encodes image data onto a manifold and uses clustering algorithms to identify objects with similar physical features. A mathematical introduction to these concepts is provided, along with a description of their application in determining the proximity of images to one another. By leveraging a lower-dimensional space to represent the essential features of high-dimensional data, it is possible to map objects into Euclidean space and determine their similarity to images a neural network already knows. Models such as the Real-Time Detection Transformer (RT-DETR) have shown improved mean average precision (mAP) over You Only Look Once (YOLO) model variants in the literature. Combining these approaches reduces the need for additional data in different modalities while maintaining model performance.
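As a minimal illustrative sketch (not the paper's implementation), the latent-space similarity idea can be framed as nearest-centroid matching in an embedding space: images are encoded to latent vectors, known classes form clusters, and an unseen-modality image is assigned to the class whose cluster centroid is closest in Euclidean distance. The encoder output below is simulated with random vectors, and the class names and embedding dimension are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder latent embeddings: in practice these would come from a
# transformer-based encoder (e.g., features from an RT-DETR backbone).
# Each known class is a tight cluster on the latent manifold.
known_classes = {
    "tank": rng.normal(loc=0.0, scale=0.1, size=(20, 64)) + 1.0,
    "truck": rng.normal(loc=0.0, scale=0.1, size=(20, 64)) - 1.0,
}

# Centroid of each known class in the lower-dimensional latent space.
centroids = {name: feats.mean(axis=0) for name, feats in known_classes.items()}

def zero_shot_match(embedding: np.ndarray) -> str:
    """Assign an unseen image embedding (e.g., from SAR imagery the model
    was never trained on) to the nearest known-class centroid by
    Euclidean distance -- no additional training required."""
    return min(centroids, key=lambda c: np.linalg.norm(embedding - centroids[c]))

# A query embedding lying near the "tank" cluster maps to that class
# without any retraining on the new modality.
query = np.full(64, 1.05)
print(zero_shot_match(query))  # → tank
```

The design choice here is deliberately simple: Euclidean distance to class centroids stands in for the clustering-based similarity the abstract describes, and any real system would replace the synthetic embeddings with encoder outputs.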