Many approaches to Artificial Intelligence and Machine Learning (AI/ML) assume the availability of well-labeled training data. In many scenarios, however, obtaining such labels is impractical or even impossible. In those cases, unsupervised learning approaches attempt to automatically discern patterns or class types within the data. Applications that deal with streaming or batched data containing unknown or evolving class types typically fall into this category. A problem with many off-the-shelf algorithms is that they are designed to look at a single batch of data in isolation, which can cause inefficiencies and inaccuracies when they are applied in streaming environments: implementers are forced to incorporate some form of reconciliation or deduplication of classes, and knowledge learned from historical data is not leveraged when analyzing current data.
Dynamic, Unsupervised Clustering by Algorithmic Thresholding (DUCAT), first presented at I/ITSEC 2023, is a system designed to handle noisy, streaming data. When DUCAT identifies a new class type, it codifies that class so that new data entering the system can be quickly checked for inclusion in previously identified classes. This removes the need for an external mechanism to reconcile classes and reduces the amount of data that must be searched for emergent clusters. DUCAT can also maintain tight cluster definitions even in the presence of large amounts of noise.
In the year since DUCAT was first presented, the algorithm has been improved to support higher-dimensional clustering and to deliver better overall performance. This paper describes those improvements and presents accuracy and timing comparisons with other clustering approaches.
Keywords
ADAPTIVE;AI;CLASSIFICATION;EMERGING TECHNOLOGIES
Additional Keywords
Unsupervised, Clustering