Anomaly detection is critical in many scientific and engineering fields, ranging from identifying signatures of new cyberattacks to detecting seizures in EEG medical time series data sets. However, although previous research offers a plethora of anomaly detection algorithms, effective anomaly detection remains challenging for domain experts because it involves a tedious manual tuning process. Specifically, users have to first hand-craft features to prepare the data, then determine which among many algorithms may be best suited for their particular task, and finally set parameters to ensure the chosen algorithm performs well. This is challenging because domain experts often lack sufficient understanding of specific detection algorithms and of machine learning in general. This project addresses this widespread problem by developing a robust self-tuning anomaly detection cyber-infrastructure called Self-Tuning ANomaly Detection service (STAND). STAND enables scientists and engineers who have little understanding of anomaly detection techniques to make effective use of them. It supports a rich variety of disciplines, including but not limited to Medicine/Biology, Mechanical Engineering, and Cyber Security, all of which face data challenges requiring effective anomaly detection. The broader impacts of this project include societal and economic benefits from combating fraud, detecting diseases, and effectively identifying device failures.
STAND uses a transformative new approach to achieving self-tuning anomaly detection. As a first step, STAND automatically tests a range of unsupervised anomaly detection techniques on a given data set. It then extracts knowledge from these combined detection results to reliably capture the difference between anomalies and normal data. Thereafter, it uses this knowledge to train an anomaly classifier that classifies anomalies with accuracy higher than what is achievable by the standard process of thorough manual tuning. This results in a domain-specific anomaly classifier that is ready to be deployed. STAND is general in that it can be applied across a range of data types and domains. It encompasses components to help ingest and model common data types. It is extensible, allowing developers to plug in new anomaly detection algorithms with ease. Lastly, it is adaptable, allowing users to steer the anomaly detection process through feedback based on their domain expertise.
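As a rough illustration only, the three-step workflow described above (run several unsupervised detectors, distill their combined results, then train a classifier) might be sketched as follows. This is not STAND's implementation: the three toy detectors, the rank-averaging consensus, the distance-threshold classifier, the one-dimensional data, and every function name here are assumptions chosen to keep the sketch short.

```python
import statistics


def zscore_scores(values):
    """Toy detector 1: distance from the mean, in standard deviations."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [abs(v - mean) / std for v in values]


def mad_scores(values):
    """Toy detector 2: distance from the median, scaled by the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values]) or 1.0
    return [abs(v - med) / mad for v in values]


def knn_scores(values, k=3):
    """Toy detector 3: distance to the k-th nearest neighbour."""
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(dists[k - 1])
    return scores


def to_ranks(scores):
    """Map raw scores to ranks in [0, 1] so detectors become comparable."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(scores) - 1)
    return ranks


def pseudo_labels(values, detectors, top_frac=0.1):
    """Step 2 (assumed form): average the detectors' ranks and keep the
    points they most agree on as pseudo-labelled normals and anomalies."""
    all_ranks = [to_ranks(detect(values)) for detect in detectors]
    consensus = [sum(col) / len(col) for col in zip(*all_ranks)]
    order = sorted(range(len(values)), key=lambda i: consensus[i])
    n_anomalies = max(1, int(len(values) * top_frac))
    normal_idx = order[: len(order) // 2]   # lowest consensus scores
    anomaly_idx = order[-n_anomalies:]      # highest consensus scores
    return normal_idx, anomaly_idx


def train_classifier(values, normal_idx, anomaly_idx):
    """Step 3 (assumed form): fit a distance threshold halfway between
    the farthest pseudo-normal and the nearest pseudo-anomaly."""
    centre = statistics.fmean([values[i] for i in normal_idx])
    far_normal = max(abs(values[i] - centre) for i in normal_idx)
    near_anomaly = min(abs(values[i] - centre) for i in anomaly_idx)
    threshold = (far_normal + near_anomaly) / 2
    return lambda x: abs(x - centre) > threshold


# Step 1: run every detector on the same (made-up) readings.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.4, 10.0, 25.0]
detectors = [zscore_scores, mad_scores, knn_scores]
normal_idx, anomaly_idx = pseudo_labels(readings, detectors)
is_anomaly = train_classifier(readings, normal_idx, anomaly_idx)
```

The resulting `is_anomaly` function can then be applied to new readings without rerunning the detectors, which loosely mirrors the idea of a deployable, domain-specific classifier. A user's feedback could, under the same assumptions, be folded in by overriding entries of `normal_idx` and `anomaly_idx` before training.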
We are thankful for the support from NSF for this Outlier Discovery research project.