Outlier Discovery Paradigm

Modern applications, from financial transaction systems to IoT sensor networks, collect staggering volumes of data. These data sets contain critical insights, ranging from rare phenomena to anomalies indicative of fraud or failure. To separate the valuable from the counterfeit, analysts need to interactively sift through and explore this data deluge. By detecting anomalies, analysts may prevent fraud or avert catastrophic sensor failures.

While prior research offers a treasure trove of stand-alone algorithms for detecting particular types of outliers, these algorithms tend to be variations on a theme. There is no end-to-end paradigm that brings this wealth of alternative algorithms to bear in an integrated infrastructure supporting anomaly discovery over potentially huge data sets while keeping the human in the loop.

Collaborative Research: ELEMENTS: Tuning-free Anomaly Detection Service

Anomaly detection is critical in many scientific and engineering fields, ranging from identifying signatures of new cyberattacks to detecting seizures in EEG medical time series data sets. However, although previous research offers a plethora of anomaly detection algorithms, effective anomaly detection remains challenging for domain experts because it involves a tedious manual tuning process. Specifically, users first have to hand-craft features to prepare the data, then determine which among many algorithms may be best suited for their particular task, and finally set parameters to ensure the chosen algorithm performs well. This is challenging because domain experts often lack sufficient understanding of specific detection algorithms and of machine learning in general. This project addresses this widespread problem by developing a robust self-tuning anomaly detection cyber-infrastructure called Self-Tuning ANomaly Detection service (STAND).

Our team

Faculty

Dr. Elke Rundensteiner

Professor of Computer Science

Worcester Polytechnic Institute

Dr. Samuel Madden

Professor of Electrical Engineering and Computer Science

Massachusetts Institute of Technology

Dr. Lei Cao

Assistant Professor of Computer Science

University of Arizona, Tucson

PhD Students

Lei Ma

PhD Student in Computer Science

Worcester Polytechnic Institute

Dennis Hofmann

PhD Student in Data Science

Worcester Polytechnic Institute

Peter VanNostrand

PhD Student in Data Science

Worcester Polytechnic Institute

Collaborators

Mathan Gopalsamy

Senior Data Scientist

Signify AI/ML Research Group

Dr. Xiao Qin

Applied Scientist

Amazon AWS, AI/ML

Dr. Chuan Lei

Senior Applied Scientist

AWS AI Research and Education

Dr. Huayi Zhang

Machine Learning Engineer

TikTok Inc.

Dr. Yizhou Yan

Research Scientist

Meta Research

Robert Jensen

Regional Lead

US Army Research Laboratory

Publications

Selected Recent Publications.

  • Huayi Zhang. Towards An End-to-End Training Data Management System for Machine Learning Models. Worcester Polytechnic Institute, PhD Dissertation, 2023.
  • Lei Cao, Yizhou Yan, Samuel Madden, and Elke Rundensteiner. AutoOD: Automatic Outlier Detection, ACM SIGMOD 2022.
  • Dennis Hofmann, Peter VanNostrand, Huayi Zhang, Yizhou Yan, Lei Cao, Samuel Madden, and Elke Rundensteiner. A Demonstration of AutoOD: A Self-Tuning Anomaly Detection System, VLDB 2022.
  • Huayi Zhang, Lei Cao, Samuel Madden, Elke Rundensteiner. LANCET: Labeling Complex Data at Scale, VLDB 2021.
  • Huayi Zhang, Lei Cao, Peter VanNostrand, Samuel Madden, and Elke Rundensteiner. ELITE: Robust Deep Anomaly Detection with Meta Gradient, ACM SIGKDD 2021.
  • Huayi Zhang, Lei Cao, Yizhou Yan, Elke Rundensteiner, and Samuel Madden. Continuously Adaptive Similarity Search, ACM SIGMOD 2020.

System Releases

An interactive demo of AutoOD, our automatic anomaly detection system, is available below. AutoOD is a self-tuning anomaly detection system designed to address the challenges of method selection and hyperparameter tuning while remaining unsupervised. It frees users from the tedious manual tuning process often required for anomaly detection by intelligently identifying high-likelihood inliers and outliers. AutoOD features a responsive visual interface, shown in the screenshots below, that supports seamless interaction and gives the user insight into how AutoOD operates.

Input Interface

Results Interface

Input Interface: Users can upload data, provide their own anomaly detection methods, specify the column of labels, and customize the expected percentage range of anomalies in their dataset.

Results Interface: Users can filter the chart based on the metrics provided and hover over points to view summary statistics. Clicking on a point shows that point's anomaly score from each unsupervised detector, along with its attribute values from the input dataset. In addition, by moving the slider through the iterations, the user can watch the reliable object set change, and at any time select a point to view the contribution of each detector to its status.
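To make the idea of a "reliable object set" and per-detector contributions more concrete, here is a minimal Python sketch. It is not AutoOD's actual algorithm; it only illustrates one simple way that scores from several unsupervised detectors and an expected anomaly percentage range could be combined. The function names (`top_fraction`, `reliable_sets`), the agreement rule, and the data layout are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def top_fraction(scores: List[float], fraction: float) -> Set[int]:
    """Return indices of the highest-scoring points (at least one point)."""
    k = max(1, min(len(scores), round(len(scores) * fraction)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def reliable_sets(
    detector_scores: Dict[str, List[float]],
    anomaly_range: Tuple[float, float],
) -> Tuple[Set[int], Set[int], Dict[int, List[str]]]:
    """Illustrative agreement rule (an assumption, not AutoOD's method).

    - Reliable outliers: points every detector ranks within the lower
      bound of the expected anomaly fraction.
    - Reliable inliers: points no detector ranks within the upper bound.
    - votes: for each point, the detectors that flag it at the upper bound.
    """
    low, high = anomaly_range
    n = len(next(iter(detector_scores.values())))
    strict = {name: top_fraction(s, low) for name, s in detector_scores.items()}
    loose = {name: top_fraction(s, high) for name, s in detector_scores.items()}

    outliers = set(range(n))
    flagged_by_any: Set[int] = set()
    for name in detector_scores:
        outliers &= strict[name]      # all detectors must agree
        flagged_by_any |= loose[name]  # any detector may raise suspicion
    inliers = set(range(n)) - flagged_by_any

    votes = {
        i: [name for name in detector_scores if i in loose[name]]
        for i in range(n)
    }
    return outliers, inliers, votes
```

Points that fall in neither set remain undecided in this sketch. An iterative system could revisit them in later rounds, which is roughly what moving the slider through iterations visualizes, though the real update procedure is not described on this page.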