Outlier Discovery Paradigm

Staggering volumes of data collected by modern applications, from financial transaction data to IoT sensor data, contain critical insights, ranging from rare phenomena to anomalies indicative of fraud or failure. To separate the valuable from the counterfeit, analysts need to interactively sift through and explore this data deluge. By detecting anomalies, analysts may prevent fraud or avert catastrophic sensor failures.

While prior research offers a treasure trove of stand-alone algorithms for detecting particular types of outliers, these algorithms tend to be variations on a theme. There is no end-to-end paradigm that brings this wealth of alternative algorithms to bear in an integrated infrastructure supporting anomaly discovery over potentially huge data sets while keeping the human in the loop.


Award Abstract #1910880

IIS-III-Small: Outlier Discovery Paradigm

PI: Elke Rundensteiner, Worcester Polytechnic Institute

Staggering volumes of data collected by modern applications, from financial transaction systems and smart health sensors to Internet of Things devices, contain critical insights, ranging from rare phenomena to anomalies indicative of financial fraud, health alerts, or system failure, respectively. To separate the valuable from the counterfeit, analysts need to interactively sift through and explore the data deluge. By discovering anomalies, analysts may detect financial fraud, identify behavior irregularities, or prevent catastrophic sensor failures, touching the lives of citizens in countless ways. While a treasure trove of stand-alone algorithms for detecting particular types of outliers exists, these algorithms tend to be variations on a theme. This research project is game-changing in that it will offer the first end-to-end outlier services that bring this wealth of algorithms to bear in an integrated infrastructure to support effective anomaly discovery. The broader impact of this project also includes the integration of the PI's project activities with the training of a STEM workforce, impacting the PI's WPI REU data science summer site and the new interdisciplinary degree programs in Data Science, from PhD and MS to BS, spearheaded and led by the PI. The PI has a long history of working with diverse student populations at all levels and is determined to similarly foster diversity among the participants involved in this proposed project.

This research will go well beyond developing yet another outlier detection algorithm by instead demonstrating the feasibility of outlier discovery as a service. It will break fundamentally new ground in supporting outlier discovery from identification through refinement to explanation. The proposed end-to-end anomaly discovery paradigm will support all stages of anomaly discovery by seamlessly integrating outlier-related services within one platform. The result is a database-system-inspired solution that models services as first-class citizens for the discovery of outliers. It integrates outlier detection processes with data sub-spacing, explanations of outliers with respect to their context in the original data set, user feedback on the relevance of outlier candidates in the domain, and metric learning to refine the effectiveness of the outlier detection process. The utility of the innovation will be evaluated using outlier benchmark data sets as well as real-world data sets and workloads explored in partnership with industry collaborators. In summary, the resulting system enables the analyst to steer the discovery process with human ingenuity, empowered by the near real-time interactive responsiveness of the platform during exploration. Our solution promises to be the first to achieve the power of sense making afforded by outlier explanation services and human feedback integrated into the discovery process.

KEYWORDS: Outlier Analytics System; Human-in-Loop Data Discovery; Explanation; Data Mining.
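To make the interplay of detection, explanation, and feedback described in the abstract more concrete, the following is a minimal illustrative sketch and not the project's actual system. It assumes a k-nearest-neighbor distance score as the detector, a per-feature deviation from the median as a simple explanation, and a multiplicative weight update as a crude stand-in for metric learning. All function names, parameters, and the toy data are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance with one non-negative weight per feature."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def knn_scores(points, weights, k=2):
    """Detection: score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(
            weighted_distance(p, q, weights)
            for j, q in enumerate(points)
            if j != i
        )
        scores.append(dists[k - 1])
    return scores


def explain(points, index, weights):
    """Explanation: rank features by weighted squared deviation from a median value."""
    dims = len(points[index])
    contrib = []
    for d in range(dims):
        column = sorted(p[d] for p in points)
        median = column[len(column) // 2]  # upper median for even-sized columns
        contrib.append(weights[d] * (points[index][d] - median) ** 2)
    return sorted(range(dims), key=lambda d: contrib[d], reverse=True)


def apply_feedback(weights, feature, relevant, rate=0.5):
    """Feedback: scale the weight of the feature that drove the flag.

    This is an assumed, simplistic substitute for metric learning.
    """
    updated = list(weights)
    updated[feature] *= (1 + rate) if relevant else (1 - rate)
    return updated


# Toy data (hypothetical): two features per record.
points = [
    (1.0, 10.0), (1.1, 10.2), (0.9, 9.9),
    (1.0, 10.1), (1.05, 30.0), (5.0, 10.0),
]
weights = [1.0, 1.0]

scores = knn_scores(points, weights)
top = max(range(len(points)), key=lambda i: scores[i])   # strongest candidate
ranking = explain(points, top, weights)                  # features, most to least responsible

# Suppose the analyst marks the candidate as irrelevant in the domain.
new_weights = apply_feedback(weights, ranking[0], relevant=False)
new_scores = knn_scores(points, new_weights)
```

After the analyst rejects the candidate, the feature that drove the flag counts for less, so the same record receives a lower score on the next detection pass. A real platform would replace each step with far more capable services; the sketch only shows how the stages could feed one another.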


Acknowledgments

This work is supported by NSF Project IIS-1910880.
