AWARD ABSTRACT. Staggering volumes of data sets collected by modern applications from financial transaction data to IoT sensor data contain critical insights from rare phenomena to anomalies indicative of fraud or failure. To decipher valuables from the counterfeit, analysts need to interactively sift through and explore the data deluge. By detecting anomalies, analysts may prevent fraud or prevent catastrophic sensor failures. While previously developed research offers a treasure trove of stand-alone algorithms for detecting particular types of outliers, they tend to be variations on a theme. There is no end-to-end paradigm to bring this wealth of alternate algorithms to bear in an integrated infrastructure to support anomaly discovery over potentially huge data sets while keeping the human in the loop.
This project is the first to design an integrated paradigm for end-to-end anomaly discovery. The project aims to support all stages of anomaly discovery by seamlessly integrating outlier-related services within one integrated platform. The result is a database-system inspired solution that models services as first class citizens for the discovery of outliers. It integrates outlier detection processes with data sub-spacing, explanations of outliers with respect to their context in the original data set, feedback on the relevance of outlier candidates, and metric-learning to refine the effectiveness of the outlier detection process. The resulting system enables the analyst to steer the discovery process with human ingenuity, empowered by near real-time interactive responsiveness during exploration. Our solution promises to be the first to achieve the power of sense making afforded by outlier explanation services and human feedback integrated into the discovery process.
INTELLECTUAL MERIT. End-to-end outlier discovery services represent a new direction in research on anomaly detection. This research goes well beyond developing yet another outlier detection algorithm. Instead, our project will demonstrate the feasibility of outlier discovery as a service. Intellectual merit arises from breaking fundamentally new ground in supporting outlier discovery from identification, explanation, to refinement by keeping human in the loop. For the first time, these critical capabilities are integrated into one unified paradigm. New principles and techniques are also designed to automatically tune the input parameters of outlier detection algorithms, better separate outliers from inliers via distance metric learning, effectively summarize and interpret outliers, and refine the detection algorithms driven by human feedback. By unifying these new techniques as well as the existing work in outlier detection algorithms within one platform, effective discovery of anomalies is moved into the realm of reality.
BROADER IMPACT. The discovery of anomalies promises to touch the lives of citizens by detecting finan-cial fraud and problems in medical diagnoses. The end-to-end outlier services will result in an effective systemfor combating fraud, identifying behavior irregularities, and detecting faulty sensors. This is game changing.The prevention of hospital-born infectious diseases in intensive care units represents a major societal impact. Broader impact also includes the dissemination of software as open source.
We are thankful for the support from NSF for this Outlier Discovery research project.