Scalable Event Trend Analytics for Data Stream Inquiry, 2018

AWARD ABSTRACT. Data streams have grown to unprecedented scale and velocity in recent years. The real-time discovery of emerging event trends in data streams is essential for time-critical applications, ranging from computing infection spread patterns across major medical facilities to detecting frequent stock trends. Unfortunately, event trend analytics, i.e., the aggregation of complex event trends specified using Kleene-closure based patterns, is known not only to be of prohibitively high computational complexity but also to suffer from exorbitant memory utilization costs. This project overcomes the shortcomings of state-of-the-art systems by providing, for the first time, practical solutions for this important class of analytics. The invention of transformative strategies to provide these much-needed Event Trend Analytics services is game changing. Event-stream-centric applications touch our daily life, from health care to financial fraud. Empowering these fields by making advanced trend analytics capabilities practical in terms of their performance, and easily accessible by integrating this important functionality into open-source software, has a major societal impact. The integration of the PI's project activities with the training of a future STEM workforce, critical for the prosperity and well-being of this nation, takes place within the new interdisciplinary degree programs in Data Science at WPI, both spearheaded and led by the PI. The PI also has a long history of working with diverse student populations at all levels and is determined to similarly foster diversity among the participants involved in this project.

This project breaks fundamentally new ground in developing both a sound theoretical foundation and practical processing strategies that empower applications to pursue interactive event trend exploration with high responsiveness. This project unleashes the tremendous potential of supporting powerful Kleene-closure based analytics as first-class citizens in modern stream processing systems. Scalable Event Trend Analytics empowers applications to gain trend-related insights from high-velocity stream data. Project activities include: (a) expressive event trend modeling and matching semantics, (b) compact event trend encoding techniques, (c) optimization methodology for aggregation push-down into the Kleene closure computation to avoid expensive trend construction, (d) aggregation computation at multiple granularity levels, (e) principled foundation of correctness and completeness of the query processing paradigm, and (f) distributed and shared execution of trend analytics for scale-up. Principles from database systems, ranging from query rewriting, optimization, and execution sharing to scalable cloud computing, are applied to tackle this challenging problem. Innovations, ranging from novel trend aggregation strategies that selectively skip expensive trend construction to compression methods for aggregation state models, are transformative, pushing the envelope on real-time event trend processing at scale. The evaluation of these innovations using real-world workloads, in partnership with industry collaborators with expertise in healthcare and financial applications, establishes their utility and effectiveness.
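
As an illustration of activity (c), the following minimal sketch shows how an aggregate over Kleene-closure matches can be computed without constructing the matches themselves. It is not the project's algorithm. It assumes a hypothetical pattern "one or more price events, each strictly higher than the previous one in the trend", with events allowed to be skipped, and a COUNT aggregate. The function names and the sample prices are invented for the example. A brute-force baseline that constructs every trend is included only for comparison.

```python
from itertools import combinations


def count_trends_pushdown(prices):
    """Count all strictly increasing trends without building any of them.

    ending_at[i] holds the number of trends whose last event is prices[i].
    Assumed semantics (hypothetical): a trend is any non-empty subsequence
    of the stream whose prices strictly increase.
    """
    ending_at = []
    for i, price in enumerate(prices):
        # Event i either starts a new trend (the 1), or extends every trend
        # that ends at an earlier event with a strictly lower price.
        count = 1 + sum(ending_at[j] for j in range(i) if prices[j] < price)
        ending_at.append(count)
    return sum(ending_at)


def count_trends_by_construction(prices):
    """Baseline: enumerate every subsequence and keep the increasing ones.

    The number of candidate subsequences grows exponentially with the
    stream length, which is the cost the push-down version avoids.
    """
    n = len(prices)
    total = 0
    for length in range(1, n + 1):
        for idx in combinations(range(n), length):
            if all(prices[a] < prices[b] for a, b in zip(idx, idx[1:])):
                total += 1
    return total


if __name__ == "__main__":
    sample = [10, 12, 11, 15]  # made-up prices, for illustration only
    print(count_trends_pushdown(sample))
    print(count_trends_by_construction(sample))
```

For the sample stream both functions report 11 trends, but the push-down version keeps only one counter per event and does quadratic work, whereas the baseline examines every subsequence of the stream.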