Contributions
The recent advances in hardware and software have enabled the capture of different measurements of data in a wide range of fields. Applications that generate rapid, continuous, and large volumes of event streams include readings from sensors used in applications, such as physics, biology and chemistry experiments, weather sensors, health sensors, online auctions, and financial streams. Given these developments, the world is poised for a sea change in terms of variety, scale, and importance of applications that can be envisioned based on the real-time analysis and exploitation of such event streams for decision making from dynamic traffic management, environmental monitoring to health care alike. Clearly, the ability to infer relevant patterns from these event streams in real-time to make near instantaneous yet informed decisions, henceforth called complex event analytics (CEA), is crucial for these mission critical applications and this is the focus of our project work. We now list key contribution ideas next.
Gloria: Graph-based Sharing Optimizer for Event Trend Aggregation
We present a novel approach for optimizing the sharing plan for a workload of event trend aggregation queries with complex Kleene patterns. To the best of our knowledge, Gloria is the first sharing optimizer that offers flexible sharing decisions on nested Kleene patterns using effective online aggregation. We introduce Gloria graph to model a variety of sharing opportunities in a diverse event trend aggregation workload. This allows us to transform the workload sharing problem into a path search problem. A cost model and effective pruning rules are also introduced to reduce the size of the Gloria graph. We design a path search algorithm to find an event trend aggregation sharing plan from the Gloria graph. For a given complex query workload, our optimizer has a linear-time complexity in the number of nodes and edges in the Gloria graph, while still delivering optimality guarantees in most cases. The experimental evaluation on three public data sets demonstrates the effectiveness of our pruning principles and the quality of the produced workload sharing plan. Our sharing plan achieves up to 10x performance improvement over the state-of-the-art approaches.
To Share, or not to Share Online Event Trend Aggregation Over Bursty Event Streams
We present a novel framework Hamlet for optimizing a workload of queries computing aggregation over Kleene pattern matches, called event trends. Hamlet is the first to seamlessly integrate the power of online event trend aggregation and adaptive sharing decision-making. To tackle the exponential complexity of event trend construction, Hamlet deploys online trend aggregation strategy. To this end, we design the Hamlet graph that compactly captures all trends matched by queries in the workload and propagates trend aggregates through this graph without constructing the actual trends. This graph-based online execution strategy allows to reduce the time complexity of event trend aggregation from exponential to quadratic in the number of matched events. To quantify the trade-off between the benefit of sharing and the overhead of maintaining the intermediate trend aggregates per query, we design a light-weight sharing benefit model. We propose an effective sharing decision algorithm that adaptively at runtime activates the shared online execution based on the benefit of sharing the trend aggregation.
Shared Complex Event Trend Aggregation
We represent the workload of event trend aggregation queries as the workload template that exposes all sharing opportunities in the diverse nested Kleene patterns. Also, we avoid the exponential CPU costs of event trend construction by introducing MatStates (materialization states) to store intermediate aggregation results prior to sharing. We prove the correctness of the shared execution strategy. We analyze how the number of MatStates can be minimized to reduce both the CPU and memory overhead of sharing.
Scalable Event Trend Analytics for Data Stream Inquiry
The first contribution tackles the problem of executing queries with Kleene closure. Such queries extract event sequences of arbitrary, statically unknown length, which we call event trends. Due to common event sub-sequences in event trends, either the responsiveness is delayed by repeated computations or an exorbitant amount of memory is required to store partial results. Our solution compactly encodes event trend information using a graph-based data structure and detects trends in optimal CPU time given limited memory.
Complex Event Analytics
First we notice that existing techniques such as traditional online analytical processing (OLAP) systems are not designed for real-time pattern based operations, while state-of-the-art Complex Event Processing (CEP) systems designed for pattern matching tend to be limited in their expressive capability, and more importantly they do not support OLAP operations. Hence, in the context of event streams where the order and sequence of events are important, OLAP is clearly insufficient in supporting efficient event sequence analysis. Thus in this project, we design, develop and evaluate novel complex event analytics technology that achieves the above objectives.