Complex Event Analytics Project, 2010

AWARD ABSTRACT. Recent advances in sensor technologies and expansion of wired and wireless communication protocols enable us to continuously collect information about the physical world, resulting in a rich set of novel services. The ability to infer relevant patterns from these event streams in real-time and at various levels of abstractions to make near instantaneous decisions is crucial for a wide range of mission critical applications ranging from real-time crisis management to security. This project designs, implements, and evaluates a novel complex event processing methodology, henceforth called Complex Event Analytics (CEA). CEA integrates the capabilities of pattern matching from complex event processing with the power of multi-level analysis from static OLAP engines to provide multi-dimensional sequential pattern analysis over high-speed event streams. The CEA Model combines CEP and OLAP techniques for efficient multi-dimensional event pattern analysis at different abstraction levels. Based on interrelationships in both concept and pattern refinement among queries, sequence queries are composed into an integrated event pattern hierarchy. OLAP like operations enable analysts to navigate from one E-cuboid to another in this event analytics space. CEA optimization strategies, including rewriting rules, physical operators, and cost-based search algorithms, achieve scalable event processing. CEA offers high-performance analytics by maximal shared processing of event pattern queries. Experimental studies compare the CEA solution to the state-of-the-art, including traditional stream query systems and customized event engines. Intellectual merit lies in the design, development and evaluation of a novel Complex Event Analytics technology for real-time event stream analysis, -- a perfect middle ground offering both the sophisticated power of pattern matching found in modern event processing systems and the capability of online analytic techniques at multiple levels of abstraction of OLAP engines. CEA impacts society by facilitating a broad range of stream-centric applications ranging from monitoring of hygienecompliance to prevent the spread of infectuous diseases in medical settings to business intelligence processing, and by integrating project activities with education.

A few selected results are featured below. Full results can be found in the publications.

Sharing of Complex Event Processing
Exploiting Event Correlations

Complex Event Processing (CEP) has emerged as a technology of choice for high performance event analytics in time-critical decision making applications. Yet it is becoming increasingly difficult to support high-performance event processing due to the combination of the rising number and complexity of event pattern queries and the high velocity of event streams. In this work we design the SPASS* framework that successfully tackles these demanding CEP workloads. Our SPASS optimizer identifies opportunities for effective shared processing among CEP queries by leveraging time based event correlations among queries. The problem of pattern sharing is shown to be NP-hard by reducing the Minimum Substring Cover problem to the pattern sharing problem. The SPASS optimizer is designed that finds a shared pattern plan in polynomial time covering all sequence patterns while still guaranteeing an optimality bound. To execute this shared pattern plan, the SPASS runtime employs stream transactions that assure concurrent shared maintenance and re-use of sub-patterns across queries. Our experimental study confirms that the SPASS framework achieves over 16 fold performance improvement for a wide range of queries.

Online Aggregation of Stream Sequence Patterns

Complex Event Processing (CEP) is a technology of choice for high performance analytics in time-critical decision-making applications. Yet while effective technologies for complex pattern detection on continuous event streams have been developed, the problem of scalable online aggregation of such patterns has been overlooked. Instead, aggregation is typically applied as a post processing step after CEP pattern detection, leading to an extremely ineffective solution. In this paper, we demonstrate that CEP aggregation can be pushed into the sequence construction process. Based on this insight our A-Seq strategy successfully aggregates sequence pattern online without ever constructing sequence matches. This drives down the complexity of the CEP aggregation problem from polynomial to linear. We further extend our A-Seq strategy to support the shared processing of concurrent CEP aggregation queries. The A-Seq solution is shown to achieve over four orders of magnitude performance improvement for a wide range of tested scenarios compared to the state-of-the-art solution.

Active Complex Event Processing over Event Streams

State-of-the-art Complex Event Processing technology (CEP), while effective for pattern query execution, is limited in its capability of reacting to opportunities and risks detected by pattern queries. Especially reactions that affect the query results in turn have not been addressed in the literature. We propose to tackle these unsolved problems by embed- ding active rule support within the CEP engine, hence- forth called Active CEP (ACEP). Active rules in ACEP al- low us to specify a pattern query's dynamic condition and real-time actions. The technical challenge is to handle in- teractions between queries and reactions to queries in the high-volume stream execution. We hence introduce a novel stream-oriented transactional model along with a family of stream transaction scheduling algorithms that ensure the correctness of concurrent stream execution. We demonstrate the power of ACEP technology by applying it to the develop- ment of a healthcare system being deployed in UMass Med- ical School hospital. Through extensive performance exper- iments using real data streams, we show that our unique Low-Water-Mark stream transaction scheduler, customized for streaming environments, successfully achieves near-real- time system responsiveness and gives orders-of-magnitude better throughput than our alternative schedulers.

Sequence Pattern Query Processing over
Out-of-Order Event Streams

Complex event processing has become increasingly important in modern applications, ranging from RFID tracking for supply chain management to real-time intrusion detection. A key aspect of complex event processing is to extract patterns from event streams to make informed decisions in real-time. However, network latencies and machine failures may cause events to arrive out-of-order at the event processing engine. State-of-the-art event stream processing technology experiences significant challenges when faced with out-of-order data arrival including output blocking, huge system latencies, memory resource overflow, and incorrect result generation. To address these problems, we propose two alternate solutions: aggressive and conservative strategies respectively to process sequence pattern queries on out-of-order event streams. The aggressive strategy produces maximal output under the optimistic assumption that out-of-order event arrival is rare. In contrast, to tackle the unexpected occurrence of an out-of-order event and with it any premature erroneous result generation, appropriate error compensation methods are designed for the aggressive strategy. The conservative method works under the assumption that out-of-order data may be common, and thus produces output only when its correctness can be guaranteed. A partial order guarantee (POG) model is proposed under which such correctness can be guaranteed. For robustness under spiky workloads, both strategies are supplemented with persistent storage support and customized access policies.

Interval Event Stream Processing

Existing ESP engines have focused on detecting temporal patterns from instantaneous events, that is, events with no duration. However, such sequential patterns are inadequate to express the complex temporal relationships in domains such as medical, finance and meteorology, where the events' durations could play an important role. There are more temporal relations that can be defined between two interval events than two point events. The query semantics and evaluation mechanisms used for detecting temporal patterns over point event streams is not sufficient for temporal pattern detection over interval event streams. An expressive language to represent the required temporal patterns among streaming interval events is needed. Also, query evaluation mechanisms for such sequence queries need to be designed.