Raindrop: Scalable, Adaptive XQuery Subscription System over XML Streams

RAINDROP: A XQuery Processing Engine for XML Stream Data

[Overview] [Features] [Members] [Publications] [Related Works] [Funding]

Features

Algebra Meets Automaton

To accommodate the two paradigms, we choose to extend the algebraic model to accommodate automata. The intuition is that algebra provides multiple description levels which allows us to present the details of automata computation at the lower level while presenting the semantics of automata computation at the higher level. At the abstract level, subplans modeling automata compuation are seamlessly integrated with the tuple-based algebraic plans, providing a uniform view of all computation. Our experimentations illustrate that our framework opens up more optimization opportunities beyond the existing solutions in the literature.

Different Automata Implementation

Different automatas could be used on the different purpose like document validation/well-formed checking/query processing. But some principle inside these three types are shared. For the automata used on the XQuery query processing, pure FSM (without any additional stacks) could be used only based on some very strict assumptions (document depth is known and well-formed is kept). So far people all are saying about using FSM but they are using an additional stack for keeping track of state information as well as the content information of the document. So, like using an PDA actually, but slightly different. Maybe the pure FSM would have many states, but it could return the result about whether a document matching a query set or not quickly, by getting rid of dealing with the text content of the document as well as the stack operations at the first stage. So, as our next step, we would try to use a pure FSM to deal with XQuery processing. Also we will try to find out the a better automata solution for XQuery processing.

Schema-based Optimization

Raindrop uses schema-based optimization, i.e., use schema information to rewrite a query into a more efficient one. In particular, the query system focus on the techniques that are rather relevant in the XML stream context. Previous work usually covers limited and simplistic cases. Raindrop instead develops a comprehensive solution that handles a more powerful query language and more schema constraints.

We explore a lightweight approach to utilize the most common XML schema knowledge for optimization. We support a schema-based optimization on the stream logical plan level. We observed a set of scenarios where schema information may help optimize the query plan, generalized from these scenarios plan rewriting rules, and developed an efficient algorithm for maximally applying these rules on the original stream logical plan.

Adaptive Run-time Plan Rewriting

Adaptive run-time plan rewriting is to be tackled. In the stream context, a running plan may become sub-optimal as the environment changes. Among numerous adaptive aspects, we will focus on the plan rewriting mainly consisting of operator replacing or operator swapping. We propose to design and evaluate two possible solutions, namely, cost-based and learning-based. Their pros and cons are analyzed.

Multiple Execution Modes

We propose to support multiple execution modes. In most data query engines, only one execution mode, such as data-driven or demand-driven, is used for controlling the execution order of operators (i.e., scheduling). However we observed adopting multiple execution modes can more fit the nature of different operators. We proposed a list of execution modes as well as the rules of synchronizing these modes in a both correct and optimized execution.

Raindrop Coming!

Last Updated 2006.8 by {suhong, minglee, samanwei} at cs.wpi.edu