Overview
As XML becomes more popular, more and more stream sources are available in XML format. Typical XML stream applications include XML message brokers in B2B message-oriented middleware servers and selective dissemination of information, such as personalized newspaper delivery. The general goal of Raindrop is to tackle the stream processing challenges that are specific to XML, in particular processing XQuery, a standard XML query language, over XML streams.
Unlike tuple-based or object-based data streams, XML streams are more appropriately modeled as a sequence of primitive tokens, such as a start tag, an end tag, or a PCDATA item. A tuple is a self-contained structure whose semantics are completely determined by its own values. A token, by contrast, is not self-contained: it lacks semantics without the context provided by other tokens in the stream. Structural pattern retrieval, one of the three functionalities in an XQuery (the other two are filtering and restructuring), therefore has to be performed first on these non-self-contained tokens to compose self-contained objects.
The automata model is naturally suited for pattern matching on tokenized XML streams, whereas the algebraic model is a well-established technique in database systems for set-oriented processing of self-contained data units, i.e., tuples. However, neither model alone is well-equipped to handle both computation paradigms. The goal of the Raindrop project is to accommodate both paradigms within one uniform algebraic framework, thus taking advantage of both. In our query model, tokenized data and self-contained tuples are supported in a uniform manner. Query plans can therefore be flexibly rewritten using algebra-like equivalence rules to change which computation is done on tokenized data and which on tuples. The Raindrop system framework has four levels of abstraction: the semantics-focused plan, the stream logical plan, the stream physical plan, and the execution plan. Various optimization techniques are provided at each level.
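The combination of the two paradigms can be sketched roughly as follows. This is only an illustration under stated assumptions, not Raindrop's actual operator set: `navigate`, `select`, the `Row` type, and the sample data are hypothetical. A token-level operator tracks the open-element path, much as an automaton tracks its state, and emits tuples; an ordinary algebraic selection then works on those tuples.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Token = Tuple[str, str]   # (kind, value); kind is "start", "end", or "text"
Row = Dict[str, str]      # a self-contained tuple: column name -> value


def navigate(tokens: Iterable[Token], path: List[str], column: str) -> Iterator[Row]:
    """Token-level operator (hypothetical): emit the text of each element at
    the absolute `path` as a one-column tuple."""
    stack: List[str] = []  # names of open elements; stands in for automaton state
    text: List[str] = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
            if stack == path:
                text = []            # a new match begins
        elif kind == "text":
            if stack[:len(path)] == path:
                text.append(value)   # PCDATA inside the matched element
        elif kind == "end":
            if stack == path:
                yield {column: "".join(text)}  # match complete: emit a tuple
            stack.pop()


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Tuple-level operator (hypothetical): classic algebraic selection."""
    return (row for row in rows if predicate(row))


# Assumed sample token stream with two titles.
tokens = [
    ("start", "news"),
    ("start", "item"), ("start", "title"), ("text", "XML streams"),
    ("end", "title"), ("end", "item"),
    ("start", "item"), ("start", "title"), ("text", "Tuple algebra"),
    ("end", "title"), ("end", "item"),
    ("end", "news"),
]

# A small plan: pattern retrieval on tokens, then filtering on tuples.
plan = select(
    navigate(tokens, ["news", "item", "title"], "title"),
    lambda row: "XML" in row["title"],
)
result = list(plan)
```

In this sketch the filter runs on tuples after pattern retrieval. An equivalent plan could apply the same test while the tokens are still being consumed, which is the kind of trade-off between token-based and tuple-based computation that plan rewriting is meant to expose.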
Raindrop runs within the CAPE system developed by the DSRG at the WPI Computer Science Department. It is one instantiation of
Raindrop’s source code, demo, and test cases can be downloaded from the Raindrop1.0 Release page.
Information on the proposed R-SOX system, which is based on the Raindrop engine, can be found on the R-SOX system page.
Last Updated 2006.8 by {suhong, minglee, samanwei} at cs.wpi.edu