WPI Computer
Science Department

Computer Science
Department - Database Systems Research Lab (DSRG)
------------------------------------------

(last updated January 10, 2000)


EVE: The QC-Model

Overview

Project Members

Publications




Overview

Traditional problems in query rewriting include query optimization and rewriting queries using views. Most of these works deal with the problem of maintaining the exact original interface and extent of a given query while optimizing performance. They are thus based on the restricting assumption that the rewritten query must be equivalent to the initially given query. Recently, query rewriting with relaxed semantics has been proposed as a means of retaining the validity of a data warehouse (i.e., materialized queries) in situations where equivalent rewritings may not exist - yet alternate but not necessarily equivalent query rewritings may still be preferable to users over not receiving any answers at all. Other scenarios that also motivate a relaxation of the "exact query" assumption include loosely-specified query paradigms as well as market-oriented environments in which very similar (but not equal) results in answering a query can lead to dramatically different query computation costs.

Generating non-equivalent query results raises a new problem in the context of query rewriting. Since results returned for a given query may now be quite distinct, it leads to the problem of having to compare "incomparable" query results, or rewritings, for a given query. As one would expect the number of non-equivalent query rewritings is much larger than the number of equivalent query rewritings in general. Given the now even larger choice set than that for the equivalent query rewriting problem, an automated means of comparison of various rewritings is in pressing need. In this work, we report the development of a measurement model for non-equivalent rewritings. While the problem arises in many different environments in which queries are used, for the purpose of this work we focus our attention on E-SQL as the relaxed query model and on the issue of view maintenance in data warehouses as motivation for establishing the model.

In this work, we introduce the two dimensions of information preservation (quality) and view maintenance performance (cost) of query rewritings as two key components of the proposed model. The work addresses the need for measuring the divergence between queries in a quantifiable manner by proposing measures for the interface and extent divergence of the query results, referred to as the quality of the rewriting. Given that the independent dimensions of quality and cost cannot be easily evaluated, we analyze the semantics of these dimensions and propose a model of assigning numerical values and trade-off parameters in order to achieve a quantifiable overall evaluation for rewritings. The resulting model, which we call Quality-Cost-Model (QC-Model), combines these two dimensions into a single measure.

We successfully address several core issues of the problem including the definition of a distance between view extents, the estimation of the overlap of two view extents from the underlying information sources (ISs), and a methodology for the empirical establishment of a set of unit cost factors for the cost part of the model. We have developed a comprehensive test bed for the purpose of verifying and demonstrating our findings. In particular, we wanted to assess the predictability of the proposed QC-Model. Using our experimental setup, we evaluated the accuracy of our proposed view overlap estimation for the quality portion of the QC-Model as well as the predictability of actual view maintenance cost using the cost portion of the model which we adapted from the literature on incremental view maintenance cost. The experiments show a strong correlation between estimated and actual view extent overlaps as well as between predicted and actual incremental view maintenance cost.

Project Members

Publications

1999

Andreas Koeller and Elke A. Rundensteiner.
View Synchronization: Using an Integrated Approach of Rewriting and Ranking View Queries.
Computer Science Information Management Journal, Volume 2 Number 1, ???-??? (1999), Maximilian Press Publisher

A. Lee, A. Koeller, A. Nica and E. A. Rundensteiner. Non-Equivalent Query Rewritings, Proceedings of International Database Conference (IDC'99), Hong Kong, July, 1999.

Elke A. Rundensteiner, Andreas Koeller, Xin Zhang, Amy J. Lee, and Anisoara Nica
Evolvable View Environment (EVE): Non-Equivalent View Maintenance under Schema Changes,
Software system demonstration, Proceedings of SIGMOD'99, Philadelphia, USA, May 1999.

Amy J. Lee, Andreas Koeller, Anisoara Nica, and Elke A. Rundensteiner.
Data Warehouse Evolution: Trade-Offs between Quality and Cost of Query Rewritings
Poster Session, International Conference on Data Engineering (ICDE'99), Sydney, March 23-26, 1999.

1998

A. Koeller, E. A. Rundensteiner, and N. Hachem.
Integrating the Rewriting and Ranking Phases of View Synchronization

Technical Report WPI-CS-TR-98-23, Worcester Polytechnic Institute, Dept. of Computer Science, 1998.

A. Koeller, E. A. Rundensteiner, and N. Hachem.
"Integrating the Rewriting and Ranking Phases of View Synchronization"
ACM First International Workshop on Data Warehousing and OLAP Computer Science (DOLAP '98), Washington, D.C., Nov. 7, 1998.

A. J. Lee, A. Koeller, A. Nica, and E. A. Rundensteiner.
Data Warehouse Evolution: Trade-offs between Quality and Cost of Query Rewritings.
Technical Report WPI-CS-TR-98-2, Worcester Polytechnic Institute, Dept. of Computer Science, 1998.