WPI Computer
Science Department

Computer Science
Department - Database Systems Research Lab (DSRG)
------------------------------------------

Incremental Maintenance of Schema-Restructuring Views




Project Overview

Data integration, especially from heterogeneous data sources, is a hard and widely studied problem. One issue is the integration of sources that are semantically equivalent (i.e., whose states can be mapped onto each other by an isomorphism) but schematically heterogeneous. While two such data sources may capture the same information, one database may model the information as tuples (data) while the other may store it in attribute or relation names (schema). After initial solutions involving ad-hoc programs, declarative mechanisms for supporting such powerful source restructuring have been devised recently, for example an SQL query language extension called SchemaSQL. This work opens a new avenue for system integration. However, the issue of maintenance of such restructuring views over schematically heterogeneous sources remains an open problem. In this project, we developed the first strategy for incremental maintenance of such schema-restructuring views. Our algorithm transforms an update to a data source into an incremental update to be applied to the view, by propagating updates through the operators of a SchemaSQL algebra tree. We observe that the above class of query languages requires transformation of data into schema changes and vice versa. Hence, our update propagation strategy is designed to handle any combination of data updates or schema changes as input and to produce a correct sequence of data, schema, or a mixture of both update requests as output. We prove the correctness of the strategy. We also describe a prototype implementation and report on our experiments comparing the performance of incremental restructuring-view maintenance to view recomputation. We find that, in analogy to standard SQL view maintenance, incremental view maintenance in SchemaSQL is significantly faster than recomputation in many cases.

Description

In the context of schema-restructuring views, there are several new issues that we must address. First, it is not sufficient to consider data updates (DUs) for SchemaSQL, but also schema changes (SCs). Also, SchemaSQL views can transform schema into data and vice-versa, thus requiring a framework that can propagate data updates and/or schema changes in data updates and/or schema changes. As we show, using the standard approach of generating query expressions that compute some kind of ``delta'' relation between the old and the new view after an update is not possible for SchemaSQL, since the schema of this delta relation would not be defined. Our approach uses the algebra for SchemaSQL proposed by (Lakshmanan et al. 1996) and augments each SchemaSQL operator by incremental view maintenance capabilities. An architecture of the approach is shown below:

Incremental VM for
SchemaSQL -- Architecture

Research Issues

Project Members

Sponsors

This project is in part funded by the National Science Foundation under grant Award Number: IIS-9988776 and the project title "Data Warehouse Maintenance over Dynamic Distributed Information Sources". This NSF project time period ranges from Oct/1/2000 - Sept/30/2003.

Source Code

The source code (in Java) is available here as a tarball (gzipped tar-file). If you want to make use of it, please let us know as it may be difficult to get this code to work on another machine. The system requirements are: Java 1.1, Java TreeBuilder (JTB), JavaCC, JDBC (with the appropriate database class library), Oracle or another SQL-database with a similar SQL dialect. Documentation is mainly restricted to Javadoc-comments in the code which can be translated into html using javadoc. The shell-scripts for compilation are for Linux bash.

Publications

Koeller and Rundensteiner, 2001
Koeller, A. and Rundensteiner, E. A. (2001).
Incremental Maintenance of Schema-Restructuring Views in SchemaSQL.
Technical Report WPI-CS-TR-00-25, Worcester Polytechnic Institute, Dept. of Computer Science.
http://www.cs.wpi.edu/Resources/techreports.

Related Work

Griffin and Libkin, 1995
Griffin, T. and Libkin, L. (1995).
Incremental Maintenance of Views with Duplicates.
In Proceedings of SIGMOD, pages 328-339.

Lakshmanan et al., 1996
Lakshmanan, L. V. S., Sadri, F., and Subramanian, I. N. (1996).
SchemaSQL -- A Language for Interoperability in Relational Multi-database Systems.
In Vijayaraman, T. M. et al., editors, International Conference on Very Large Data Bases, pages 239-250, Mumbai, India.

Lakshmanan et al., 1999
Lakshmanan, L. V. S., Sadri, F., and Subramanian, S. N. (1999).
On Efficiently Implementing SchemaSQL on an SQL Database System.
In International Conference on Very Large Data Bases, pages 471-482.


[Return to the WPI Homepage] [Return to the CS Homepage] [Return to the DSRG Homepage]

created by: Andreas Koeller. Last modified Oct. 01, 2001