Incremental Maintenance of Schema-Restructuring Views
Data integration, especially from heterogeneous data sources, is a
hard and widely studied problem. One issue is the integration of
sources that are semantically equivalent (i.e., whose states can be
mapped onto each other by an isomorphism) but schematically
heterogeneous. While two such data sources may capture the same
information, one database may model the information as tuples
(data) while the other may store it in attribute or relation
names (schema). After initial solutions involving ad-hoc
programs, declarative mechanisms for supporting such powerful source
restructuring have been devised recently, for example an SQL query
language extension called SchemaSQL. This work opens a new
avenue for system integration. However, the issue of maintenance of
such restructuring views over schematically heterogeneous sources
remains an open problem.
In this project, we developed the first strategy for incremental
maintenance of such schema-restructuring views. Our algorithm
transforms an update to a data source into an incremental update to be
applied to the view, by propagating updates through the operators of a
SchemaSQL algebra tree. We observe that the above class of
query languages requires transformation of data into schema changes
and vice versa. Hence, our update propagation strategy is designed to
handle any combination of data updates or schema changes as input and
to produce a correct sequence of data, schema, or a mixture of both
update requests as output. We prove the correctness of the
strategy. We also describe a prototype implementation and report on
our experiments comparing the performance of incremental
restructuring-view maintenance to view recomputation. We find that,
in analogy to standard SQL view maintenance, incremental view
maintenance in SchemaSQL is significantly faster than
recomputation in many cases.
In the context of schema-restructuring views, there are several new
issues that we must address. First, it is not sufficient to consider
data updates (DUs) for SchemaSQL, but also schema changes
(SCs). Also, SchemaSQL views can transform schema into data and
vice-versa, thus requiring a framework that can propagate data updates
and/or schema changes in data updates and/or schema changes. As we
show, using the standard approach of generating query expressions that
compute some kind of ``delta'' relation between the old and the new
view after an update is not possible for SchemaSQL, since the
schema of this delta relation would not be defined.
Our approach uses the algebra for SchemaSQL proposed by
(Lakshmanan et al. 1996) and augments each SchemaSQL operator by
incremental view maintenance capabilities. An architecture of the
approach is shown below:
We give an
algebra-based solution to the problem of incremental view maintenance
of schema-restructuring views defined in SchemaSQL.
We prove this
approach correct by a method similar to the equational reasoning given
in (Griffin and Libkin 1995).
We present a prototype implementation of a
query processor for a subset of SchemaSQL and an incremental view
maintenance system in Java over JDBC-capable databases.
- We describe experiments we have conducted on our implementation to gain
insights into the performance of our algorithm.
This project is in part funded by the National Science Foundation
under grant Award Number: IIS-9988776 and the project title "Data
Warehouse Maintenance over Dynamic Distributed Information Sources".
This NSF project time period ranges from Oct/1/2000 - Sept/30/2003.
The source code (in Java) is available here as a tarball (gzipped tar-file). If you want
to make use of it, please let us
know as it may be difficult to get this code to work on another
machine. The system requirements are: Java 1.1, Java TreeBuilder
(JTB), JavaCC, JDBC (with the appropriate database class library),
Oracle or another SQL-database with a similar SQL
dialect. Documentation is mainly restricted to Javadoc-comments in the
code which can be translated into html using javadoc. The
shell-scripts for compilation are for Linux bash.
- Koeller and Rundensteiner, 2001
Koeller, A. and Rundensteiner, E. A. (2001).
Incremental Maintenance of Schema-Restructuring Views in SchemaSQL.
Technical Report WPI-CS-TR-00-25, Worcester Polytechnic Institute,
Dept. of Computer Science.
- Griffin and Libkin, 1995
Griffin, T. and Libkin, L. (1995).
Incremental Maintenance of Views with Duplicates.
In Proceedings of SIGMOD, pages 328-339.
- Lakshmanan et al., 1996
Lakshmanan, L. V. S., Sadri, F., and Subramanian, I. N. (1996).
SchemaSQL -- A Language for Interoperability in Relational
In Vijayaraman, T. M. et al., editors, International Conference
on Very Large Data Bases, pages 239-250, Mumbai, India.
- Lakshmanan et al., 1999
Lakshmanan, L. V. S., Sadri, F., and Subramanian, S. N. (1999).
On Efficiently Implementing SchemaSQL on an SQL Database System.
In International Conference on Very Large Data Bases, pages
created by: Andreas Koeller. Last modified
Oct. 01, 2001