Materialized views are often derived from several base relations stored at
distributed data sources, such as those of a data warehouse or
a web information integration system. These materialized views must be
maintained under source changes. In such a distributed context,
maintenance errors can occur due to concurrent source updates.
To address these errors, state-of-the-art maintenance strategies typically
issue maintenance queries to the data sources and then apply compensating
queries to correct any errors in the delta refreshes. However, these existing
solutions are limited to handling pure data updates, making the
restrictive assumptions that (1) all source schemata remain
stable over time, and thus (2) neither maintenance nor compensation
queries are broken by source changes.
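As a rough illustration of the compensation idea for pure data updates (the setting existing solutions handle), the Python sketch below maintains a view V = R ⋈ S over two sources. The relation names, the bag representation via `collections.Counter`, and the helpers `join` and `compensate` are hypothetical. The sketch assumes insert-only deltas and FIFO channels, so any update a source commits before answering a query has already reached the warehouse. It is a generic local-compensation example, not Dyno itself, which additionally copes with schema changes.

```python
from collections import Counter

def join(r: Counter, s: Counter) -> Counter:
    """Bag natural join of R(a, b) and S(b, c) on attribute b."""
    out = Counter()
    for (a, b1), m in r.items():
        for (b2, c), n in s.items():
            if b1 == b2:
                out[(a, b1, c)] += m * n
    return out

def compensate(answer: Counter, delta_r: Counter, concurrent_ds: list) -> Counter:
    """Remove the effect of source-2 updates that the remote query already saw.

    Assumes insert-only deltas and FIFO channels: every update in concurrent_ds
    arrived at the warehouse before the query answer, so its content is known.
    """
    fixed = Counter(answer)
    for ds in concurrent_ds:
        fixed -= join(delta_r, ds)
    return fixed

# Initial source states and the materialized view V = R join S.
R = Counter({(0, "x"): 1})
S = Counter({("x", "p"): 1})
V = join(R, S)

# Source 1 commits dR and notifies the warehouse.
dR = Counter({(1, "x"): 1})
R += dR

# Before the maintenance query (dR join S) reaches source 2, source 2 commits dS.
dS = Counter({("x", "q"): 1})
S += dS

# Source 2 evaluates the query on its new state, so the answer includes dR join dS.
answer = join(dR, S)

# dS is propagated afterwards by querying source 1: R_new join dS.
naive = V + answer + join(R, dS)
correct = V + compensate(answer, dR, [dS]) + join(R, dS)
```

Without compensation, the tuple (1, 'x', 'q') is counted twice: once in the answer to dR ⋈ S, which source 2 evaluated after committing dS, and again when dS itself is propagated.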
In this project, we lift these restrictions by proposing
two alternative solutions that can handle both concurrent data and schema
changes. The first solution, called Dyno, is compensation-based:
we allow the concurrency to occur and then detect and correct the
erroneous results. The second solution, called TxnWrap, is
multiversion-based: we apply multiversion techniques to prevent
concurrent updates from interfering with maintenance in the first place.
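The multiversion alternative can be sketched in the same toy setting. Each source keeps every row with a [begin, end) version interval, and the maintenance query for an update committed at version v reads the other source as of v, so later concurrent updates are invisible and no compensation step is needed. The single global version counter, the `VersionedTable` class, and its methods are assumptions made for illustration, not TxnWrap's actual design.

```python
from collections import Counter
from dataclasses import dataclass, field

def join(r: Counter, s: Counter) -> Counter:
    """Bag natural join of R(a, b) and S(b, c) on attribute b."""
    out = Counter()
    for (a, b1), m in r.items():
        for (b2, c), n in s.items():
            if b1 == b2:
                out[(a, b1, c)] += m * n
    return out

@dataclass
class VersionedTable:
    """Hypothetical multiversion store: each entry is [row, begin, end)."""
    rows: list = field(default_factory=list)

    def insert(self, row, version):
        self.rows.append([row, version, None])

    def delete(self, row, version):
        # Logical delete: close the version interval of one live copy.
        for entry in self.rows:
            if entry[0] == row and entry[2] is None:
                entry[2] = version
                return
        raise KeyError(row)

    def snapshot(self, version):
        """Bag of rows visible at `version`."""
        return Counter(row for row, begin, end in self.rows
                       if begin <= version and (end is None or version < end))

# Version 0: initial source states and the view.
R, S = VersionedTable(), VersionedTable()
R.insert((0, "x"), 0)
S.insert(("x", "p"), 0)
V = join(R.snapshot(0), S.snapshot(0))

# Version 1: source 1 inserts into R.
dR = Counter({(1, "x"): 1})
R.insert((1, "x"), 1)
# Version 2: source 2 concurrently inserts into S.
dS = Counter({("x", "q"): 1})
S.insert(("x", "q"), 2)

# Maintenance for dR reads S as of version 1; the version-2 insert is invisible.
V += join(dR, S.snapshot(1))
# Maintenance for dS reads R as of version 2.
V += join(R.snapshot(2), dS)
```

The trade-off in this sketch is that old row versions must be retained until no pending maintenance query can still read them.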
We also explore several optimization strategies for data warehouse
maintenance. In particular, we propose techniques to handle
parallel and batch data warehouse maintenance.
DyDa is an extension of one of our earlier works, EVE, which
considers the schema evolution of the data warehouse. In this work, we aim
to solve the concurrency problems introduced by such evolution.