DyDa: Distributed Data Warehouse Maintenance Overview

Materialized views are often derived from several base relations stored across possibly distributed data sources, as in a data warehouse or a web information integration system. These materialized views must be maintained as the sources change. In such a distributed context, concurrent source updates can cause maintenance errors. To address these errors, state-of-the-art maintenance strategies typically issue maintenance queries to the data sources and then apply compensating queries to correct any errors in the delta refreshes. However, these existing solutions handle pure data updates only, making the restrictive assumptions that (1) all source schemata remain stable over time, and thus (2) neither maintenance nor compensation queries are broken by source changes.
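The maintenance error and its compensation can be illustrated with a small sketch. It assumes a view defined as the join of two relations R and S held at different sources, bag semantics, and insert-only updates; the names `join`, `delta_R`, and `delta_S` are illustrative and do not come from any particular system. It shows the general compensation idea for pure data updates, not a specific published algorithm.

```python
from collections import Counter

def join(r, s):
    """Bag join of (a, b) rows with (b, c) rows on the shared b value."""
    out = Counter()
    for (a, b), n in r.items():
        for (b2, c), m in s.items():
            if b == b2:
                out[(a, b, c)] += n * m
    return out

# Source states and the view before any update (hypothetical data).
R = Counter({(1, 10): 1})
S = Counter({(10, "x"): 1})
view = join(R, S)

# Two concurrent inserts, one at each source.
delta_R = Counter({(2, 20): 1})
delta_S = Counter({(20, "y"): 1})

# Both sources commit before the warehouse issues its maintenance queries.
R_now = R + delta_R
S_now = S + delta_S

# Maintenance query for delta_R sees S with delta_S already applied.
answer_1 = join(delta_R, S_now)
# Maintenance query for delta_S sees R with delta_R already applied.
answer_2 = join(R_now, delta_S)

# Naive refresh: the delta_R-delta_S combination is counted twice.
naive_view = view + answer_1 + answer_2

# Compensation: remove the effect of the concurrent delta_S from answer_1.
compensated_view = view + (answer_1 - join(delta_R, delta_S)) + answer_2

correct_view = join(R_now, S_now)
```

Here `naive_view` contains the row `(2, 20, "y")` twice, while `compensated_view` matches a full recomputation over the current source states. The compensation term itself is a query over source data, which is why a concurrent schema change can break it.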

In this project, we lift these restrictions by proposing two alternative solutions that handle both concurrent data and schema changes. The first solution, called Dyno, is compensation-based: we allow concurrent updates to occur, then detect and correct the erroneous results. The second solution, called TxnWrap, is multiversion-based: we apply multiversion techniques so that such concurrency conflicts do not arise. We also explore several optimization strategies for data warehouse maintenance. In particular, we propose techniques for parallel and batch data warehouse maintenance.
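To make the multiversion idea concrete, the following is a minimal sketch of a versioned relation in which a maintenance query reads the source state as of a given version, so later concurrent updates stay invisible to it. This is a generic multiversion illustration under our own assumptions, not the actual TxnWrap design; the class `VersionedRelation` and its methods are hypothetical.

```python
class VersionedRelation:
    """Keeps every row with the version range in which it is visible."""

    def __init__(self):
        # Each entry is [row, created_version, deleted_version or None].
        self.rows = []
        self.version = 0

    def insert(self, row):
        """Add a row and return the version that the insert created."""
        self.version += 1
        self.rows.append([row, self.version, None])
        return self.version

    def delete(self, row):
        """Mark one live copy of the row as deleted; return the new version."""
        self.version += 1
        for entry in self.rows:
            if entry[0] == row and entry[2] is None:
                entry[2] = self.version
                break
        return self.version

    def read_as_of(self, version):
        """Return the rows that were visible at the given version."""
        return [
            row
            for row, created, deleted in self.rows
            if created <= version and (deleted is None or deleted > version)
        ]


# Hypothetical source relation with three successive updates.
S = VersionedRelation()
v1 = S.insert((10, "x"))
v2 = S.insert((20, "y"))   # concurrent with maintenance of the first update
v3 = S.delete((10, "x"))

snapshot_v1 = S.read_as_of(v1)
snapshot_v2 = S.read_as_of(v2)
snapshot_v3 = S.read_as_of(v3)
```

A maintenance query that processes the first update reads `S.read_as_of(v1)` and never observes the later insert or delete, so no compensation is needed for it. The cost is the extra storage and bookkeeping for old versions.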

DyDa extends one of our earlier projects, EVE, which considers schema evolution of the data warehouse. In this work, we aim to solve the concurrency problems introduced by such evolution.