WWW Page
http://davis.wpi.edu/dsrg/EVE
Project Award Information
Award Number: IIS-9988776; Duration: 9/20/2000 - 9/19/2003.
Title: Data Warehouse Maintenance over Dynamic Distributed Information Sources
Keywords
Derived Databases, Information Integration,
Data Warehousing, View Derivation and Synchronization, Schema
Evolution, Interoperability.
Project Summary
As digital repositories of information are springing up everywhere and
interconnectivity between computers around the world is being
established, the construction and maintenance of data warehouses
(views) over such distributed data sources has been recognized as being of
great importance for modern applications. However, data sources
continuously evolve by modifying not only their content but also their
query capabilities and schema and by joining or leaving the
environment. This research addresses this critical and timely problem
of how to rewrite view definitions synchronously with the schema
changes of data sources, coined the view synchronization problem. The
Evolvable View Environment (EVE) solution is based on a preference
model for view evolution and a meta model for capturing data source
interrelationships. Algorithms that exploit different types of meta
data and view evolution preferences to synchronize views are designed.
Measures for ranking alternative view rewritings based on their
quality are established. To assure survivability in any dynamic
environment, a strategy for coordinating view synchronization under
schema changes and the more traditional view maintenance under data
changes is incorporated. Implementation of the EVE software system
serves as a proof of concept and as an experimental test bed.
Experimental evaluations are conducted to assess performance,
applicability, and quality of the rewritings. A case study on
applying the technology to web and E-commerce applications helps to
determine the applicability as well as the limitations of the
technology. In summary, this project will not only advance the
state of the art in data warehousing, a core area of database
technology, but will also have potentially far-reaching benefits by providing
techniques and software tools that simplify access to large sets of
dynamic distributed data sources.
Publications and Products
Selected recent publications include:
A prototype system called EVE,
which has been the foundation for this project,
was demonstrated at ACM SIGMOD'99,
and a new prototype system called Dyda
will be presented at ACM SIGMOD'2001 (May 2001).
Its source code is expected to be released
on our website in Summer 2001.
Project Impact
Impact on Human Resources.
This project has partially funded several Ph.D. students
in my database research group: Andreas Koeller,
Xin Zhang, and Songting Chen.
Several Master's
students and undergraduate students, such as Brian Murphy,
have also been involved.
Impact on education and curriculum development at
all levels.
This project has enriched education at the undergraduate
level by providing small projects in which we can actively involve undergraduate
students via REUs and directed study projects. It has also enhanced our
graduate courses, e.g., the Advanced Database course (CS561) as well
as the Special Topics course on Web Databases (CS525) at WPI.
Industry.
We have had several interactions
and exchanges of ideas related to
this project with others, most notably
with Dr. Arnon Rosenthal at Mitre Corporation.
Impact.
Sustainable integration of
data sources that survives even the
evolution and migration of those data sources, including
the transformation of their schemas,
is a critical problem faced by the software industry. Our
project promises to provide automated solutions to this problem.
Goals, Objectives, and Targeted Activities
Targeted Accomplishments
Project References
Area Background
Area References
Potential Related Projects. Derived Views, Database Query Languages.
Selected Accomplishments
The construction and maintenance of data warehouses
(views) over distributed data sources has been recognized as being of
great importance for modern applications. However,
such modern data sources
are often dynamic; that is, they modify not only their content but also their
query capabilities and schema, and they join or leave the
environment. This research addresses this critical and timely problem
of how to rewrite view definitions synchronously with the schema
changes of data sources, coined the view synchronization problem.
EVE Framework.
The Evolvable View Environment (EVE) solution is based on a preference
model for view evolution and a meta model for capturing data source
interrelationships. Some initial
algorithms that exploit different types of meta
data and view evolution preferences to synchronize views
have been designed.
Measures
for ranking alternative view rewritings based on both their
maintenance cost and their
quality have been established.
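
As a rough illustration of view synchronization, the sketch below rewrites a view after a source attribute is deleted. It is a simplified stand-in, not EVE's actual interface: the names `ViewAttr`, `View`, `synchronize`, the `dispensable` flag (a stand-in for a view evolution preference), and the `replacements` map (a stand-in for meta knowledge about source interrelationships) are all assumptions made for this example.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


# Hypothetical, simplified model: a view is a list of projected attributes.
# Each attribute carries a preference flag saying whether the view may
# survive without it (dispensable) or not.
@dataclass(frozen=True)
class ViewAttr:
    relation: str
    name: str
    dispensable: bool


@dataclass(frozen=True)
class View:
    name: str
    attrs: Tuple[ViewAttr, ...]


# Assumed meta knowledge: (relation, attribute) -> equivalent (relation, attribute)
Replacements = Dict[Tuple[str, str], Tuple[str, str]]


def synchronize(view: View, deleted: Tuple[str, str],
                replacements: Replacements) -> Optional[View]:
    """Rewrite `view` after the source attribute `deleted` disappears.

    Returns the rewritten view, or None if an indispensable attribute
    has no known replacement (the view cannot be preserved).
    """
    new_attrs = []
    for attr in view.attrs:
        if (attr.relation, attr.name) != deleted:
            new_attrs.append(attr)             # unaffected attribute
        elif deleted in replacements:
            rel, name = replacements[deleted]  # substitute an equivalent source
            new_attrs.append(replace(attr, relation=rel, name=name))
        elif attr.dispensable:
            continue                           # preference allows dropping it
        else:
            return None                        # no legal rewriting exists
    return View(view.name, tuple(new_attrs))
```

When several replacements exist for the same attribute, each would yield a different rewriting, and this is where measures of quality and maintenance cost would be needed to rank the alternatives; the sketch sidesteps that by allowing at most one replacement per attribute.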
TxnWrap Wrapper Architecture.
To assure scalability and survivability in any dynamic
environment, a transaction-based
strategy for coordinating view synchronization under
schema changes and the more traditional view maintenance under data
changes has been proposed.
This transactional approach
uses the concept of a "DWMS_Transaction" to encapsulate
the complete data warehouse maintenance process. With the help of an
additional level of materialization in special-purpose source
wrappers, we propose a multiversion concurrency control strategy that
guarantees a consistent view of the information source space data
inside each DWMS_Transaction, thus removing the maintenance anomaly
problem. This integrated solution, called "TxnWrap", now achieves at
least strong consistency of DW maintenance even under schema
changes. TxnWrap is complementary to previous maintenance algorithms
for data updates (DUs) and schema changes (SCs), because it removes concurrency considerations from
these maintenance algorithms. Our approach also places few
cooperation assumptions on information sources. We have implemented
a first prototype of the
TxnWrap solution and succeeded in plugging it into our
existing data warehousing testbed at WPI. Experiments to assess
its performance will need to be conducted in the coming year.
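
The multiversion idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the TxnWrap implementation: the class `VersionedWrapper` and its methods `apply_update` and `read_as_of` are hypothetical names, and tuples are modeled as plain dictionaries. The point it shows is that a maintenance transaction which reads the source "as of" the version of the update it is processing never sees later, concurrent updates.

```python
from typing import Dict, List, Optional, Tuple


class VersionedWrapper:
    """Hypothetical source wrapper that keeps every version of each tuple."""

    def __init__(self) -> None:
        self.version = 0
        # key -> list of (version, value), oldest first; None marks a deletion
        self._history: Dict[str, List[Tuple[int, Optional[dict]]]] = {}

    def apply_update(self, key: str, value: Optional[dict]) -> int:
        """Record a source update and return the version number it created."""
        self.version += 1
        self._history.setdefault(key, []).append((self.version, value))
        return self.version

    def read_as_of(self, version: int) -> Dict[str, dict]:
        """Return the source state as it was right after `version`."""
        state: Dict[str, dict] = {}
        for key, versions in self._history.items():
            visible = [value for ver, value in versions if ver <= version]
            if visible and visible[-1] is not None:
                state[key] = visible[-1]
        return state


# A maintenance transaction triggered by the first update keeps reading
# the state at v1, even though the source has moved on in the meantime.
wrapper = VersionedWrapper()
v1 = wrapper.apply_update("t1", {"price": 10})
v2 = wrapper.apply_update("t1", {"price": 12})   # concurrent later update
snapshot = wrapper.read_as_of(v1)                # still sees price 10
```

Keeping the versions inside the wrapper is what lets the maintenance logic itself stay free of concurrency handling, at the price of extra storage at each wrapper.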
TxnWrap Optimization.
We plan to optimize the
TxnWrap solution towards dynamic data integration
in several ways. First, we plan to reduce
the storage space used by each information source wrapper,
since at the moment it stores all versions of tuples modified
at the information source. We will explore filtering of the
wrapper database based on selection and projection conditions,
as well as version-based cleanup once a change has been committed
to the data warehouse.
Second, we intend to develop a parallel scheduler
for handling the maintenance process upon
notification of a source update. This should significantly improve
the performance of the overall system, allowing us to better exploit
the computational resources available at each source
instead of sequentially pacing the
maintenance process.
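
One possible form of version-based cleanup is sketched below. It is only an assumption about how such pruning could work, not the planned design: the function `purge_committed` and the `History` layout (per-tuple lists of `(version, value)` pairs in ascending version order, with `None` marking a deletion) are hypothetical. Filtering by selection and projection conditions and the parallel scheduler are not covered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical wrapper storage: key -> [(version, value)], oldest first.
History = Dict[str, List[Tuple[int, Optional[dict]]]]


def purge_committed(history: History, committed: int) -> History:
    """Drop versions that no maintenance transaction can still request.

    Assumes every change up to and including version `committed` has been
    committed to the data warehouse, so only the newest version at or
    before `committed` and all later versions need to be kept.
    """
    pruned: History = {}
    for key, versions in history.items():
        old = [pair for pair in versions if pair[0] <= committed]
        new = [pair for pair in versions if pair[0] > committed]
        latest_old = old[-1:]  # newest committed version, if any
        # A tuple whose last committed state is a deletion, and which has
        # no pending versions, can be forgotten entirely.
        if not new and (not latest_old or latest_old[0][1] is None):
            continue
        pruned[key] = latest_old + new
    return pruned
```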
Integration of Schematically Heterogeneous
Data Sources.
We plan to address the issue of integration of
sources that are semantically equivalent (i.e., whose states can be
mapped onto each other by an isomorphism) but schematically
heterogeneous. While two such data sources may capture the same
information, one database may model the information as tuples
(data) while the other may store it in attribute or relation
names (schema). After initial solutions involving ad-hoc
programs, declarative mechanisms for supporting such powerful source
restructuring have been devised recently, for example a SQL query
language extension called SchemaSQL.
We now want to explore the system integration of
such sources into our system via special SchemaSQL wrappers.
More importantly, the problem of maintaining such
restructuring views over schematically heterogeneous sources,
once established, must be explored.
We plan to develop
strategies for incremental
maintenance of such schema-restructuring views,
to implement them, assess their performance by comparative
studies, and also integrate the final wrappers into our
data warehousing system.
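
To make the data-versus-schema distinction concrete, the sketch below converts between two layouts of the same information in plain Python (it does not use SchemaSQL syntax). The stock-price example and the function names `fold_into_schema` and `unfold_into_data` are assumptions chosen for illustration. In the first layout the company name is a data value; in the second it has become an attribute name.

```python
from typing import Dict, List, Tuple

# Layout A: company names are data values in tuples (date, company, price).
Row = Tuple[str, str, float]
# Layout B: one record per date, with company names as attribute names.
Table = Dict[str, Dict[str, float]]


def fold_into_schema(rows: List[Row]) -> Table:
    """Restructure layout A into layout B (data values become attributes)."""
    table: Table = {}
    for date, company, price in rows:
        table.setdefault(date, {})[company] = price
    return table


def unfold_into_data(table: Table) -> List[Row]:
    """Restructure layout B back into layout A (attributes become data)."""
    return sorted(
        (date, company, price)
        for date, columns in table.items()
        for company, price in columns.items()
    )


rows = [
    ("2001-05-01", "ibm", 115.0),
    ("2001-05-01", "sun", 17.5),
    ("2001-05-02", "ibm", 116.2),
]
by_date = fold_into_schema(rows)
# by_date["2001-05-01"] now has the attributes "ibm" and "sun"
```

Because `unfold_into_data(fold_into_schema(rows))` returns the original rows (in sorted order), the two layouts carry the same information. The maintenance question is how an insert into layout A should be propagated: it may change a value in layout B, or it may add a new attribute, which is a schema change on the view side.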
See the "Publications and Products" section above for a listing of recent
project reports, software and demonstrations. Those
and related products can also be downloaded and/or viewed
from our Database Systems Research Group homepage:
http://davis.wpi.edu/dsrg
The area of this project is database views,
in particular as studied for relational database systems.
Views are named stored queries.
View mechanisms serve the purpose of database customization,
security and access rights, and information derivation
and integration.
One important issue for database views
is maintenance, that is,
the incremental modification of the database
view, if materialized, whenever the underlying
data source is updated.
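
Incremental maintenance can be illustrated with a small counting-based sketch. It assumes a hypothetical view that projects the category of every source tuple whose amount is at least 100; the names `qualifies`, `materialize`, and `maintain` are made up for this example. Instead of recomputing the view from the full source, only the inserted and deleted tuples are inspected.

```python
from collections import Counter
from typing import Iterable, Tuple

Row = Tuple[str, int]  # hypothetical source tuple: (category, amount)


def qualifies(row: Row) -> bool:
    """Selection condition of the assumed view."""
    return row[1] >= 100


def materialize(rows: Iterable[Row]) -> Counter:
    """Full recomputation: for each category, count its qualifying tuples."""
    return Counter(row[0] for row in rows if qualifies(row))


def maintain(view: Counter, inserted: Iterable[Row] = (),
             deleted: Iterable[Row] = ()) -> None:
    """Incrementally update `view` in place from the source deltas."""
    for row in inserted:
        if qualifies(row):
            view[row[0]] += 1
    for row in deleted:
        if qualifies(row):
            view[row[0]] -= 1
            if view[row[0]] == 0:
                del view[row[0]]  # last derivation gone: remove from view
```

The per-category counts record how many source tuples derive each view tuple, so a deletion removes a view tuple only when its last derivation disappears.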