EVE: The Evolvable View Environment
Materialized views are important for many application domains in large-scale environments composed of numerous heterogeneous and distributed information sources (ISs), such as the WWW. Such environments are plagued by continuous change: ISs modify not only their contents, but also their schemas, their interfaces, and their query capabilities. As a consequence, materialized views may become undefined when the ISs upon which they are defined change their capabilities. Current view technology supports only static views, i.e., views defined over an environment that is assumed not to change. We call the problem of keeping views defined under such changes the view evolution problem. To the best of our knowledge, our work is the first attempt to address the view evolution problem. We propose a general framework called the Evolvable View Environment (EVE) for addressing this problem. In the EVE framework, we add flexibility to the view evolution process by extending the view definition language SQL to include view evolution preferences, indicating for example which components of the view (such as the view interface, the view extent, etc.) are dispensable, essential, or replaceable by alternate components. Our goals are to have the view remain useful (even if modified) and to preserve the view as much as possible under capability changes of ISs, instead of having the view simply become undefined. In order to preserve view components, our system locates replacements for affected components from alternate ISs. For this, we design a Model for Information Source Descriptions (MISD) that characterizes the capabilities of each IS as well as possible relationships between ISs, such as join constraints, partial containment constraints, etc. Equipped with the evolvable SQL (E-SQL) view definition language and the MISD, we have proposed strategies for the view synchronization process.
Our algorithms evolve affected view definitions guided by constraints imposed by view evolution preferences as well as by knowledge of semantic relationships between information sources.
This project is funded in part by the National Science Foundation under award number IIS-9988776, with the project title "Data Warehouse Maintenance over Dynamic Distributed Information Sources". The NSF project period runs from Oct. 1, 2000 to Sept. 30, 2003.
The goal of the EVE information integration effort is to address the problem that autonomous information sources can change not only their data, but also their capabilities (schema) at any time. View synchronization in EVE refers to the rewriting of view definitions in step with the capability changes of ISs, with the aim of always maintaining a view definition that is valid on the current state of the dynamic information space.
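As a rough illustration of view synchronization, the following minimal Python sketch rewrites a view interface after one attribute is deleted at an IS. It is not the EVE implementation: the class `ViewAttribute`, the function `synchronize`, the relation names, and the `replacements` dictionary (standing in for replacement knowledge that would come from the MISD) are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ViewAttribute:
    """One attribute of a view interface (hypothetical model, not E-SQL syntax)."""
    source: str        # "Relation.attribute" as offered by some IS
    dispensable: bool  # may be dropped from the view interface
    replaceable: bool  # may be replaced by an attribute of another IS


def synchronize(
    view: Tuple[ViewAttribute, ...],
    deleted: str,
    replacements: Dict[str, str],
) -> Optional[Tuple[ViewAttribute, ...]]:
    """Rewrite `view` after the attribute `deleted` disappears from its IS.

    `replacements` maps a deleted attribute to a substitute from another IS
    (assumed to be derived from MISD-style constraints). Returns the rewritten
    interface, or None if an affected attribute can be neither replaced nor dropped.
    """
    rewritten = []
    for attr in view:
        if attr.source != deleted:
            rewritten.append(attr)  # unaffected by the capability change
        elif attr.replaceable and deleted in replacements:
            rewritten.append(replace(attr, source=replacements[deleted]))
        elif attr.dispensable:
            continue  # drop the affected attribute, keep the rest of the view
        else:
            return None  # the view cannot be preserved under its preferences
    return tuple(rewritten)


view = (
    ViewAttribute("Customer.name", dispensable=False, replaceable=True),
    ViewAttribute("Customer.phone", dispensable=True, replaceable=False),
)
after_drop = synchronize(view, "Customer.phone", {})
after_swap = synchronize(view, "Customer.name", {"Customer.name": "Client.name"})
undefined = synchronize(view, "Customer.name", {})
```

In this sketch a replacement is tried before an attribute is dropped, which reflects the stated goal of preserving the view as much as possible; a real system would also have to rewrite the affected conditions and relations of the view.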
Unlike previous research on query rewriting, we have proposed in EVE query rewriting with relaxed semantics as a means of retaining the validity of a data warehouse (i.e., materialized queries) in a changing environment. To achieve this, we have introduced E-SQL, which allows each attribute in the query interface to be classified, according to the query definer's preferences, as essential (it must be retained) or dispensable (it may be dropped if it cannot be retained). Similarly, preferences for the query extent can be specified in E-SQL, for example, to indicate whether a subset of the original result is acceptable or not. A query rewriting is said to be acceptable if it preserves the essential information of the original query and satisfies the constraint on the view extent. We have provided initial solutions to the problem of view synchronization. We have also looked at the issue of incrementally maintaining view extents after such rewritings.
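The acceptability condition above can be sketched as a small check. This is an illustrative encoding only: the preference labels (`"equal"`, `"subset"`, `"superset"`, `"any"`), the relation label `"incomparable"`, and the function `is_acceptable` are assumptions made for this example and are not E-SQL keywords.

```python
from typing import Iterable

# Which extent relationships (rewriting vs. original view) each
# view-extent preference tolerates. Labels are hypothetical.
ALLOWED_EXTENTS = {
    "equal": {"equal"},
    "subset": {"equal", "subset"},
    "superset": {"equal", "superset"},
    "any": {"equal", "subset", "superset", "incomparable"},
}


def is_acceptable(
    essential: Iterable[str],
    preserved: Iterable[str],
    extent_preference: str,
    extent_relation: str,
) -> bool:
    """Return True if a rewriting keeps every essential attribute and its
    extent stands in a relationship the view definer has declared tolerable."""
    keeps_essentials = set(essential) <= set(preserved)
    extent_ok = extent_relation in ALLOWED_EXTENTS[extent_preference]
    return keeps_essentials and extent_ok


# A rewriting that drops a dispensable attribute and returns a subset
# of the original tuples, checked against a "subset is fine" preference.
ok = is_acceptable(["name"], ["name", "city"], "subset", "subset")
# The same rewriting is rejected if the extent must stay equal.
too_strict = is_acceptable(["name"], ["name", "city"], "equal", "subset")
# Losing an essential attribute is never acceptable.
lost_essential = is_acceptable(["name", "id"], ["name"], "any", "equal")
```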
Since each rewriting may preserve the original query to a different degree, a potentially large number of acceptable yet non-equivalent query rewritings may be found.
Therefore, we need to systematically select the most promising rewriting out of all possible ones. This can be accomplished by assessing the two most important factors influencing the desirability of a query rewriting: the information preserved by the rewriting with respect to the original query result (quality) and the cost of acquiring the query results (cost). A rewriting is more desirable than others if it is "semantically close" to the original query and its results can be acquired economically. We have investigated some issues related to this problem of assessing the quality and cost of view rewritings after capability changes.
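One simple way to picture this trade-off is a weighted score over candidate rewritings. The sketch below is not the ranking model used in EVE: it assumes that quality and cost have already been normalized to the range 0 to 1, and the linear formula, the default weight, and the names `Rewriting`, `score`, and `best_rewriting` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Rewriting(NamedTuple):
    name: str
    quality: float  # 0..1, closeness to the original query result (assumed given)
    cost: float     # 0..1, normalized cost of acquiring the result (assumed given)


def score(rewriting: Rewriting, quality_weight: float = 0.7) -> float:
    """Higher is better: reward preserved information, penalize cost."""
    return quality_weight * rewriting.quality - (1.0 - quality_weight) * rewriting.cost


def best_rewriting(candidates: Iterable[Rewriting], quality_weight: float = 0.7) -> Rewriting:
    """Pick the candidate with the highest weighted score."""
    return max(candidates, key=lambda r: score(r, quality_weight))


candidates = [
    Rewriting("close_but_expensive", quality=0.9, cost=0.8),
    Rewriting("looser_but_cheap", quality=0.7, cost=0.1),
]
balanced_choice = best_rewriting(candidates)                      # weighs cost as well
quality_only_choice = best_rewriting(candidates, quality_weight=1.0)  # ignores cost
```

With the default weight the cheaper rewriting wins even though it preserves less information; setting the weight to 1.0 ranks purely by quality. Which weighting is appropriate would depend on the application.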
Our view synchronization technology offers benefits to many information integration issues. In addition to being able to evolve views under capability changes of sources, we could also provide temporary view synchronization (providing possibly slightly different information to a user while the original data source is temporarily unavailable), assess the quality of data that is provided to a user (i.e., the divergence from the data originally desired by a user), and help users to extract the best information possible from a large pool of information.
In summary, to assure our proposed research rests on a solid
foundation, the following work has so far been done in the project:
This is an HTML version of the introductory EVE paper.
The EVE documentation is available online here.
A smaller online version of the EVE demo, showing the basic functionality of view synchronization, is available at http://shiba.wpi.edu/eve_server/demo.htm. To run this demo you need a Java-capable browser that can use a downloadable Swing library (Java 1.1; all Netscape versions since 4.05 should work; Internet Explorer 4.0 may cause some problems).
Elke A. Rundensteiner.
NSF Annual Progress Report, 2001-2002.
Elke A. Rundensteiner.
NSF Annual Progress Report, 2000-2001.
Lingli Ding, Xin Zhang, Elke A. Rundensteiner.
Scalable Maintenance in Distributed Data Warehousing Environment (.pdf)
Technical Report WPI-CS-TR-00-16, Worcester Polytechnic Institute,
Dept. of Computer Science.
Elke A. Rundensteiner, Andreas Koeller, and Xin Zhang.
Maintaining Data Warehouses over Changing
Information Sources (.pdf)
Communications of the ACM. June 2000. Special Section on System
Integration.
Andreas Koeller, Elke A. Rundensteiner.
A History-Driven Approach at
Evolving Views Under Meta Data Changes (.pdf)
Technical Report
WPI-CS-TR-00-01, Worcester Polytechnic Institute, Dept. of Computer
Science.
Lingli Ding, Xin Zhang and Elke A. Rundensteiner.
The MRE Wrapper Approach: Enabling Incremental View
Maintenance of Data Warehouses Defined On Multi-Relation Information Sources
DOLAP'99,
Nov. 1999.
Andreas Koeller and Elke A. Rundensteiner.
View Synchronization: Using an Integrated Approach of Rewriting and
Ranking View Queries.
Computer Science Information Management Journal, Volume 2
Number 1, ???-??? (1999),
Maximilian Press Publisher
Xin Zhang, Data Warehouse Maintenance Under Interleaved Schema and Data Updates, MS Thesis, ps, pdf, Computer Science Department, Worcester Polytechnic Institute, Worcester MA, May 1999.
A. Nica, View Evolution Support for Information Integration Systems over Dynamic Distributed Information Spaces, Dissertation, Electrical Engineering and Computer Science Department, University of Michigan, Ann Arbor, May 1999. Dissertation Abstract (postscript), Dissertation (postscript) and Dissertation Slides (postscript)
Xin Zhang and Elke A. Rundensteiner. Cooperative Information Sources and Data Warehouse Maintenance, International Database Engineering and Application Symposium (IDEAS'99), April 1999.
E. A. Rundensteiner, A. Koeller, X. Zhang, A. J. Lee, and A. Nica. Evolvable View Environment (EVE): A Data Warehousing system Handling Both Schema and Data Changes of Distributed Sources, International Database Engineering and Application Symposium (IDEAS'99), April 1999.
X. Zhang, L. Ding and E. A. Rundensteiner.
PSWEEP: Parallel View Maintenance Under Concurrent Data Updates of Distributed Sources,
Technical Report WPI-CS-TR-99-14, Worcester Polytechnic Institute,
Dept. of Computer Science, 1999.
A. Nica and E. A. Rundensteiner. View Maintenance after View Synchronization, Proceedings of IDEAS'99 Conference, Montreal, Canada, August, 1999.
A. Lee, A. Koeller, A. Nica and E. A. Rundensteiner. Non-Equivalent Query Rewritings, Proceedings of International Database Conference (IDC'99), Hong Kong, July, 1999.
Elke A. Rundensteiner, Andreas Koeller, Xin Zhang, Amy J. Lee, and
Anisoara Nica
Evolvable View Environment (EVE): Non-Equivalent View Maintenance
under Schema Changes,
Software system demonstration, Proceedings of SIGMOD'99,
Philadelphia, USA, May 1999.
Xin Zhang, and Elke A. Rundensteiner.
Data Warehouse Maintenance Under Concurrent Schema and Data Updates
Poster Session, International Conference on Data Engineering (ICDE'99), Sydney, March 23 - 26, 1999.
Amy J. Lee, Andreas Koeller, Anisoara Nica, and Elke A. Rundensteiner.
Data Warehouse Evolution: Trade-Offs between Quality and Cost
of Query Rewritings
Poster Session, International Conference on Data
Engineering (ICDE'99), Sydney, March 23-26, 1999.
A. Koeller, E. A. Rundensteiner, and N. Hachem.
Integrating the Rewriting and Ranking Phases of View
Synchronization
Technical Report WPI-CS-TR-98-23, Worcester Polytechnic Institute,
Dept. of Computer Science, 1998.
A. Koeller, E. A. Rundensteiner, and N. Hachem.
"Integrating the Rewriting and Ranking Phases of View
Synchronization"
ACM First International Workshop on Data
Warehousing and OLAP Computer Science (DOLAP '98), Washington, D.C.,
Nov. 7, 1998.
Claypool, K. T., Rundensteiner, E. A., Chen, L., and Kothari, B.,
``Re-usable ODMG-based Templates for Web View Generation and Restructuring'',
CIKM'98 Workshop on Web Information and Data Management (WIDM'98),
Washington, D.C., pp 53-56, Nov 1998.
Xin Zhang, and Elke A. Rundensteiner.
Data Warehouse Maintenance Under Concurrent Schema and Data Updates
Technical Report WPI-CS-TR-98-8, Worcester Polytechnic Institute,
Dept. of Computer Science, 1998.
A. Nica and E.A. Rundensteiner.
Using Containment Information for View Evolution in Dynamic Distributed
Environments.
Data Warehouse Design and OLAP Technology, DWDOT'98, (PC Chair:
Dr. Mukesh Mohania), Austria, Sept. 1998.
A. J. Lee and E. A. Rundensteiner.
Data Warehouse Evolution: Consistent Metadata Management
Session: ``Data Mining and Data Warehousing'', 1998 IEEE International
Conference on Systems, Man, and Cybernetics (SMC), San Diego, California,
October 1-14, 1998.
A. J. Lee, A. Koeller, A. Nica, and E. A. Rundensteiner.
Data Warehouse Evolution: Trade-offs between Quality and Cost of
Query Rewritings.
Technical Report WPI-CS-TR-98-2, Worcester Polytechnic Institute,
Dept. of Computer Science, 1998.
A. Nica, A. J. Lee, and E. A. Rundensteiner.
The CVS Algorithm for View Synchronization in Evolvable Large-Scale
Information Systems.
In Proceedings of International Conference on Extending
Database Technology (EDBT'98), pages 359-373, Valencia, Spain, March 1998.
A. Nica and E. A. Rundensteiner.
The POC and SPOC Algorithms: View Rewriting using Containment
Constraints in EVE.
Technical Report WPI-CS-TR-98-3, Worcester Polytechnic Institute,
Dept. of Computer Science, 1998.
A. Nica and E. A. Rundensteiner.
Using Complex Substitution Strategies for View Synchronization.
Technical Report WPI-CS-TR-98-4, Worcester Polytechnic Institute,
Dept. of Computer Science, 1998.
A. Nica and E. A. Rundensteiner.
View Maintenance after View Synchronization.
Technical Report WPI-CS-TR-98-?, Worcester Polytechnic Institute,
Dept. of Computer Science, 1998.
E. A. Rundensteiner, A. Koeller, A. Lee, Y. Li, A. Nica, and X. Zhang.
Evolvable View Environment (EVE) Project: Synchronizing Views over
Dynamic Distributed Information Sources.
In Demo Session Proceedings of International Conference on
Extending Database Technology (EDBT'98), pages 41-42, Valencia, Spain,
March 1998.
A. Nica and E. A. Rundensteiner.
"Loosely-Specified Query Processing in Large-Scale Information
Systems."
International Journal of Cooperative
Information Systems, Vol. 6, Nos. 3 and 4 (1997), pp. 241-268,
World Scientific Publishing Company.
A. J. Lee, A. Nica, and E. A. Rundensteiner.
The EVE Framework: View Synchronization in Evolving Environments.
Technical Report WPI-CS-TR-97-4, Worcester Polytechnic Institute,
Dept. of Computer Science, 1997.
A. Nica, A. J. Lee, and E. A. Rundensteiner.
The Complex Substitution Algorithm for View Synchronization.
Technical Report WPI-CS-TR-97-8, Worcester Polytechnic Institute,
Dept. of Computer Science, 1997.
E. A. Rundensteiner, A. Lee, A. Nica.
On Preserving Views in Evolving Environments.
Proceedings of 4th Int. Workshop on Knowledge Representation Meets Databases
(KRDB'97): Intelligent Access to Heterogeneous Information, Athens,
Greece, Aug 1997, pp. 13.1-13.11.
A. J. Lee, A. Nica, and E. A. Rundensteiner.
Keeping Virtual Information Resources Up and Running.
Proceedings of IBM Centre for Advanced Studies Conference (CASCON97),
Best Paper Award, Nov. 1997.
created by: Andreas Koeller. maintained by: Xin Zhang. Last modified Jan. 10, 2000