EVE: The Evolvable View Environment
Materialized views are important for many application domains in large-scale environments composed of numerous heterogeneous and distributed information sources (ISs), such as the WWW. Such environments are plagued by continuously changing information: ISs modify not only their contents but also their schemas, their interfaces, and their query capabilities. As a consequence, materialized views may become undefined when the ISs over which they are defined change their capabilities. Current view technology supports only static views, i.e., views defined under the assumption of a static environment. We call the problem of keeping views defined under such changes the view evolution problem. To the best of our knowledge, our work is the first attempt at addressing it. We propose a general framework called the Evolvable View Environment (EVE) for this purpose. In the EVE framework, we add flexibility to the view evolution process by extending the SQL view definition language with view evolution preferences, which indicate, for example, which components of the view (such as the view interface or the view extent) are dispensable, essential, or replaceable by alternate components. Our goal is for the view to remain useful (even if modified) and to be preserved as much as possible under capability changes of ISs, instead of simply becoming undefined. To preserve view components, our system locates replacements for affected components in alternate ISs. For this, we designed a Model for Information Source Descriptions (MISD) that characterizes the capabilities of each IS as well as relationships between ISs, such as join constraints and partial containment constraints. Equipped with the evolvable SQL (E-SQL) view definition language and the MISD, we have proposed strategies for the view synchronization process.
Our algorithms evolve affected view definitions, guided by the constraints that view evolution preferences impose and by knowledge of the semantic relationships between ISs.
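To make the interplay between evolution preferences and the MISD concrete, here is a minimal Python sketch of synchronizing a view after an IS deletes one attribute. It is not EVE's actual algorithm: the classes `ViewAttribute` and `View`, the function `synchronize_delete_attribute`, the dictionary-based MISD fragment, and the `Customer`/`Person` relations are all hypothetical. The sketch rewrites only the view interface; a real rewriting would also have to add the join condition that connects the alternate relation to the rest of the view.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class ViewAttribute:
    """One attribute of the view interface plus its (hypothetical) E-SQL preferences."""
    relation: str
    name: str
    dispensable: bool  # may be dropped from the view interface
    replaceable: bool  # may be obtained from an alternate IS

@dataclass
class View:
    name: str
    attributes: List[ViewAttribute]

# Hypothetical MISD fragment: maps an attribute to an alternate source for the
# same information, e.g. one justified by a join constraint between two ISs.
Misd = Dict[Tuple[str, str], Tuple[str, str]]

def synchronize_delete_attribute(view: View, relation: str, attr: str,
                                 misd: Misd) -> Optional[View]:
    """Rewrite `view` after `relation.attr` is deleted; return None if impossible."""
    new_attrs: List[ViewAttribute] = []
    for a in view.attributes:
        if (a.relation, a.name) != (relation, attr):
            new_attrs.append(a)  # unaffected component: keep unchanged
            continue
        alternate = misd.get((relation, attr))
        if a.replaceable and alternate is not None:
            # Preserve the component by taking it from the alternate IS.
            new_attrs.append(ViewAttribute(alternate[0], alternate[1],
                                           a.dispensable, a.replaceable))
        elif a.dispensable:
            continue  # drop the component from the view interface
        else:
            return None  # essential and irreplaceable: the view cannot be preserved
    return View(view.name, new_attrs)

view = View("CustomerInfo", [
    ViewAttribute("Customer", "Name", dispensable=False, replaceable=False),
    ViewAttribute("Customer", "Phone", dispensable=False, replaceable=True),
    ViewAttribute("Customer", "Fax", dispensable=True, replaceable=False),
])
misd: Misd = {("Customer", "Phone"): ("Person", "Phone")}

rewritten = synchronize_delete_attribute(view, "Customer", "Phone", misd)
print([f"{a.relation}.{a.name}" for a in rewritten.attributes])
# ['Customer.Name', 'Person.Phone', 'Customer.Fax']
```

Preferring replacement over dropping, even for a dispensable attribute, follows the stated goal of preserving the view as much as possible rather than merely keeping it defined.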
This project is funded in part by the National Science Foundation under grant award number IIS-9988776, project title "Data Warehouse Maintenance over Dynamic Distributed Information Sources". The NSF project period runs from October 1, 2000 to September 30, 2003.
The goal of the EVE information integration effort is to address the problem that autonomous information sources can change not only their data but also their capabilities (schemas) at any time. View synchronization in EVE refers to rewriting view definitions in step with the capability changes of ISs, so that the view definition always remains valid on the current state of the dynamic information space.
Unlike previous research on query rewriting, EVE proposes query rewriting with relaxed semantics as a means of keeping a data warehouse (i.e., a set of materialized queries) valid in a changing environment. To achieve this, we introduced E-SQL, which allows each attribute in the query interface to be classified, according to the query definer's preferences, as essential or dispensable (i.e., it may be dropped if it cannot be retained). Similarly, E-SQL can express preferences on the query extent, for example, whether a subset of the original result is acceptable. A query rewriting is acceptable if it preserves the essential information of the original query and satisfies the constraint on the view extent. We have provided initial solutions to the view synchronization problem and have also examined how to incrementally maintain view extents after such rewritings.
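The acceptability test just described can be sketched as a predicate. This is a minimal Python sketch under stated assumptions: the text only mentions the option of accepting a subset of the original result, so the four-way `Extent` classification (equivalent, superset, subset, approximate) and the names `extent_ok` and `is_acceptable` are illustrative. The sketch also assumes that the extent relationship of each rewriting has already been determined, for example from MISD containment constraints.

```python
from enum import Enum
from typing import Set

class Extent(Enum):
    """Relationship of a rewritten view extent to the original (assumed categories)."""
    EQUIVALENT = "equivalent"
    SUPERSET = "superset"
    SUBSET = "subset"
    APPROXIMATE = "approximate"  # as a preference: any relationship is tolerated

def extent_ok(preference: Extent, actual: Extent) -> bool:
    """Does the actual extent relationship satisfy the definer's extent preference?"""
    if preference is Extent.APPROXIMATE:
        return True
    # An equivalent extent satisfies every preference; otherwise the kinds must match.
    return actual is Extent.EQUIVALENT or actual is preference

def is_acceptable(essential: Set[str], rewritten_attrs: Set[str],
                  preference: Extent, actual: Extent) -> bool:
    """Acceptable: all essential attributes preserved and the extent constraint met."""
    return essential <= rewritten_attrs and extent_ok(preference, actual)

# The definer accepts a subset of the original result, and Name is essential.
print(is_acceptable({"Name"}, {"Name", "Phone"}, Extent.SUBSET, Extent.SUBSET))    # True
print(is_acceptable({"Name"}, {"Name", "Phone"}, Extent.SUBSET, Extent.SUPERSET))  # False
```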
Since each rewriting may preserve the original query to a different degree, a potentially large number of acceptable yet non-equivalent query rewritings may be found.
We therefore need to systematically select the most promising rewriting among all candidates. This can be done by assessing the two factors that most influence the desirability of a rewriting: the information it preserves with respect to the original query result (quality) and the cost of acquiring its results (cost). A rewriting is preferable to others if it is "semantically close" to the original query and its results can be acquired economically. We have investigated several issues related to assessing the quality and cost of view rewritings after capability changes.
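The text does not say how quality and cost are combined. As one possible sketch, the Python fragment below assumes that each candidate rewriting already carries a quality score in [0, 1] and an estimated acquisition cost, and ranks candidates by a linear trade-off. The `Rewriting` class, the `rank` function, and the default `weight` of 0.7 are hypothetical choices, not EVE's actual quality or cost model.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rewriting:
    label: str
    quality: float  # fraction of the original result preserved, in [0, 1] (assumed given)
    cost: float     # non-negative estimated acquisition cost, any consistent unit (assumed)

def rank(rewritings: List[Rewriting], weight: float = 0.7) -> List[Rewriting]:
    """Order rewritings from most to least desirable by a weighted quality/cost score."""
    if not rewritings:
        return []
    max_cost = max(r.cost for r in rewritings) or 1.0  # avoid dividing by zero

    def score(r: Rewriting) -> float:
        # Quality raises the score; cost, normalized to [0, 1] across candidates, lowers it.
        return weight * r.quality - (1.0 - weight) * (r.cost / max_cost)

    return sorted(rewritings, key=score, reverse=True)

candidates = [
    Rewriting("A", quality=0.9, cost=120.0),
    Rewriting("B", quality=0.8, cost=20.0),
    Rewriting("C", quality=0.5, cost=10.0),
]
print([r.label for r in rank(candidates)])  # ['B', 'A', 'C']
```

Normalizing cost by the most expensive candidate keeps both terms on comparable scales; with a weight of 0.7, the slightly lower-quality but much cheaper rewriting B outranks A, while a weight of 1.0 would rank purely by quality.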
Our view synchronization technology benefits many information integration tasks. Besides evolving views under capability changes of sources, it could also provide temporary view synchronization (supplying possibly slightly different information to a user while the original data source is temporarily unavailable), assess the quality of the data provided to a user (i.e., its divergence from the data the user originally requested), and help users extract the best information possible from a large pool of information.
In summary, to ensure that our proposed research rests on a solid foundation, the following work has been done in the project so far:
An HTML version of the introductory EVE paper.
The EVE documentation is available online.
A smaller online version of the EVE demo, showing the basic functionality of view synchronization, is available at: http://shiba.wpi.edu/eve_server/demo.htm. Running this demo requires a Java-capable browser that can use a downloadable Swing library (Java 1.1; all Netscape versions since 4.05 should work; Internet Explorer 4.0 may cause some problems).
