Projects

 

My Projects

This page lists the projects I am currently working on as well as my past projects.

Project Ideas:

- Investigate XHTML 1.0.
- Use an XML repository to manage our bibliographies.

Current Projects:

Dynamic Data Warehouse (DyDa):

Data warehouses (DW) are an emerging technology for supporting high-level decision making by gathering information from several distributed information sources (ISs) into one repository. In dynamic environments such as the web, DWs must be maintained in order to stay up-to-date. Recently proposed view maintenance algorithms tackle the problem of DW maintenance under concurrent data updates (DUs) at different ISs, whereas the EVE system is the first to handle non-concurrent schema changes (SCs) of ISs. However, the concurrency of schema changes by different ISs, as well as the concurrency of interleaved schema changes and data updates, remains an unexplored problem.

In this work, we propose a solution framework called DyDa that addresses both problems. The DyDa framework detects concurrent SCs by a broken-query scheme and conflicting concurrent DUs by a local timestamp scheme. A fundamental idea of the DyDa framework is a two-layered architecture that separates the concerns of concurrent DU and concurrent SC handling without imposing any restrictions on the fully concurrent execution of the ISs. At the lower (query engine) level, the framework employs a local correction algorithm to handle concurrent DUs and a local name mapping strategy to handle concurrent rename-SCs that rename either attributes or relations in the IS space. At the higher (DW management) level, it addresses the problem of concurrent drop-SC operations that drop attributes or relations from the IS space. For the latter, the view synchronization (VS) algorithm is modified to keep track of the view evolution information needed for handling sequences of concurrent SCs. We also design a new view adaptation (VA) algorithm, called Map-VA, that incrementally adapts the view extent for a modified view definition even under interleaved SCs and DUs. Taken together, these algorithms provide a comprehensive solution to DW management under concurrent SCs and DUs. This solution is currently being implemented within the EVE data warehousing system.
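As a rough illustration of the local timestamp idea (not the actual DyDa implementation; class and method names are invented for this sketch), the following shows how a per-IS counter can flag data updates that arrive while a maintenance query is in flight:

```python
# Hypothetical sketch of a per-IS local timestamp scheme for spotting
# maintenance-concurrent data updates (DUs); names are illustrative only.

class SourceTimestamps:
    """Tracks a local counter per information source (IS)."""

    def __init__(self):
        self.counters = {}  # IS id -> last-seen local timestamp

    def record_update(self, is_id):
        """Called when a DU notification arrives from an IS."""
        self.counters[is_id] = self.counters.get(is_id, 0) + 1
        return self.counters[is_id]

    def snapshot(self):
        """Taken when a maintenance query is sent out."""
        return dict(self.counters)

    def concurrent_updates(self, snapshot_before):
        """DUs whose timestamps advanced while the query was in flight
        are maintenance-concurrent and would need local correction."""
        return {is_id: ts for is_id, ts in self.counters.items()
                if ts > snapshot_before.get(is_id, 0)}


# Usage: snapshot before querying, compare after the answer returns.
ts = SourceTimestamps()
ts.record_update("IS1")
before = ts.snapshot()
ts.record_update("IS2")               # arrives while the query is in flight
print(ts.concurrent_updates(before))  # -> {'IS2': 1}
```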

 

Schema and Data Updates Concurrency Control System (SDCC):

Data warehouses (DW) are built by gathering information from several information sources (ISs) and integrating it into one repository customized to users' needs. Recently proposed view maintenance algorithms tackle the problem of (concurrent) data updates happening at different autonomous ISs, whereas the EVE system addresses the maintenance of a data warehouse after schema changes of ISs. However, the concurrency of schema changes and data updates performed by different ISs remains an unexplored problem. This work provides a solution to this problem that guarantees correct concurrent view definition evolution and view extent maintenance of a DW defined over distributed ISs. To solve this problem, we introduce a framework called the SDCC (Schema change and Data update Concurrency Control) system. SDCC integrates existing algorithms designed to address view maintenance subproblems, such as view extent maintenance after IS data updates, view definition evolution after IS schema changes, and view extent adaptation after view definition changes, into one system by providing protocols that enable them to correctly co-exist and collaborate. SDCC tracks any potentially faulty updates of the DW caused by conflicting concurrent IS changes using a global message labeling scheme. An algorithm that compensates for such conflicting updates by a local correction strategy, called Local Compensation (LC), is incorporated into SDCC. The correctness of LC is proven. Lastly, the overhead of the SDCC solution beyond the costs of the known view maintenance algorithms it incorporates is shown to be negligible.
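A minimal sketch of what a message labeling scheme could look like is given below. The label fields and the conflict test are assumptions made for illustration; they are not the actual SDCC protocol.

```python
# Hypothetical message labeling sketch: every IS message carries a label,
# and an answer computed under an outdated schema version is flagged as a
# potentially faulty update that a local-compensation step would correct.

from dataclasses import dataclass
import itertools

_seq = itertools.count(1)

@dataclass(frozen=True)
class Label:
    is_id: str           # which information source sent the message
    schema_version: int  # IS schema version when the message was produced
    seq: int             # globally increasing sequence number

def make_label(is_id, schema_version):
    return Label(is_id, schema_version, next(_seq))

def potentially_faulty(answer_label, current_schema_version):
    """An answer computed against an older schema version may have been
    affected by a concurrent schema change."""
    return answer_label.schema_version < current_schema_version

# Usage: an answer labeled under schema version 3 arrives after the IS
# has already moved to version 4 -> flag it for compensation.
lbl = make_label("IS1", schema_version=3)
print(potentially_faulty(lbl, current_schema_version=4))  # True
```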

 

Parallel SWEEP (PSWEEP):

Data warehouses (DW) are built by gathering information from several information sources (ISs) and integrating it into one repository customized to users' needs. Recent work has begun to address the problem of view maintenance of DWs under concurrent data updates at different ISs. SWEEP, proposed by Agrawal et al., is one of the most popular solutions, even though its performance is limited by enforcing a sequential ordering on the handling of data updates from ISs by the view maintenance module. To overcome this limitation, we have developed a parallel algorithm for view maintenance, called PSWEEP, that retains all benefits of SWEEP while offering substantially improved performance. In order to perform parallel view maintenance, we have solved two issues: detecting maintenance-concurrent data updates in a parallel mode, and correcting the problem that the DW commit order may not correspond to the DW update processing order due to parallel maintenance handling. By decomposing SWEEP into an architecture of modular components, we can insert a local timestamp assignment module for detecting maintenance-concurrent data updates without requiring any global clock synchronization. We introduce the negative counter concept as a simple yet sufficient solution to the Variant-DW-Commit problem of varying orders in which the effects of data updates are committed to the DW. We have proven the correctness of PSWEEP, guaranteeing that our strategy indeed generates the correct final DW state. An evaluation of both SWEEP and PSWEEP using an analytic cost model shows that PSWEEP has the potential for a performance improvement of several orders of magnitude over SWEEP, depending on the number of threads supportable in the given DW system and the number and query power of the information sources. Furthermore, both SWEEP and PSWEEP have been implemented in our EVE data warehousing system. We have conducted an extensive performance study, and the results confirm the correctness of our cost model and demonstrate the significant performance improvement achieved by PSWEEP over SWEEP.
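To give an intuition for the negative counter idea, the generic sketch below shows how a tuple's multiplicity in the DW extent may temporarily dip below zero when delta effects are committed in a different order than they were processed, yet the final state is correct once every delta has been applied. This is an illustration of the general principle, not the exact PSWEEP bookkeeping.

```python
# Hedged illustration of out-of-order commits with counts that may go
# negative mid-stream; the final multiplicities are still correct.

from collections import Counter

def apply_deltas(extent, deltas):
    """extent: Counter mapping tuple -> multiplicity; deltas: (tuple, +n/-n)."""
    for row, change in deltas:
        extent[row] += change  # may go negative before all deltas arrive
    return extent

extent = Counter()
# Processing order was: insert t1, delete t1, insert t1 (net +1), but the
# commits arrive with the delete first because of parallel handling.
out_of_order = [(("t1",), -1), (("t1",), +1), (("t1",), +1)]
print(apply_deltas(extent, out_of_order)[("t1",)])  # final count: 1
```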

 

Multiple Relation Encapsulation Wrapper (MRE):

Data warehouses (DW) are built by gathering information from several information sources (ISs) and integrating and materializing it into one repository customized to users' needs. Some of the most recently proposed algorithms for the incremental maintenance of such materialized DWs, such as SWEEP and PSWEEP, offer significant advantages over previous solutions, such as high performance, no potential for infinite waits, and fewer remote queries and thus reduced network and IS loads. However, like many other algorithms, they still make the restrictive assumption that each IS is composed of just one single relation, which is unrealistic in practice. In this work, we hence propose a solution to overcome this restriction. The Multi-Relation Encapsulation (MRE) Wrapper supports multiple relations in information sources in a manner transparent to the rest of the environment. It treats an IS composed of multiple relations as if it were a single relation from the DW point of view; thus it can easily be plugged into existing incremental view maintenance algorithms without any change. Hence, our method retains all the advantages offered by existing algorithms in the literature, in particular SWEEP and PSWEEP, while also achieving the additional desired features of being non-intrusive, efficient, flexible, and well-behaved.
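The sketch below illustrates the encapsulation idea under simplifying assumptions (two local relations joined on a shared key; names and the update translation are invented for this example, not the actual MRE wrapper): the DW-facing interface exposes one logical relation, while the wrapper joins the IS's local relations internally and translates local updates into deltas on that joined relation.

```python
# Hypothetical two-relation wrapper: the DW sees a single relation.

class TwoRelationWrapper:
    def __init__(self, r_rows, s_rows, join_key):
        self.r_rows, self.s_rows, self.key = r_rows, s_rows, join_key

    def query(self):
        """Answer maintenance queries as if the IS held a single relation."""
        return [dict(r, **s) for r in self.r_rows for s in self.s_rows
                if r[self.key] == s[self.key]]

    def insert_into_s(self, row):
        """Translate a local insert into the delta it induces on the joined
        relation, so existing maintenance algorithms need no change."""
        self.s_rows.append(row)
        return [dict(r, **row) for r in self.r_rows
                if r[self.key] == row[self.key]]

wrapper = TwoRelationWrapper([{"id": 1, "a": "x"}], [{"id": 1, "b": "y"}], "id")
print(wrapper.query())                             # the one joined relation
print(wrapper.insert_into_s({"id": 1, "b": "z"}))  # the induced delta
```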

 

SERF:

 

Evolvable View Environment (EVE):

Materialized views are important for many application domains in large-scale environments composed of numerous heterogeneous and distributed information sources (ISs), such as the WWW. Such large-scale environments are plagued by continuously changing information, with ISs modifying not only their contents but also their schemas, their interfaces, and their query capabilities. As a consequence, materialized views may become undefined when the underlying ISs upon which the views are defined change their capabilities. Current view technology only supports static views, i.e., views defined in a static environment. We call this problem the view evolution problem. To the best of our knowledge, our work is the first attempt at addressing the view evolution problem. We propose a general framework called the Evolvable View Environment (EVE) for addressing this problem. In the EVE framework, we add flexibility to the view evolution process by extending the view definition language SQL to include view evolution preferences, indicating for example which components of the view (such as the view interface, the view extent, etc.) are dispensable, essential, or replaceable by alternate components. Our goals are to have the view remain useful (even if modified) and to preserve the view as much as possible under capability changes of ISs, instead of having the view simply become undefined. In order to preserve view components, our system locates replacements for affected components from alternate ISs. For this, we design a Model for Information Source Descriptions (MISD) that characterizes the capabilities of each IS as well as possible relationships between ISs, such as join constraints, partial containment constraints, etc. Equipped with the evolvable SQL (E-SQL) view definition language and the MISD, we have proposed strategies for the view synchronization process. Our algorithms evolve affected view definitions guided by constraints imposed by view evolution preferences as well as by knowledge of semantic relationships between information sources.
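To make the notion of view evolution preferences concrete, here is a small sketch that encodes a view definition with per-component preferences as a plain data structure. The flag names ("dispensable", "replaceable"), the extent preference value, and the relation and attribute names are assumptions made for illustration; they do not reproduce the actual E-SQL syntax.

```python
# Illustrative (not actual E-SQL) encoding of a view with evolution
# preferences attached to its SELECT and FROM components.

view_definition = {
    "name": "CustomerOrders",
    "select": [
        {"attr": "Customer.Name", "dispensable": False, "replaceable": True},
        {"attr": "Orders.Total",  "dispensable": True,  "replaceable": True},
    ],
    "from": [
        {"relation": "Customer", "dispensable": False, "replaceable": True},
        {"relation": "Orders",   "dispensable": False, "replaceable": True},
    ],
    "where": "Customer.Id = Orders.CustId",
    # View-extent preference: may the rewritten view return a subset of the
    # original extent, or must it stay equivalent?
    "extent": "subset-or-equal",
}

def affected_components(view, dropped_relation):
    """Components a view synchronization step would try to replace (if
    replaceable) or drop (if dispensable) after an IS schema change."""
    return [c for part in ("select", "from") for c in view[part]
            if c.get("relation", c.get("attr", "")).startswith(dropped_relation)]

# Usage: which parts of the view are touched if the IS drops "Orders"?
print(affected_components(view_definition, "Orders"))
```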