|
Dynamic Data Warehouse (DyDa):
Data warehouses (DW) are an emerging technology to support
high-level decision making by gathering information from several
distributed information sources (ISs) into one repository. In dynamic
environments such as the web, DWs must be maintained in order to stay
up-to-date. Recently proposed view maintenance algorithms tackle this
problem of DW management under concurrent data updates (DU) at
different ISs, whereas the EVE system is the first to handle non-concurrent
schema changes (SC) of ISs. However, the concurrency of schema
changes by different ISs as well as the concurrency of both
interleaved schema changes (SC) and data updates (DU) still remain
unexplored problems.
In this paper, we propose a solution framework called DyDa that
successfully addresses both problems. The DyDa framework detects
concurrent SCs by the broken query scheme and conflicting concurrent
DUs by a local timestamp scheme. A fundamental idea of the DyDa
framework is the development of a two-layered architecture that
separates the concerns for concurrent DU and concurrent SC handling
without imposing any restrictions on the fully concurrent execution of
the ISs. At the lower (query engine) level of the framework, it
employs a local correction algorithm to handle concurrent DUs, and a
local name mapping strategy to handle concurrent rename-SCs that
rename either attributes or relations at the IS space. At the higher
(DW management) level, it addresses the problem of concurrent drop-SC
operations that drop attributes or relations from the IS space. For
the latter, the view synchronization (VS) algorithm is modified to keep
track of view evolution information as needed for handling sequences
of concurrent SCs. We also design a new view adaptation (VA) algorithm,
called Map-VA, that incrementally adapts the view extent for a
modified view definition even under interleaved SCs and DUs. Put
together, these algorithms provide a comprehensive solution to DW
management under concurrent SCs and DUs. This solution is currently
being implemented within the EVE data warehousing system.
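
The local name mapping idea for rename-SCs can be illustrated with a
small sketch. It is a hypothetical illustration under simplifying
assumptions, not the DyDa implementation: the `NameMap` class and its
methods are invented here, a query is reduced to a relation name plus a
list of attribute names, and an old name is assumed never to be reused
for a different relation or attribute.

```python
class NameMap:
    """Hypothetical map from outdated IS names to current ones (rename-SCs only)."""

    def __init__(self):
        self.relations = {}   # old relation name -> current relation name
        self.attributes = {}  # (current relation, old attribute) -> current attribute

    def rename_relation(self, old, new):
        # Redirect every earlier name that resolved to `old`, then `old` itself.
        for name, current in list(self.relations.items()):
            if current == old:
                self.relations[name] = new
        self.relations[old] = new
        # Attribute mappings are keyed by the current relation name, so re-key them.
        self.attributes = {
            ((new if rel == old else rel), attr): current
            for (rel, attr), current in self.attributes.items()
        }

    def rename_attribute(self, relation, old, new):
        # `relation` is the current name of the relation at the IS.
        for key, current in list(self.attributes.items()):
            if key[0] == relation and current == old:
                self.attributes[key] = new
        self.attributes[(relation, old)] = new

    def resolve(self, relation, attrs):
        """Rewrite a query written against old names into current names."""
        rel = self.relations.get(relation, relation)
        return rel, [self.attributes.get((rel, a), a) for a in attrs]


names = NameMap()
names.rename_attribute("Customer", "addr", "address")
names.rename_relation("Customer", "Client")
names.rename_attribute("Client", "address", "location")
rewritten = names.resolve("Customer", ["name", "addr"])
```

In this sketch a query issued against the outdated names `Customer` and
`addr` is rewritten locally instead of being rejected by the IS, which
is the kind of handling the lower level is described as providing.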
|
|
Schema and Data Updates Concurrency Control System (SDCC):
Data warehouses (DW) are built by gathering information from
several information sources and integrating it into one repository
customized to users' needs. Recently proposed view maintenance
algorithms tackle the problem of (concurrent) data updates happening
at different autonomous ISs, whereas the EVE system addresses the
maintenance of a data warehouse after schema changes of ISs. The
concurrency of schema changes and data updates performed by different
ISs, however, still remains an unexplored problem. This paper
provides a solution to this problem that guarantees the concurrent
view definition evolution and view extent maintenance of a DW defined
over distributed ISs. To solve that problem, we introduce a framework
called the SDCC (Schema change and Data update Concurrency Control)
system. SDCC integrates existing algorithms designed to address view
maintenance subproblems, such as view extent maintenance after IS data
updates, view definition evolution after IS schema changes, and view
extent adaptation after view definition changes, into one system by
providing protocols that enable them to correctly co-exist and
collaborate. SDCC tracks any potential faulty updates of the DW caused
by conflicting concurrent IS changes using a global message labeling
scheme. An algorithm that is able to compensate for such conflicting
updates by a local correction strategy, called Local Compensation
(LC), is incorporated into SDCC. The correctness of LC is proven.
Lastly, the overhead of the SDCC solution beyond the costs of the
known view maintenance algorithms it incorporates is shown to be
negligible.
|
|
Parallel SWEEP (PSWEEP):
Data warehouses (DW) are built by gathering information from
several information sources (ISs) and integrating it into one
repository customized to users' needs. Recent work has begun to
address the problem of view maintenance of DWs under concurrent data
updates of different ISs. SWEEP, proposed by Agrawal et al., is
one of the most popular solutions, even though its performance is
limited due to enforcing a sequential ordering on the handling of data
updates from ISs by the view maintenance module. To overcome this
limitation, we have developed a parallel algorithm for view
maintenance, called PSWEEP, that still incorporates all benefits of
SWEEP while offering substantially improved performance. In order to
perform parallel view maintenance, we have solved two issues:
detecting maintenance-concurrent data updates in a parallel
mode, and correcting the problem that the DW commit order may not
correspond to the DW update processing order due to parallel
maintenance handling. By decomposing SWEEP into an architecture of
modular components, we can insert a local timestamp assignment module
for detecting maintenance-concurrent data updates without
requiring any global clock synchronization. We introduce the negative
counter concept as a simple yet sufficient solution to the Variant-DW-Commit
problem of variant orders of committing effects of data updates to
the DW. We have proven the correctness of PSWEEP to guarantee that our
strategy indeed generates the correct final DW state. An evaluation of
both SWEEP and PSWEEP using an analytic cost model is given that shows
that PSWEEP has the potential of several orders of magnitude
performance improvement over SWEEP depending on the number of threads
supportable in the given DW system and the number and query power of
the information sources. Furthermore, both SWEEP and PSWEEP have been
implemented in our EVE data warehousing system. We have conducted an
extensive performance study, and the results of this study confirm the
correctness of our cost model and demonstrate the significant
performance improvement achieved by PSWEEP over SWEEP.
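
The local timestamp idea for detecting maintenance-concurrent data
updates can be sketched as follows. This is an illustrative sketch
only, not the PSWEEP module: the `Message` and `ArrivalLog` names are
hypothetical, and it assumes that each IS delivers its messages to the
DW in FIFO order, so that an update an IS applied before answering a
maintenance query reaches the DW before that answer does.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Message:
    source: str    # name of the IS that sent the message
    kind: str      # "update" or "answer"
    payload: object
    ts: int = -1   # local arrival timestamp, assigned by the DW


class ArrivalLog:
    """Stamps messages with a DW-local counter; no global clock is needed."""

    def __init__(self):
        self._clock = itertools.count(1)
        self.messages = []

    def receive(self, msg):
        msg.ts = next(self._clock)
        self.messages.append(msg)
        return msg

    def concurrent_updates(self, update, answer):
        """Updates from the answering IS that arrived after `update` but
        before `answer`; their effects may be included in the answer."""
        return [
            m for m in self.messages
            if m.kind == "update"
            and m.source == answer.source
            and update.ts < m.ts < answer.ts
        ]


log = ArrivalLog()
u1 = log.receive(Message("IS1", "update", "+r1"))
u2 = log.receive(Message("IS2", "update", "+r2"))
u3 = log.receive(Message("IS3", "update", "+r3"))
a2 = log.receive(Message("IS2", "answer", "result of query for u1"))
conflicts = log.concurrent_updates(u1, a2)
```

Because only arrival order at the DW matters in this sketch, several
maintenance threads could consult the same log without the ISs having
to agree on a common clock.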
|
|
Multiple Relation Encapsulation Wrapper (MRE):
Data warehouses (DW) are built by gathering information from
several information sources (ISs) and integrating and materializing it
into one repository customized to users' needs. Some of the most
recently proposed algorithms for the incremental maintenance of such
materialized DWs, such as SWEEP and PSWEEP, offer several significant
advantages over previous solutions, such as high performance, no
potential for infinite waits, and fewer remote queries and thus
reduced network and IS loads. However, similar to many other
algorithms, they still make the restrictive assumption that each IS
is composed of just a single relation. This is unrealistic in
practice. In this paper, we hence propose a solution to overcome this
restriction. The Multi-Relation Encapsulation (MRE) Wrapper supports
multiple relations in information sources in a manner transparent to
the rest of the environment. The Multi-Relation Encapsulation Wrapper
treats one IS composed of multiple relations as if it were a single
relation from the DW point of view; thus it can easily be plugged into
existing incremental view maintenance algorithms without any change.
Hence, our method maintains all the advantages offered by existing
algorithms in the literature, in particular SWEEP and PSWEEP, while
also achieving the additional desired features of being non-intrusive,
efficient, flexible and well-behaved.
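
The encapsulation idea can be illustrated with a small sketch. It
assumes, purely for illustration, that the single relation shown to the
DW is the join of two local relations on a shared key; the abstract
does not say how the MRE Wrapper combines relations, and the
`JoinWrapper` class and its methods are hypothetical.

```python
class JoinWrapper:
    """Hypothetical wrapper: two local relations appear as one virtual relation."""

    def __init__(self, key):
        self.key = key
        self.left = []    # rows of the first local relation (dicts)
        self.right = []   # rows of the second local relation (dicts)

    def scan(self):
        # Full extent of the virtual relation as the DW would see it.
        return [
            {**l, **r}
            for l in self.left
            for r in self.right
            if l[self.key] == r[self.key]
        ]

    def insert_left(self, row):
        # Apply the local update, then report it as a delta on the virtual relation.
        self.left.append(row)
        return [{**row, **r} for r in self.right if r[self.key] == row[self.key]]

    def insert_right(self, row):
        self.right.append(row)
        return [{**l, **row} for l in self.left if l[self.key] == row[self.key]]


wrapper = JoinWrapper("oid")
first_delta = wrapper.insert_right({"oid": 1, "item": "pen"})
second_delta = wrapper.insert_left({"oid": 1, "cust": "ann"})
```

Under these assumptions the DW only ever receives inserts into one
relation per IS, so a maintenance algorithm that expects
single-relation sources would not need to change.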
|
|
SERF:
|
|
Evolvable View Environment (EVE):
Materialized views are important for many application domains in
large-scale environments composed of numerous heterogeneous and
distributed information sources (ISs), such as the WWW. Such
large-scale environments are plagued with continuously changing
information supplied by ISs not only modifying their contents, but
also their schemas, their interfaces as well as their query
capabilities. As a consequence, materialized views may become
undefined when the underlying ISs upon which the views are defined
change their capabilities. Current view technology only supports
static views, i.e., views are defined in a static environment. We call
this problem the view evolution problem. To the best of our
knowledge, our work is the first attempt to address the view
evolution problem. We propose a general framework called the Evolvable
View Environment (EVE) for addressing this problem. In the EVE
framework, we add flexibility to the view evolution process by
extending the view definition language SQL to include view evolution
preferences, indicating for example which components (such as the view
interface, the view extent, etc.) of the view are dispensable,
essential, or replaceable by alternate components. Our goals here are
to have the view remain useful (even if modified) and to preserve the
view as much as possible under capability changes of ISs, instead of
having the view simply become undefined. In order to preserve view
components, our system locates replacements for affected components
from alternate ISs. For this, we design a Model for Information Source
Descriptions (MISD) that characterizes the capabilities of each IS as
well as possible relationships between ISs, such as join constraints,
partial containment constraints, etc. Equipped with the evolvable SQL
(E-SQL) view definition language and the MISD, we have proposed
strategies for the view synchronization process. Our algorithms evolve
affected view definitions guided by constraints imposed by view
evolution preferences as well as by knowledge of semantic
relationships between information sources.
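
How evolution preferences might steer view synchronization can be
sketched as follows. This is a simplified, hypothetical model rather
than the EVE algorithms: only attribute deletion is handled, the
`ViewAttr` class and `synchronize` function are invented names, and the
MISD is reduced to a plain dictionary that maps an attribute to an
equivalent attribute at another IS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ViewAttr:
    source: str         # fully qualified attribute, e.g. "IS1.Cust.name"
    dispensable: bool   # the view stays useful without this attribute
    replaceable: bool   # an equivalent attribute from another IS is acceptable


def synchronize(view, deleted, replacements) -> Optional[list]:
    """Evolve a view definition after the attribute `deleted` disappears.

    Returns the evolved attribute list, or None if the view cannot be
    preserved under its evolution preferences.
    """
    evolved = []
    for attr in view:
        if attr.source != deleted:
            evolved.append(attr)            # unaffected component
        elif attr.replaceable and deleted in replacements:
            # Substitute the replacement found in the (simplified) MISD.
            evolved.append(
                ViewAttr(replacements[deleted], attr.dispensable, attr.replaceable)
            )
        elif attr.dispensable:
            continue                        # drop the component, keep the view
        else:
            return None                     # essential component lost
    return evolved


view = [
    ViewAttr("IS1.Cust.name", dispensable=False, replaceable=True),
    ViewAttr("IS1.Cust.phone", dispensable=True, replaceable=False),
]
misd = {"IS1.Cust.name": "IS2.Client.name"}
after_name_deleted = synchronize(view, "IS1.Cust.name", misd)
after_phone_deleted = synchronize(view, "IS1.Cust.phone", misd)
```

The sketch prefers replacement over dropping when both are permitted,
which is one possible policy; the abstract states only that the view
should be preserved as much as possible.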
|
|