(last updated Sep 11, 2001)
Xtra: Automating the Reconciliation of XML Documents
Xtra Overview
The advent of Web Services that use XML-based message exchange has
spurred many efforts to address issues in inter-enterprise electronic
commerce interactions. Emerging standards and technologies enable
enterprises to describe and advertise their own Web Services and to
discover and determine how to interact with services fronted by other
businesses. However, these technologies do
not address the problem of how to reconcile structural differences
between similar types of documents supported by different enterprises.
Transformations between such documents must thus be created manually
on a case-by-case basis. In this paper, we explore how to automate
the transformation of XML e-business documents. We develop an
integrated solution that automates, as far as possible, every step of
the document transformation process. First, we propose a set of schema
transformation operations that establish semantic relationships
between two XML document schemas. Second, we define a model that
allows us to compare the cost of performing these operations. Third,
we introduce an algorithm that, based on our cost model, discovers an
efficient sequence of operations for transforming a source document
schema into a target document schema. The operation sequence is then
used to generate an equivalent XSLT transformation script.
Experimental results indicate that our algorithm can discover
acceptable transformations.
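To make the role of the cost model concrete, here is a minimal sketch of how two alternative operation sequences might be compared. The operation kinds, the unit costs, and the element names are assumptions chosen for illustration; this page does not list the actual operations or cost values used in Xtra.

```python
from dataclasses import dataclass

# Hypothetical unit costs; the real cost model is not given on this page.
ASSUMED_COSTS = {"rename": 1.0, "insert": 2.0, "delete": 2.0}


@dataclass(frozen=True)
class Operation:
    kind: str    # one of the keys of ASSUMED_COSTS
    target: str  # name of the element the operation applies to


def script_cost(script):
    """Sum the assumed per-operation costs of one transformation script."""
    return sum(ASSUMED_COSTS[op.kind] for op in script)


def cheapest(candidates):
    """Return the candidate script with the lowest total cost."""
    return min(candidates, key=script_cost)


# Two alternative scripts that both turn a <client> element into <customer>.
by_rename = [Operation("rename", "client")]
by_replace = [Operation("delete", "client"), Operation("insert", "customer")]

best = cheapest([by_rename, by_replace])
```

Under these assumed costs the single rename is preferred over deleting and re-inserting the element, which is the kind of choice a cost model is meant to settle.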
Research Issues
Issue 1 -- Discovery of the semantic relationship between two XML schemas.
Issue 2 -- Translation of XML documents in the source format into the target format.
Our Approach
Since DTDs are currently the dominant industry standard, we address
the problem of how to transform a document conforming to a source DTD
so that it will conform to a target DTD. Our approach could easily be
adapted to XML Schema. Given a source and a target DTD, we first
model each DTD as a tree. This allows us to express the problem as how
to transform one DTD tree into another. To this end, we have
defined a set of DTD transformation operations that establish
the semantic relationships between two trees. We have developed an
algorithm to discover a sequence of operations (i.e., a transformation
script) that transforms a source DTD tree into a target DTD tree. The
discovery process draws on auxiliary information (e.g., a synonym
dictionary or domain ontology) and on a cost model we define for
choosing a transformation script among multiple alternatives. Lastly,
we use the resulting transformation script to generate an eXtensible
Stylesheet Language Transformations (XSLT) script. The
XSLT script can then be applied to source XML documents to transform
them into XML documents conforming to the target DTD. Figure 1 shows
the architecture of our system.
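The pipeline described above can be sketched in a few lines: model each DTD as a tree, derive a transformation script, and emit XSLT from it. This is only an illustration under strong assumptions. The `DtdNode` class, the element names, and the position-based rename discovery are hypothetical and stand in for the actual algorithm, which also uses auxiliary information and a cost model.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class DtdNode:
    """One element declaration in a DTD, with its child elements."""
    name: str
    children: list = field(default_factory=list)


def element_names(node):
    """Collect element names in depth-first, pre-order."""
    names = [node.name]
    for child in node.children:
        names.extend(element_names(child))
    return names


def renames_to_xslt(renames):
    """Emit an XSLT 1.0 stylesheet: identity template plus one template per rename."""
    parts = [
        '<xsl:stylesheet version="1.0"'
        ' xmlns:xsl="http://www.w3.org/1999/XSL/Transform">',
        '  <xsl:template match="@*|node()">',
        '    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>',
        '  </xsl:template>',
    ]
    for old, new in renames.items():
        parts += [
            f'  <xsl:template match="{old}">',
            f'    <xsl:element name="{new}">'
            '<xsl:apply-templates select="@*|node()"/></xsl:element>',
            '  </xsl:template>',
        ]
    parts.append('</xsl:stylesheet>')
    return "\n".join(parts)


# Hypothetical source and target DTD trees that differ in one element name.
source = DtdNode("order", [DtdNode("client"), DtdNode("item")])
target = DtdNode("order", [DtdNode("customer"), DtdNode("item")])

# Toy discovery step: assumes both trees have the same shape and pairs
# elements by position; differing names become rename operations.
renames = {
    s: t
    for s, t in zip(element_names(source), element_names(target))
    if s != t
}

xslt = renames_to_xslt(renames)
ET.fromstring(xslt)  # raises ParseError if the stylesheet is not well-formed XML
```

The generated stylesheet could then be handed to any XSLT 1.0 processor to rewrite source documents. The Python standard library does not include one, so the sketch only checks that the output is well-formed.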
Project Members
Advisor
Co-op Research Staff
Harumi A. Kuno (software research engineer in HP Palo Alto Labs)
Graduate Students
Sponsors
HP Labs partially supported Hong Su from Sep. 2000 to Mar. 2001 through a GrassRoot Breakthrough Grant.
Publications
Hong Su, Harumi Kuno, and Elke Rundensteiner,
"Automating the transformation of XML documents" (.ps) (.pdf),
Workshop on Web Information and Data Management (WIDM'01), Atlanta,
GA, USA, Nov. 9, 2001.
Hong's Talk at WIDM'01 (.pdf).
System Download
Coming soon.