WPI Computer Science Department - Database Systems Research Lab (DSRG)
------------------------------------------

(last updated Sep 11, 2001)


Xtra: Automating the Reconciliation of XML Documents



Xtra System Overview

Research Issues

Our Approach

Project Members

Sponsors

Publications

System Download

Xtra Overview

The advent of web services that use XML-based message exchange has spurred many efforts to address issues in inter-enterprise electronic commerce. Emerging standards and technologies enable enterprises to describe and advertise their own web services and to discover and determine how to interact with services fronted by other businesses. However, these technologies do not address the problem of how to reconcile structural differences between similar types of documents supported by different enterprises. As a result, transformations between such documents must currently be created manually on a case-by-case basis. In this work, we explore the problem of how to automate the transformation of XML e-business documents. We develop an integrated solution that automates, as far as possible, every step of the document transformation process. First, we propose a set of schema transformation operations that establish semantic relationships between two XML document schemas. Second, we define a model for comparing the cost of performing these operations. Third, we introduce an algorithm that, based on this cost model, discovers an efficient sequence of operations for transforming a source document schema into a target document schema. The operation sequence is then used to generate an equivalent XSLT transformation script. Experimental results indicate that our algorithm discovers acceptable transformations.
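The idea of a cost model that picks one operation sequence among several can be illustrated on a much smaller problem. The sketch below is our own simplification, not the Xtra algorithm: it looks only at the ordered child elements of a single element, uses three hypothetical operations (rename, insert, delete) with made-up costs, and treats a rename between names listed in a synonym dictionary as cheaper than an arbitrary rename. Under those assumptions the cheapest operation sequence is found by a standard weighted edit-distance dynamic program with a backtrace.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical costs for illustration only; NOT taken from the Xtra cost model.
COST_KEEP = 0.0
COST_SYNONYM_RENAME = 0.5
COST_RENAME = 3.0
COST_INSERT = 1.0
COST_DELETE = 1.0


def rename_cost(src: str, dst: str, synonyms: Dict[str, Set[str]]) -> float:
    """Cost of turning element name src into dst (cheap if they are synonyms)."""
    if src == dst:
        return COST_KEEP
    if dst in synonyms.get(src, set()) or src in synonyms.get(dst, set()):
        return COST_SYNONYM_RENAME
    return COST_RENAME


def cheapest_script(source: List[str], target: List[str],
                    synonyms: Dict[str, Set[str]]) -> Tuple[float, List[str]]:
    """Weighted edit distance over two child-element lists, plus one cheapest script."""
    n, m = len(source), len(target)
    # cost[i][j] = cheapest way to turn source[:i] into target[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * COST_DELETE
    for j in range(1, m + 1):
        cost[0][j] = j * COST_INSERT
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j] + COST_DELETE,
                cost[i][j - 1] + COST_INSERT,
                cost[i - 1][j - 1] + rename_cost(source[i - 1], target[j - 1], synonyms),
            )

    # Walk back from cost[n][m] to recover the operations of one cheapest script.
    ops: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = rename_cost(source[i - 1], target[j - 1], synonyms)
            if cost[i][j] == cost[i - 1][j - 1] + step:
                if step > COST_KEEP:  # identical names need no operation
                    ops.append(f"rename({source[i - 1]} -> {target[j - 1]})")
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + COST_DELETE:
            ops.append(f"delete({source[i - 1]})")
            i -= 1
        else:
            ops.append(f"insert({target[j - 1]})")
            j -= 1
    ops.reverse()
    return cost[n][m], ops
```

For example, cheapest_script(["name", "street", "zip", "fax"], ["fullName", "street", "postcode"], {"name": {"fullName"}, "zip": {"postcode"}}) returns a total cost of 2.0 with the script rename(name -> fullName), rename(zip -> postcode), delete(fax). A real DTD transformation must also handle nesting and repetition, which this flat alignment ignores.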

Research Issues


Issue 1 -- Discovery of the semantic relationship between two XML schemas.


Issue 2 -- Translation of XML documents from the source format into the target format.
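For the translation issue, the simplest case is a transformation script made only of element renames. The sketch below is our own illustration, not Xtra's XSLT generator: it turns a hypothetical old-name to new-name mapping into an XSLT 1.0 stylesheet built from the identity transform plus one template per renamed element. It assumes documents without XML namespaces; structural operations such as nesting or flattening would need more elaborate templates.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def renames_to_xslt(renames: Dict[str, str]) -> str:
    """Build an XSLT 1.0 stylesheet that renames elements and copies everything else."""
    ET.register_namespace("xsl", XSL_NS)  # serialize as xsl:... instead of ns0:...
    sheet = ET.Element(f"{{{XSL_NS}}}stylesheet", {"version": "1.0"})

    # Identity template: by default, copy every attribute and node unchanged.
    identity = ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": "@*|node()"})
    copy = ET.SubElement(identity, f"{{{XSL_NS}}}copy")
    ET.SubElement(copy, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    # One template per renamed element: emit the new name, then process its content.
    for old, new in sorted(renames.items()):
        tmpl = ET.SubElement(sheet, f"{{{XSL_NS}}}template", {"match": old})
        out = ET.SubElement(tmpl, f"{{{XSL_NS}}}element", {"name": new})
        ET.SubElement(out, f"{{{XSL_NS}}}apply-templates", {"select": "@*|node()"})

    return ET.tostring(sheet, encoding="unicode")
```

The resulting stylesheet can be run with any XSLT 1.0 processor, such as xsltproc. Because a plain element-name pattern has a higher default priority than the node() pattern of the identity template, renamed elements are handled by their specific template and everything else is copied unchanged.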

Our Approach

Since DTDs are currently the dominant industry standard for describing XML document structure, we address the problem of how to transform a document conforming to a source DTD so that it conforms to a target DTD. Our approach could easily be adapted to XML Schema. Given a source DTD and a target DTD, we first model each DTD as a tree, which lets us restate the problem as transforming one DTD tree into another. To this end, we have defined a set of DTD transformation operations that establish the semantic relationships between the two trees.

We have developed an algorithm that discovers a sequence of these operations (i.e., a transformation script) that transforms the source DTD tree into the target DTD tree. The discovery process draws on auxiliary information (e.g., a synonym dictionary or a domain ontology) and on a cost model we define for choosing one transformation script among multiple alternatives. Lastly, we use the resulting transformation script to generate an Extensible Stylesheet Language Transformations (XSLT) script, which can then be applied to source XML documents to transform them into XML documents conforming to the target DTD. Figure 1 shows the architecture of our system.
[Figure 1: Xtra system architecture]
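The first step, modeling a DTD as a tree, can be sketched as follows. This is a deliberately simplified parser of our own, not Xtra's: it reads only ELEMENT declarations whose content models are flat sequences such as (a, b*, c?), or are (#PCDATA), EMPTY or ANY; it ignores attribute lists, entities and choice groups, and it assumes the DTD is not recursive. The DTDNode layout (a name, an occurrence marker and a list of children) and the sample address DTD are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Matches <!ELEMENT name content-model>; the content model is captured as raw text.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([\w.-]+)\s+(.+?)\s*>", re.DOTALL)

# Hypothetical sample DTD used for illustration.
SAMPLE_DTD = """
<!ELEMENT address (name, street, zip?, phone*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
"""


@dataclass
class DTDNode:
    name: str
    occurs: str = ""  # "", "?", "*" or "+"
    children: List["DTDNode"] = field(default_factory=list)


def parse_declarations(dtd: str) -> Dict[str, List[str]]:
    """Map each element name to the child tokens of its flat sequence content model."""
    decls: Dict[str, List[str]] = {}
    for name, model in ELEMENT_DECL.findall(dtd):
        model = model.strip()
        if model in ("EMPTY", "ANY") or "#PCDATA" in model:
            decls[name] = []  # treated as a leaf in this simplified model
        else:
            decls[name] = [tok.strip() for tok in model.strip("()").split(",")]
    return decls


def build_tree(dtd: str, root: str) -> DTDNode:
    """Expand the declarations into a tree rooted at root (non-recursive DTDs only)."""
    decls = parse_declarations(dtd)

    def expand(token: str) -> DTDNode:
        occurs = token[-1] if token[-1] in "?*+" else ""
        name = token.rstrip("?*+")
        node = DTDNode(name, occurs)
        node.children = [expand(child) for child in decls.get(name, [])]
        return node

    return expand(root)
```

With the sample above, build_tree(SAMPLE_DTD, "address") yields an address node whose children are name, street, zip (occurrence "?") and phone (occurrence "*"), each a leaf. Keeping the occurrence marker on the node, rather than discarding it, matters because optional and repeated elements constrain which transformations between two trees are valid.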

Project Members

Sponsors

HP Labs partially supported Hong Su from Sep. 2000 to Mar. 2001 through a GrassRoot Breakthrough Grant.

Publications

Hong Su, Harumi Kuno, and Elke Rundensteiner. Automating the transformation of XML documents. Workshop on Web Information and Data Management (WIDM'01), Atlanta, GA, USA, Nov. 9, 2001. (.ps) (.pdf)

Hong Su's talk at WIDM'01 (.pdf).

System Download

Coming soon.