PhD Program Homepage

News and Events

	Estimated Graduation Date: May 2003.
	PhD Dissertation Proposal (.ps)(.pdf) Approved on May 17, 2001.
	Comprehensive Exam Completed on March 29, 2000.
	Comprehensive Exam Topics Posted on March 7, 2000.
	Estimated Comprehensive Exam Period: March 6 to March 27, 2000.
	Created PhD Dissertation Committee on Jan. 24, 2000.
	Enrolled in PhD Program.
	Passed PhD Qualifying Examination Written Test on Apr. 15, 1998.
	Took PhD Qualifying Examination Written Test on Apr. 8, 1998.

My PhD Committee:

	Prof. Carolina Ruiz, Assistant Professor, ruiz@cs.wpi.edu.
	Prof. Karen Lemon, Associate Professor, WPI, kal@cs.wpi.edu.
	Dr. Gail Mitchell, Senior Scientist, GTE Labs, External Member, gmitchell@gte.com.
	Prof. Elke A Rundensteiner, Associate Professor, WPI, Ph.D. chair, rundenst@cs.wpi.edu.

Dissertation Direction:

For my Ph.D. research, I plan to focus on web-related database management issues. Thus far, I have worked on data warehousing over distributed information sources (ISs). I have proposed a parallel algorithm that improves the performance of SWEEP, the view maintenance algorithm proposed by Agrawal et al., as demonstrated in experimental studies. My master's research proposed several techniques for maintaining data warehouses defined over distributed information sources under both data updates and schema changes of these sources, whereas previous work in the literature focused on data updates only.
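To illustrate what view maintenance means here, the following is a minimal sketch of incrementally updating a join view when tuples are inserted into one source. It is not SWEEP and not the parallel variant mentioned above; the relations `R(a, b)` and `S(b, c)` and all function names are hypothetical, and both sources are assumed to be local in-memory lists rather than distributed ISs.

```python
def join(r, s):
    """Join R(a, b) with S(b, c) on b, producing (a, b, c) tuples."""
    return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]


def maintain_on_insert(view, delta_r, s):
    """Update view = R join S in place after delta_r is inserted into R.

    Only the new tuples are joined with S, so the existing view
    contents are never recomputed.
    """
    view.extend(join(delta_r, s))
    return view


# Hypothetical source contents.
R = [(1, "x"), (2, "y")]
S = [("x", 10), ("y", 20), ("z", 30)]
view = join(R, S)

# A source update arrives: two tuples are inserted into R.
delta_R = [(3, "z"), (4, "x")]
R.extend(delta_R)
maintain_on_insert(view, delta_R, S)
```

The incremental result can be checked against a full recomputation with `sorted(view) == sorted(join(R, S))`. In a distributed setting the join with `S` would be a remote query, which is where concurrent updates at the sources make maintenance harder than this sketch suggests.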

More recently, I have also been involved in joint research with GTE Labs on loading XML documents into relational databases. In particular, I have proposed a mapping algorithm that generates a relational schema from a Document Type Definition (DTD), along with an algorithm for loading XML documents conforming to that DTD into the generated relational schema. My approach records additional information in metadata tables during the mapping process so that XML documents can be loaded into the generated schema automatically, whereas current industry solutions require the mapping to be specified manually before XML data can be loaded.
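The general idea of a DTD-driven mapping with a metadata table can be sketched as follows. This is only an illustration under strong simplifying assumptions, not the algorithm developed in this work: the toy DTD, the `meta` table layout, and all function names are hypothetical, and the sketch handles only element sequences and text-only leaves (no attributes, choices, or cardinality).

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical toy DTD: sequences of child elements and text-only leaves.
DTD = """
<!ELEMENT library (book*)>
<!ELEMENT book (title, author)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
"""

XML_DOC = """<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title><author>Y</author></book>
</library>"""


def parse_dtd(dtd_text):
    """Return {element: [child names]}; a leaf maps to ['#PCDATA']."""
    decls = {}
    for name, content in re.findall(r"<!ELEMENT\s+(\w+)\s+\(([^)]*)\)>", dtd_text):
        # Occurrence markers (*, +, ?) are dropped: cardinality is ignored here.
        decls[name] = [c.strip().rstrip("*+?") for c in content.split(",")]
    return decls


def build_schema(conn, decls):
    """Create one table per non-leaf element and record the mapping in meta."""
    conn.execute("CREATE TABLE meta (element TEXT, kind TEXT, target TEXT)")
    leaves = {e for e, kids in decls.items() if kids == ["#PCDATA"]}
    for elem, kids in decls.items():
        if elem in leaves:
            continue
        cols = [k for k in kids if k in leaves]
        col_sql = "".join(f", {c} TEXT" for c in cols)
        # Element names are trusted here; a real tool would quote or validate them.
        conn.execute(
            f"CREATE TABLE {elem} (id INTEGER PRIMARY KEY, parent_id INTEGER{col_sql})"
        )
        conn.execute("INSERT INTO meta VALUES (?, 'table', ?)", (elem, elem))
        for c in cols:
            conn.execute("INSERT INTO meta VALUES (?, 'column', ?)", (c, f"{elem}.{c}"))


def load_document(conn, xml_text):
    """Load an XML document using only the mapping stored in the meta table."""
    kind = dict(conn.execute("SELECT element, kind FROM meta"))

    def load(node, parent_id):
        if kind.get(node.tag) != "table":
            return  # leaf values are stored as columns of the parent row
        cols = [c for c in node if kind.get(c.tag) == "column"]
        names = ["parent_id"] + [c.tag for c in cols]
        values = [parent_id] + [(c.text or "").strip() for c in cols]
        marks = ", ".join("?" for _ in names)
        cur = conn.execute(
            f"INSERT INTO {node.tag} ({', '.join(names)}) VALUES ({marks})", values
        )
        for child in node:
            load(child, cur.lastrowid)

    load(ET.fromstring(xml_text), None)


conn = sqlite3.connect(":memory:")
build_schema(conn, parse_dtd(DTD))
load_document(conn, XML_DOC)
rows = conn.execute("SELECT title, author FROM book ORDER BY id").fetchall()
```

The point of the sketch is that `load_document` never looks at the DTD: everything it needs to place a value comes from the `meta` table written during schema generation, so no hand-written mapping is required at load time.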

Based on my previous work, I am now planning to study research topics such as:

	XML Data Bulk Loading: A natural next step once loading is done.
	XML-Relational Wrapper: Detection of XML updates and propagation to RDBMS.
	XML Data Integration and Data Warehousing: Integrate and maintain XML data stored in data warehouses.
	XML Query Processing and Optimization: Tradeoffs between loading XML data into a relational schema and then querying it versus querying the XML data directly, and other such issues.
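The last topic above contrasts two ways of answering the same question over XML data. The sketch below shows both paths on a tiny hypothetical document; the document, the `book` table, and the function names are assumptions made for illustration, and nothing here measures or implies which path is faster.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document.
XML_DOC = """<library>
  <book><title>A</title><year>1999</year></book>
  <book><title>B</title><year>2001</year></book>
</library>"""


def query_xml_directly(xml_text, min_year):
    """Path 1: walk the XML tree and filter in application code."""
    root = ET.fromstring(xml_text)
    return sorted(
        b.findtext("title")
        for b in root.iter("book")
        if int(b.findtext("year")) >= min_year
    )


def load_into_relation(xml_text):
    """Path 2, step 1: pay a one-time cost to load the data into a table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
    root = ET.fromstring(xml_text)
    conn.executemany(
        "INSERT INTO book VALUES (?, ?)",
        [(b.findtext("title"), int(b.findtext("year"))) for b in root.iter("book")],
    )
    return conn


def query_relation(conn, min_year):
    """Path 2, step 2: let the relational engine evaluate the query."""
    sql = "SELECT title FROM book WHERE year >= ? ORDER BY title"
    return [row[0] for row in conn.execute(sql, (min_year,))]


direct = query_xml_directly(XML_DOC, 2000)
relational = query_relation(load_into_relation(XML_DOC), 2000)
```

Both paths return the same titles; the tradeoff is between the up-front loading cost plus reuse of relational query optimization on one side, and avoiding the load but reparsing or traversing the document per query on the other.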

Comprehensive Exam:

AREA 1: Web Data Management

XML is an emerging standard for flexible information sharing on the web, and several query languages have recently been proposed for declaratively extracting information from XML documents.

1.A.

SQL combines designating search space, search criteria and presentation of the query result into one query expression. Hence, these are likely to be the current expectations for a declarative language. How do the different query languages proposed for XML documents, notably, XSL, XQL and XML_QL, deal with these three aspects? What are the implications for procedural languages to implement the queries?

1.B.

Propose a general algorithm for translating XML_QL queries into SQL-92, assuming a 'semantic' mapping from XML to the relational model (e.g., like the one proposed in the VLDB'99 paper or the one you, Gail Mitchell and W.C. Lee have proposed recently) and not a 'syntactic' mapping such as those studied by Kossmann et al. First, briefly describe the data model mapping you assume as the basis for your query translation strategy. Then, present your strategy or algorithm. Discuss whether there are query classes in XML_QL that cannot be mapped to SQL. Indicate also the expected efficiency or inefficiency of the resulting SQL queries.

1.C.

Briefly indicate if mapping instead to SQL-3 (SQL with OO extensions) will be more or less promising, and give your intuition why. There is no need to actually propose a full algorithm for this question.

AREA 2: Data warehousing

Data warehousing is a critical technology currently being employed by businesses for effective decision support. Data warehousing in industry to date is based on relational database technology. With the emergence of the internet, we may now want to consider how 1) semi-structured data (say XML) and 2) the increasing scale of widely distributed sites influence warehousing technology as well as increase its potential.

2.A.

Provide a classification of architectural design choices possible to tackle data warehousing in this web context, i.e., to build data warehouses over web sources. What are research issues and problems that must be addressed in each of the different architectural solutions for web warehousing? You should characterize the problems but you do not need to provide solutions to them. You should also conduct a literature search and categorize on-going project efforts into your above classification.

2.B.

The dynamic data warehousing system (DYDA) has been developed for incrementally yet consistently maintaining a relational data warehouse under a mixture of concurrent schema and data changes of multiple autonomous sources. Sketch how you would modify the DYDA solution and its algorithms to deal with XML as data sources and XML_QL as the query language used for data warehouse generation. If no modification is needed, justify why.

AREA 3: Software Engineering, in particular, software repositories, UML modeling, metadata

3.A.

XML's limitations as a systems integration solution. XML is often cited as the solution for integrating systems. Why? What are the issues that must be addressed when integrating systems? Which of these integration problems does XML actually solve, if any? Which of these integration problems does XML fail to solve?

3.B.

Meta-data repositories, such as Rochade, have been proposed as technology for the integration of the different phases of the software development lifecycle. Analyze and discuss the pros and cons of this solution approach, i.e., what does this solve and what remains unsolved? How would XML fit into that picture, if at all?