
Reusability of the Approach

An important question in the construction of web wrappers is whether a wrapper can be reused for other web sites and schemas. We tried to address this issue in this project.

The main steps involved in the construction of the wrapper were:

  1. model web site in the relational model (define schema)
  2. examine capabilities of underlying web search forms and define SQL parser that extracts queries adapted to search forms from arbitrary SQL
  3. examine returned HTML-results, define HTML parser(s) for results and map to records in relational model
  4. generate parsers with JavaCC and JJTree
  5. write wrapper code and call parsers
  6. store results and run queries against database
  7. define and code user interface (API)
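The split between site-specific and reusable steps can be illustrated with a minimal sketch. All type and method names below (`WrapperSketch`, `QueryAdapter`, `ResultParser`, `Wrapper`) are hypothetical and do not come from the project's code; the sketch only assumes that steps 2 and 3 can be hidden behind interfaces so that the code for steps 5 and 6 stays unchanged from site to site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: none of these names are taken from the original wrapper.
final class WrapperSketch {

    /** Step 2 (site-specific): map SQL to a request the site's search form accepts. */
    interface QueryAdapter {
        String toFormRequest(String sql);
    }

    /** Step 3 (site-specific): map a returned HTML page to relational records. */
    interface ResultParser {
        List<Map<String, String>> parse(String html);
    }

    /** Steps 5 and 6 (reusable): call the parsers and store the results. */
    static final class Wrapper {
        private final QueryAdapter adapter;
        private final Function<String, String> fetcher; // form request -> HTML page
        private final ResultParser parser;
        // Stand-in for the database in which results are stored.
        private final List<Map<String, String>> store = new ArrayList<>();

        Wrapper(QueryAdapter adapter, Function<String, String> fetcher, ResultParser parser) {
            this.adapter = adapter;
            this.fetcher = fetcher;
            this.parser = parser;
        }

        /** Runs one query through the pipeline and returns the extracted records. */
        List<Map<String, String>> run(String sql) {
            String request = adapter.toFormRequest(sql);
            String html = fetcher.apply(request);
            List<Map<String, String>> records = parser.parse(html);
            store.addAll(records);
            return records;
        }

        int storedCount() {
            return store.size();
        }
    }
}
```

Under this assumption, supporting a new site would mean supplying a new `QueryAdapter` and `ResultParser` (in the project, generated from grammars by JavaCC and JJTree) while `Wrapper` itself is left untouched.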

For a new web site, steps 1 through 4 have to be re-executed. Step 4 is automatic, which leaves steps 1, 2, and 3 to be adapted to each new site. The main issues involved in these steps are:

These three steps have to be carried out for each new wrapper. More work on defining grammars for these purposes could be beneficial. The remaining software in the wrapper (Java code) can largely be reused. The only knowledge about the web site that is currently hardcoded is the translation of pattern-matching expressions.
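One way to remove this last hardcoded dependency would be to make the translation configurable per site. The sketch below is only an illustration: it assumes that the expressions in question are SQL `LIKE` patterns and that the target search form uses its own wildcard strings (for example `*` and `?`). Neither assumption is confirmed by the text, and the class name `LikePatternTranslator` is hypothetical.

```java
// Hypothetical sketch: translates SQL LIKE wildcards into a site's own wildcard syntax.
// Assumes '%' matches any sequence and '_' matches any single character;
// LIKE escape characters are deliberately not handled here.
final class LikePatternTranslator {
    private final String anySequence; // site's replacement for SQL '%'
    private final String anySingle;   // site's replacement for SQL '_'

    LikePatternTranslator(String anySequence, String anySingle) {
        this.anySequence = anySequence;
        this.anySingle = anySingle;
    }

    /** Returns the pattern with SQL wildcards replaced by the site's wildcards. */
    String translate(String likePattern) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < likePattern.length(); i++) {
            char c = likePattern.charAt(i);
            if (c == '%') {
                out.append(anySequence);
            } else if (c == '_') {
                out.append(anySingle);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With such a class, a site whose form uses `*` and `?` would be configured as `new LikePatternTranslator("*", "?")`, and the wildcard strings would become part of the per-site description rather than of the wrapper code.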



Andreas Koeller
Mon May 10 13:40:38 EDT 1999