This document describes the XML file format that is used to save and load graph models (and eventually views) in the Diva graph package. XML stands for the Extensible Markup Language, and is described in more detail at the World Wide Web Consortium (W3C) home page. XML was chosen as a basis for describing graphs because it is simple, regular, extensible, and has plenty of third-party tool support. We strongly prefer XML to serialized Java objects, because the XML format is human-readable and human-editable, and is independent of the object version, so that a file saved in XML will always be compatible with the current version of the graph package.
There are two aspects to the XML graph format. The first is the document type definition (DTD) file, graph.dtd, which defines the syntax for graph files and is not intended to be edited by the user. The second is the actual graph file, which is created by the user or written out by a graph editor. It describes a particular instance of a graph topology. The rest of this document is a translation of graph.dtd into English, accompanied by examples.
Element | Description | Example |
Graph | Graph is the top-most element in the graph file. | <graph> ... </graph> |
Node | A node element represents a Node instance in the graph data structure. It is identified by a unique identifier, which may be referenced by an edge later in the file. | <node id="foo"/> |
Edge | An edge element represents an Edge instance in the graph data structure. It is identified by a unique identifier and parameterized by an optional boolean value that specifies whether it is directed. It is also parameterized by head and tail identifiers, which must reference nodes defined earlier in the file. | <edge id="baz" tail="foo" head="bar" directed="true"/> |
CompositeNode | Consistent with the graph data structure, a composite node is both a graph and a node. It has the same parameters as a node, but has contents like a graph. | <compositeNode id="bar"> ... </compositeNode> |
Note: Graph elements are identified by strings. For parsing efficiency, an element in the file must be defined before it is referenced, so that we can perform a one-pass parse.
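The define-before-reference rule can be illustrated with a short sketch. The class below is hypothetical and is not part of the Diva graph package; it assumes only the element and attribute names from the table above and uses the JDK's standard SAX parser. It reads the file once, remembers every node and composite node identifier as it is encountered, and reports any edge whose head or tail has not been seen yet.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper (not a Diva class): checks the one-pass ordering rule. */
final class OnePassCheck {

    /** Returns the ids of edges that reference a node not yet defined at that point in the file. */
    static List<String> undefinedReferences(String xml) throws Exception {
        final Set<String> seen = new HashSet<>();       // node ids defined so far
        final List<String> problems = new ArrayList<>(); // offending edge ids, in file order

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("node") || qName.equals("compositeNode")) {
                    // A definition: remember the id so later edges may refer to it.
                    seen.add(atts.getValue("id"));
                } else if (qName.equals("edge")) {
                    // A reference: both endpoints must already be known.
                    if (!seen.contains(atts.getValue("tail")) || !seen.contains(atts.getValue("head"))) {
                        problems.add(atts.getValue("id"));
                    }
                }
            }
        };

        // Malformed XML is not handled here; the parser's exception propagates to the caller.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String inOrder = "<graph><node id=\"a\"/><node id=\"b\"/>"
                + "<edge id=\"e\" tail=\"a\" head=\"b\"/></graph>";
        String outOfOrder = "<graph><node id=\"a\"/>"
                + "<edge id=\"e\" tail=\"a\" head=\"b\"/><node id=\"b\"/></graph>";
        System.out.println(undefinedReferences(inOrder));    // prints []
        System.out.println(undefinedReferences(outOfOrder)); // prints [e]
    }
}
```

Because identifiers are recorded as the elements stream past, no second pass or lookahead is needed, which is the efficiency argument behind the ordering rule.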
This example specifies a graph that contains a node a and a composite node x. The composite node x contains
two nodes, b and c. There are three directed edges: e1 from c to b, e2 from x
to a, and e3 from b to a.
<graph>
  <node id="a"/>
  <compositeNode id="x">
    <node id="b"/>
    <node id="c"/>
  </compositeNode>
  <edge id="e1" tail="c" head="b" directed="true"/>
  <edge id="e2" tail="x" head="a" directed="true"/>
  <edge id="e3" tail="b" head="a" directed="true"/>
</graph>
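To make the structure of the example concrete, here is a minimal sketch that reads the same XML with the JDK's standard DOM parser and lists the edges and the contents of the composite node. It does not use the Diva graph package; the class name GraphXmlSummary and its methods are hypothetical, and only the element and attribute names are taken from the format described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not a Diva class): summarizes a graph file using plain DOM. */
final class GraphXmlSummary {

    private static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Lists every edge in document order as "id: tail -> head". */
    static List<String> edges(String xml) throws Exception {
        NodeList list = parse(xml).getElementsByTagName("edge");
        List<String> out = new ArrayList<>();
        for (int i = 0; i < list.getLength(); i++) {
            Element e = (Element) list.item(i);
            out.add(e.getAttribute("id") + ": "
                    + e.getAttribute("tail") + " -> " + e.getAttribute("head"));
        }
        return out;
    }

    /** Lists the ids of the nodes directly contained in the given composite node. */
    static List<String> contents(String xml, String compositeId) throws Exception {
        NodeList composites = parse(xml).getElementsByTagName("compositeNode");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < composites.getLength(); i++) {
            Element composite = (Element) composites.item(i);
            if (!compositeId.equals(composite.getAttribute("id"))) {
                continue;
            }
            // Walk direct children only; whitespace text nodes are skipped.
            for (Node c = composite.getFirstChild(); c != null; c = c.getNextSibling()) {
                if (c instanceof Element) {
                    Element child = (Element) c;
                    String tag = child.getTagName();
                    if (tag.equals("node") || tag.equals("compositeNode")) {
                        ids.add(child.getAttribute("id"));
                    }
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<graph><node id=\"a\"/>"
                + "<compositeNode id=\"x\"><node id=\"b\"/><node id=\"c\"/></compositeNode>"
                + "<edge id=\"e1\" tail=\"c\" head=\"b\" directed=\"true\"/>"
                + "<edge id=\"e2\" tail=\"x\" head=\"a\" directed=\"true\"/>"
                + "<edge id=\"e3\" tail=\"b\" head=\"a\" directed=\"true\"/></graph>";
        System.out.println(edges(xml));         // prints [e1: c -> b, e2: x -> a, e3: b -> a]
        System.out.println(contents(xml, "x")); // prints [b, c]
    }
}
```

Note that e2 uses the composite node x itself as its tail, while e1 and e3 refer to nodes nested inside x; both kinds of reference are plain identifier lookups.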
The Java API to the Graph XML format consists of two classes. GraphParser parses a graph model from an XML file (and a DTD, which it uses to verify that the XML is syntactically correct). GraphWriter writes a graph model to a file. These are easy to use, and the snippet of Java code below shows how to use them to load and save files, respectively.
There are two remaining issues in the design of the markup language: storage of view information, and semantic objects or properties for nodes and edges.
We would like to be able to store view information, such as graph layout or node shape. This may be stored in a separate file, or as an annotation on top of the existing file format. We haven't settled on a design yet, so it's not supported in the current version. However, this is an important feature and will be supported in the next version.
The second issue is support for "semantic objects" or "properties", which are defined by the user and hang off the node. This may be achieved by an optional ID reference that refers to some other XML object. This is less urgent to us than the layout issue, and we imagine that there's a "standard" solution out there somewhere; we just haven't looked too hard yet.
If you have an opinion on either of these two issues and would like to share it, we'd love to hear from you.