XmdvTool Release 4.0 User Guide
Table of Contents
- Introduction
- Multivariate Visualization with XmdvTool
- Software Development Group
- Contact Information
- Software Distribution and Copyrights
- Interface Components
- Main Window
- System Menu
- File Menu
- Preferences Menu
- Windows Menu
- Help Menu
- Main Display Canvas
- Display Techniques Selection
- Vertical/Horizontal Scrollbars
- Canvas Magnification
- Region Zoom
- Message Bar
- Data Summary and Statistics
- Glyph Key
- Dimension Stacking Key
- Brush Toolbox
- Glyph Brush Popup
- Dimension Stacking Brush Popup
- Parallel Coordinates Brush Popup
- Auxiliary Window
- Data Values Popup
- Cluster Hierarchy Toolbox
- Structure-Based Brush
- Cluster Size Distribution
- Treemaps
- Dynamic Masking
- Extent Scaling
- Display Techniques Overview
- Parallel Coordinates
- Scatterplots
- Glyphs
- Dimension Stacking
- Hierarchical Parallel Coordinates
- Brushing Overview
- Data Brushing
- Types of Data Brush
- Painting
- Structure-based Brushing
- Input File Format
Introduction
Multivariate Visualization with XmdvTool
XmdvTool allows users to examine data sets with a large number of dimensions
or parameters. We refer to this as multivariate data. Four
distinct methods of projecting this N-dimensional (N-D) data onto a 2-D
screen are provided: scatterplots, glyphs, parallel coordinates, and dimensional
stacking. Users can interactively switch between display techniques.
A major tool in XmdvTool for providing insights into N-D spatial relations
is the N-D brush, which allows users to perform operations on the data
points which fall within a user-specified N-D subspace of the total space
defined by the data.
Software Distribution and Copyrights
Copyright 1994-1999, All Rights Reserved.
Permission to use, copy, modify and distribute this software and its
documentation for educational, research and non-profit purposes, without
fee, and without a written agreement is hereby granted, provided that this
copyright notice appear in all copies.
Permission to incorporate this software into commercial products may
be obtained directly from the authors. This software is provided "as is"
without express or implied warranty. In no event will the authors
be held liable for any damages arising from the use of this software.
Interface Components
Main Window
Data Summary and Statistics
Glyph Key
Dimension Stacking Key
Brush Toolbox

Brush Operation Toolbox
Brush operations are the actions performed on the data selected
by the brush. XmdvTool supports four types of brush
operations: highlight, mask, values, and average. Each
of these is explained in detail below. Brush operations are
specified through the Operation Toolbox.
Highlight -- This operation causes all the points covered by the
brush to be displayed in a different color. When this operation is
active, the color will be available in the Color Requester.
Mask -- This operation causes all points covered by the brush
to be hidden from the display. This is useful when the display is
cluttered and it is necessary to remove some data to view other interesting
data.
Values -- This operation causes the numeric value of all points
covered by the brush to be displayed in a separate popup window.
This window can be opened from the main interface by selecting Data Values.
Average -- This operation causes the average value of all points
currently selected by the brush to be displayed in each of the display
views. In addition, if the Values operation is also selected, the
numeric average is added to the end of the values in the data values window.
Brush Expression
The brush expression is a logical expression that combines brushes.
A point is covered by the operation if the expression evaluates to TRUE for that point.

Glyph Brush Popup
This tool represents an enlarged glyph. The currently active brush
coverage is displayed as a shaded region on the glyph. The brush
may be resized by dragging with the left mouse button and translated by
dragging with the middle mouse button.

Dimension Stacking Brush Popup
This tool is composed of a series of sliders, each slider representing
one dimension. The horizontal sliders on the top of the display represent
the horizontal dimensions and the vertical sliders represent the vertical
dimensions in the dimensional stacking display. The dimensions with
the highest level in the hierarchy are located at the top and left of the
tool. Likewise, dimensions with the lowest level in the hierarchy are at
the bottom and right of the tool. The "thumb" portion of the slider
represents the brush coverage of the currently active brush in that dimension.
The brush may be resized by dragging with the left mouse button and may
be recentered by dragging with the middle mouse button.

Parallel Coordinates Brush Popup
Auxiliary Window
Data Values Popup
Cluster Hierarchy Toolbox
Display Techniques
Parallel Coordinates
In Parallel Coordinates, each dimension corresponds to an axis, and the
N axes are organized as uniformly spaced vertical lines. A data element
in N-dimensional space manifests itself as a connected set of points, one
on each axis. Points lying on a common line or plane create readily
perceived structures in the image. In generating the display of parallel
coordinates in XmdvTool, the view area is divided into N vertical slices
of equal width. At the center of each slice an axis is drawn, along
with a label at the top end. Data points are generated as polylines across
the N axes.
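The layout described above can be sketched in a few lines of Python. This is only an illustration, not XmdvTool's actual code: the function names are hypothetical, and it assumes each dimension's minimum is drawn at the bottom of the view and that screen y grows downward.

```python
def axis_positions(n_dims, view_width):
    """Return the x coordinate of each axis: the center of N equal-width slices."""
    slice_width = view_width / n_dims
    return [(i + 0.5) * slice_width for i in range(n_dims)]


def polyline(point, mins, maxs, view_width, view_height):
    """Map one N-D data point to the vertices of its polyline.

    Assumption: the dimension minimum sits at the bottom of the view and
    screen y grows downward, so larger values get smaller y coordinates.
    """
    xs = axis_positions(len(point), view_width)
    vertices = []
    for x, value, lo, hi in zip(xs, point, mins, maxs):
        # Normalize the value to [0, 1] along its axis.
        t = (value - lo) / (hi - lo) if hi > lo else 0.5
        vertices.append((x, view_height * (1.0 - t)))
    return vertices
```

For example, with four dimensions and a 400-pixel-wide view, the axes fall at x = 50, 150, 250, and 350.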
Scatterplots
Scatterplots are one of the oldest and most commonly used methods to project
high-dimensional data to two dimensions. In this method, an N x N grid
of parallel projections of the data is generated. Each of these is a simple
plot for the two dimensions it represents. The horizontal dimension
of a scatterplot is controlled by the column it resides in, and likewise
the vertical dimension is controlled by the row. At the top of each
column and at the left of each row is a label that shows which dimension
that row or column represents.

Glyphs
The definition of a glyph covers a large number of techniques which map
data values to various geometric and color attributes of graphical primitives
or symbols. Histograms are perhaps the most widely recognized form
of glyph, where data values control the height of rectangular bars.
One might even consider a scatterplot a specialized form of glyph representation,
where data values control the positional attributes of a symbol.
In XmdvTool, we use the star glyph pattern, which creates for each data
point N rays emanating at equal angles from a point on the screen.
Currently glyphs are spaced uniformly across the screen, though we might
want to use 2 of the dimensions to control position. The length of
each ray is determined by the data value for that dimension. A polyline
is then generated to encompass the rays, forming a blob shape. The
glyph brush tool may be used as a key to the dimensions.
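As a rough illustration of the star glyph construction, the following Python sketch computes the outline of one glyph. The function name, the normalization of each value by its dimension's range, and the choice of starting angle are assumptions made for this example.

```python
import math


def star_glyph(point, mins, maxs, center, max_radius):
    """Return the closed polyline outlining a star glyph for one data point.

    One ray per dimension leaves the center at equal angular spacing; each
    ray's length is the normalized data value times max_radius (assumed
    scaling). The ray tips are joined into a closed outline.
    """
    n = len(point)
    cx, cy = center
    tips = []
    for i, (value, lo, hi) in enumerate(zip(point, mins, maxs)):
        t = (value - lo) / (hi - lo) if hi > lo else 0.0
        angle = 2.0 * math.pi * i / n  # equal angles between rays
        radius = t * max_radius
        tips.append((cx + radius * math.cos(angle), cy + radius * math.sin(angle)))
    # Repeat the first tip so the polyline closes on itself.
    return tips + tips[:1]
```

A point whose values all sit at the dimension maxima produces a regular N-sided polygon of radius max_radius.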

Dimension Stacking
Several recent techniques for multivariate display have emerged which involve
projecting high-dimensional data by embedding dimensions within other dimensions.
One starts by discretizing the range of each dimension (assigning what
we term a cardinality, or number of buckets, to the dimension). Each dimension
is then assigned an orientation (in our case, horizontal
or vertical) and an ordering (dimensions are said to have unique "speeds").
The dimensions with the 2 slowest speeds are used to divide a virtual screen
into sections, with the cardinalities used to determine how many sections
horizontally and vertically will be generated. Each section is then
used to define the virtual screen for the next 2 dimensions (slowest of
the remaining dimensions), again using the cardinality to determine how
to break up the virtual screen. This is repeated until all dimensions
have been embedded and the data point can be mapped to its screen location.
In a way, the process is similar to the manner in which the digits of an
odometer move at different speeds.
XmdvTool requires three types
of information to project data using dimensional stacking. The first
is the cardinality (number of buckets) for each dimension. The range
of values for each dimension is then decomposed into that many equal sized
subranges. The second type of information needed is the ordering
for the dimensions, from outer-most (slowest) to inner-most (fastest).
Dimensions are assumed to alternate in orientation, and the order of the
dimensions in the input file is assumed to be the order for the mapping
process. The last piece of information used is the minimum size for
the plotted data item (the system will increase this value if the entire
image can fit within the view area). Each data point then maps into
a unique bucket, which in turn maps to a unique location in the resulting
image. A key is provided in a separate window to help users understand
the order of embedding, and grid lines of varying intensity provide assistance
in interpreting transitions between buckets at different levels in the
hierarchy.
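The odometer analogy can be made concrete with a short Python sketch that maps a data point to its cell in the stacked grid. The names are hypothetical, and the sketch assumes that the first (slowest) dimension is horizontal and that orientations alternate from there; the text above does not say which orientation comes first.

```python
def bin_index(value, lo, hi, cardinality):
    """Return the bucket for value when [lo, hi] is split into equal subranges."""
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * cardinality)
    # Clamp so that value == hi falls in the last bucket.
    return min(max(idx, 0), cardinality - 1)


def stack_position(point, mins, maxs, cards):
    """Map an N-D point to (column, row) in the dimension-stacked grid.

    Dimensions are given from outermost (slowest) to innermost (fastest).
    Assumption: even-indexed dimensions are horizontal, odd-indexed vertical.
    Each orientation accumulates a mixed-radix number, like odometer digits.
    """
    col = row = 0
    for i, (value, lo, hi, card) in enumerate(zip(point, mins, maxs, cards)):
        b = bin_index(value, lo, hi, card)
        if i % 2 == 0:
            col = col * card + b
        else:
            row = row * card + b
    return col, row
```

Using the ranges and cardinalities from the iris sample in the Input File Format section, the data point 5.1 3.5 1.4 0.2 falls in buckets 1, 3, 0, 0 and therefore lands in column 5, row 15 of a 25 x 25 grid.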

Hierarchical Parallel Coordinates
The main difficulty of directly applying parallel coordinates to large
data sets is that the level of clutter present in the visualization reduces
the amount of useful information one can perceive. We introduce a variant
of parallel coordinates called hierarchical parallel coordinates to cope
with the problem of display clutter when dealing with large data sets.
We develop a multiresolutional view of the data via hierarchical clustering.
At each node of the resultant cluster tree, we maintain summary information
for all points and sub-clusters rooted at it. This information is represented
using variable-width opacity bands: a graduated band, fading from
a dense middle to transparent edges, visually encodes the information
of a cluster. The mean stretches across the middle of the band and is encoded
with the deepest opacity. The top and bottom edges of the band have full
transparency. The opacity across the rest of the band is linearly interpolated.
The thickness of the band across each axis section represents the extents
of the cluster in that dimension.
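The opacity profile across a band on one axis can be sketched as a piecewise-linear function. This is an illustrative reading of the description above, with hypothetical names; the max_opacity parameter is an assumption.

```python
def band_opacity(y, lo, mean, hi, max_opacity=1.0):
    """Opacity of a cluster band at position y along one axis.

    The band spans the cluster extents [lo, hi] in this dimension. Opacity
    is max_opacity at the mean, zero at both edges, and linearly
    interpolated in between.
    """
    if y <= lo or y >= hi:
        return 0.0
    if y <= mean:
        return max_opacity * (y - lo) / (mean - lo)
    return max_opacity * (hi - y) / (hi - mean)
```

Because the mean need not lie at the center of the extents, the two halves of the band can fade at different rates.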

Brushing Overview
Data Brushing
A big problem with projecting data of a given number of dimensions to a
smaller number of dimensions is that you invariably lose information regarding
the spatial relationships of the data. N-D brushing is a way of recovering
some of this lost information by highlighting data points which fall into
a user-defined subspace. Thus, one could say "show me the points
within a (Euclidean) distance of N from a certain location." A brush
is completely defined by its shape, size, and location. We assume
a hyper-parallelepiped (a fancy name for an N-D box), and the user specifies
the size of the box in each dimension. Then a location in N-D space
is specified as a center point for the brush. Any data point falling
within the N-D brush can have an operator applied to it, such as highlighting
or masking.
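A minimal sketch of the N-D box test, assuming the brush is stored as a center point plus a full width per dimension (the names and this representation are illustrative, not taken from XmdvTool's source):

```python
def in_brush(point, center, sizes):
    """True if point lies inside the N-D box centered at center.

    sizes holds the full extent of the box in each dimension (assumed
    convention), so the test compares against half of each size.
    """
    return all(abs(v - c) <= s / 2.0 for v, c, s in zip(point, center, sizes))


def mask(points, center, sizes):
    """Apply the mask operation: drop every point covered by the brush."""
    return [p for p in points if not in_brush(p, center, sizes)]
```

A highlight operation would use the same test and change the drawing color of covered points instead of removing them.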
Each of the display methods allows the user to directly manipulate the
brush. A summary of manipulation techniques for each display method
follows:
Scatterplots -- The brush coverage on the scatterplot display
is represented by a series of rectangular boxes, one for each scatterplot.
The boxes represent the range of points that the brush covers for the two
dimensions of a particular scatterplot. Each of these boxes may be
manipulated using the mouse. The left mouse button can be used to
resize the brush -- dragging either a corner or edge of one of the rectangular
boxes will alter the brush shape. The middle mouse button may be
used to recenter the brush -- dragging the box with the middle mouse button
pressed will move the brush to the cursor position.
Glyphs -- The brush coverage is not displayed in the glyph view.
The brush may be recentered by clicking on a glyph with the left mouse
button. The brush will change location to be centered around the
selected point. This is useful for finding glyphs that are similar
in shape to an existing glyph.
Parallel Coordinates -- The brush coverage on the parallel coordinate
display is represented by a shaded region that spans all the dimensions.
The shaded region along an axis represents the range of points that the
brush covers along that axis. Brush manipulation in this display
works similarly to the scatterplot display. Dragging the brush with
the left mouse button will resize the brush while dragging with the middle
mouse button will recenter the brush.
Dimensional Stacking -- Brush coverage in this display is shown
by shading all the bins that are contained by the brush. Brush manipulation
in this display works similarly to the glyph display. Clicking on a point
will recenter the brush to that point.
Types of Data Brush
In XmdvTool, brushes can have either a step edge or a ramp
edge. With a step edge brush, each point is either
completely inside or completely outside the brush. When a highlight
operation is performed on step edge brushes, points that are
contained by the brush are highlighted, and points not contained are painted
in the normal data color.
Ramped brushes do not have a discrete boundary.
Instead, along each dimension there is an inner and outer brush boundary.
The amount of coverage along one dimension is 1.0 inside the inner boundary
and falls linearly to 0.0 at the outer boundary. When a highlight
operation is performed on ramped edge brushes, points that are completely
contained by the brush are drawn in the normal highlight color for that
brush. Points that are partially contained by the brush are drawn
in a color that is lighter as the coverage decreases. The amount of ramp
of the brush is drawn as a thin line in the same color as the brush coverage
on the parallel coordinate and scatterplot displays. On each of these
displays, the ramp boundary may be manipulated by holding down the Control
key on the keyboard and dragging with the left mouse button.
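The per-dimension ramp can be written as a small function. The sketch below uses hypothetical names and measures both boundaries as half-widths from the brush center. Combining the per-dimension coverages with a minimum is an assumption for illustration; the text above only defines coverage along one dimension.

```python
def ramp_coverage_1d(value, center, inner_half, outer_half):
    """Coverage along one dimension: 1.0 inside the inner boundary,
    falling linearly to 0.0 at the outer boundary."""
    d = abs(value - center)
    if d <= inner_half:
        return 1.0
    if d >= outer_half:
        return 0.0
    return (outer_half - d) / (outer_half - inner_half)


def ramp_coverage(point, center, inner, outer):
    """Overall coverage of an N-D point.

    Assumption: the weakest dimension decides (minimum over dimensions).
    """
    return min(
        ramp_coverage_1d(v, c, i, o)
        for v, c, i, o in zip(point, center, inner, outer)
    )
```

A step edge brush is the special case where the inner and outer boundaries coincide, so coverage is always either 1.0 or 0.0.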
Painting
Painting is a method of creating a brush from the presented data.
In order to use Painting the current brush must be both enabled
and have its display attribute turned on (see Brush Toolbox). In the parallel
coordinate and scatterplot displays painting is accomplished by holding
down the Shift key on the keyboard and dragging the mouse over points
of interest. As long as the Shift key is held down the mouse cursor
will continue to behave as a virtual paintbrush. When the Shift
key is released, a brush will be generated that contains all the points
that were painted.
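One simple way to build a brush that contains all painted points is to take their bounding box. The sketch below assumes that this is how the brush is derived and that a brush is stored as a center plus a full width per dimension; neither detail is confirmed by the text above.

```python
def brush_from_painted(points):
    """Return (center, sizes) of the smallest N-D box containing all points.

    Assumption: the generated brush is the axis-aligned bounding box of
    the painted points.
    """
    if not points:
        raise ValueError("no points were painted")
    lows = [min(column) for column in zip(*points)]
    highs = [max(column) for column in zip(*points)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(lows, highs)]
    sizes = [hi - lo for lo, hi in zip(lows, highs)]
    return center, sizes
```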
Structure-Based Brushing
This is a new variant of brushing that we have developed as a general mechanism
for navigating in hierarchical space. We augment each node in the hierarchy,
that is, each cluster, with a monotonic value relative to its parent. This
value can be, for example, the level number, the cluster size/population,
or the volume of the cluster's extents. This assigned value determines
the control for the level-of-detail. By choosing a continuous control variable
such as the cluster size, the traversal of the tree through different levels
of detail can proceed by smooth transitions instead of abrupt screen changes.

Figure legend: (a) hierarchical tree frame; (b) contour corresponding to the current level-of-detail; (c) leaf contour approximating the shape of the hierarchical tree; (d) structure-based brush; (e) interactive brush handles; (f) colormap legend for the level-of-detail contour.
The triangular frame depicts the hierarchical tree. The leaf contour
depicts the silhouette of the hierarchical tree. It delineates the approximate
shape formed by chaining the leaf nodes. The colored bold contour across
the middle of the tree delineates the tree cut S(w) that represents the
cluster partition corresponding to a level-of-detail, for instance the
cluster radius. The colors on the contour correspond to the colors
used for drawing the nodes on the main parallel coordinates display. The
two movable handles on the base of the triangle, together with the apex
of the triangle, form a wedge in the hierarchical space.
The brushing interaction for the user consists of localizing a subspace
within the hierarchical space by positioning the two handles at the
base of the triangle. The embedded wedge forms a brushed subspace within
the hierarchical space. Elements within the brushed subspace may
be examined at different levels of detail.
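To make the idea of a tree cut at a given level-of-detail concrete, here is a hedged Python sketch. It assumes the hierarchy is stored as a list of parent indices (-1 for the root) and that the control value, for instance the cluster radius, never increases from a parent to its child. The function name and the selection rule are illustrative, not XmdvTool's actual algorithm.

```python
def tree_cut(parents, values, threshold):
    """Select the clusters that form the cut for one level-of-detail.

    A node is on the cut when its value is at or below the threshold while
    its parent's value is still above it (the root has no parent).
    Assumption: values do not increase from parent to child, so the
    selected nodes partition the data.
    """
    cut = []
    for node, (parent, value) in enumerate(zip(parents, values)):
        if value > threshold:
            continue  # still too coarse, descend further
        if parent == -1 or values[parent] > threshold:
            cut.append(node)
    return cut
```

Lowering the threshold gradually moves the cut toward the leaves, which is what lets a continuous control variable produce gradual changes on screen.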
Input File Format
Data File
The data file format is a simple ASCII format and consists of the following
four sections:
Section 1 -- The first line of the data file specifies the number of
dimensions in the data set and the number of data elements. These values
are stored as integers.
Section 2 -- This section contains the labels for each dimension. These
are stored as ASCII strings and there must be one name per line. The number
of lines in this section must match the number of dimensions.
Section 3 -- This section contains the minimum and maximum values for
each dimension followed by the number of bins to use in dimensional stacking.
These are written as two floating point numbers and an integer in ASCII
format. The values for each dimension are stored as triplets, as follows:
min1 max1 bin1 min2 max2 bin2 ...
Section 4 -- The final section contains the values of the raw data points.
The number of floating point values on each line must match the number
of dimensions specified.
The following is a sample data file taken from "iris.okc":
4 150
sepal_length
sepal_width
petal_length
petal_width
4.3 7.9 5
2.0 4.4 5
1.0 6.9 5
0.1 2.5 5
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
.
.
.
6.2 3.4 5.4 2.3
5.9 3 5.1 1.8
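A reader for this format might look like the following Python sketch. The function name parse_okc is hypothetical, and the sketch assumes blank lines can be ignored and that the min/max/bin triplets may appear either one per line (as in the sample) or several per line.

```python
def parse_okc(text):
    """Parse the four-section data file format into labels, ranges, and rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Section 1: number of dimensions and number of data elements.
    n_dims, n_points = (int(tok) for tok in lines[0].split()[:2])

    # Section 2: one dimension label per line.
    labels = lines[1:1 + n_dims]

    # Section 3: (min, max, bins) triplets; gather tokens until all are read.
    pos = 1 + n_dims
    tokens = []
    while len(tokens) < 3 * n_dims:
        tokens.extend(lines[pos].split())
        pos += 1
    ranges = [
        (float(tokens[3 * d]), float(tokens[3 * d + 1]), int(tokens[3 * d + 2]))
        for d in range(n_dims)
    ]

    # Section 4: one data point per line, n_dims values each.
    rows = [[float(tok) for tok in ln.split()] for ln in lines[pos:]]
    if len(rows) != n_points or any(len(r) != n_dims for r in rows):
        raise ValueError("data section does not match the declared sizes")
    return labels, ranges, rows
```

Called on the full contents of a file such as "iris.okc", it would return the four dimension labels, four (min, max, bins) triplets, and 150 rows of four values.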
Cluster File
Each data file has a corresponding cluster file as input to XmdvTool. It
is important to note that our tool assumes that the data file is normalized
before clustering. The cluster file may be generated using any clustering
algorithm so long as it produces a cluster tree. The file is composed of the following
two sections:
Section 1 -- The first line in the cluster file specifies the number
of clusters formed and the number of dimensions of the data. These values
are stored as integers.
Section 2 -- This section contains a running index and information of
each cluster. Each line contains a running index starting from 0 for the
root node, the index of its parent (-1 for the root node), the number of
elements in the cluster, followed by the sum of all elements in each dimension
and the radius of the cluster (or any form of cluster measure). This information
is stored on a single line in ASCII format: three integers followed by
floating point numbers.
The following is a sample cluster file taken from "iris.cf":
293 4
0 -1 150 876.5 458.1 563.8 179.8 2.13045
1 0 100 626.2 287.2 490.6 167.6 1.18235
2 1 12 89.7 37.5 75.6 24.6 0.62283
.
.
.
290 289 1 5.4 3.4 1.7 0.2 0
291 289 1 5.4 3.4 1.5 0.4 0
292 288 1 5.5 3.5 1.3 0.2 0
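The cluster file can be read with a similar sketch. The function name and the dictionary keys are hypothetical. Because each line stores per-dimension sums together with the element count, the cluster mean can be recovered as sum divided by count.

```python
def parse_cluster_file(text):
    """Parse the two-section cluster file into a list of cluster records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

    # Section 1: number of clusters and number of dimensions.
    n_clusters, n_dims = (int(tok) for tok in lines[0].split()[:2])

    # Section 2: index, parent, count, one sum per dimension, then radius.
    clusters = []
    for ln in lines[1:]:
        tok = ln.split()
        if len(tok) != 3 + n_dims + 1:
            raise ValueError("unexpected number of fields: " + ln)
        count = int(tok[2])
        sums = [float(t) for t in tok[3:3 + n_dims]]
        clusters.append({
            "index": int(tok[0]),
            "parent": int(tok[1]),  # -1 marks the root node
            "count": count,
            "sums": sums,
            "mean": [s / count for s in sums],
            "radius": float(tok[-1]),
        })
    if len(clusters) != n_clusters:
        raise ValueError("cluster count does not match the header")
    return n_dims, clusters
```

For the root line of the sample above (150 elements, first sum 876.5), the first component of the mean works out to about 5.843.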
XmdvTool Development Team:
- Matthew O. Ward (1994-1999) (matt@cs.wpi.edu)
- Allen R. Martin (1995)
- Ying-Huey Fua (1998-1999) (yingfua@cs.wpi.edu)
Ending Credits
Last updated on 9/29/99
yingfua@cs.wpi.edu