XmdvTool Release 4.0 User Guide

Table of Contents

  1. Introduction
  2. Interface Components
  3. Display Techniques Overview
  4. Brushing Overview
  5. Input File Format

Introduction

Multivariate Visualization with XmdvTool

XmdvTool allows users to examine data sets with a large number of dimensions or parameters.  We refer to this as multivariate data.  Four distinct methods of projecting this N-dimensional (N-D) data onto a 2-D screen are provided: scatterplots, glyphs, parallel coordinates, and dimensional stacking.  Users can interactively switch between display techniques. A major tool in XmdvTool for providing insights into N-D spatial relations is the N-D brush, which allows users to perform operations on the data points which fall within a user-specified N-D subspace of the total space defined by the data.

Software Development Group and Contact Information

Software Distribution and Copyrights

Copyright 1994-1999, All Rights Reserved.

Permission to use, copy, modify and distribute this software and its documentation for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that this copyright notice appear in all copies.

Permission to incorporate this software into commercial products may be obtained directly from the authors. This software is provided "as is" without express or implied warranty.  In no event will the authors be held liable for any damages arising from the use of this software.


Interface Components

Main Window


Data Summary and Statistics


Glyph Key


Dimension Stacking Key


Brush Toolbox

Brush Operation Toolbox

Brush operations are the actions performed on the data selected by the brush.  XmdvTool supports four types of brush operations: highlight, mask, values, and average.  Each of these is explained in detail below.  Brush operations are specified using the Operation Toolbox.

Highlight  -- This operation causes all the points covered by the brush to be displayed in a different color.  When this operation is active, the color will be available in the Color Requester.

Mask  -- This operation causes all points covered by the brush to be hidden from the display.  This is useful when the display is cluttered and it is necessary to remove some data to view other interesting data.

Values  -- This operation causes the numeric value of all points covered by the brush to be displayed in a separate popup window.  This window can be opened from the main interface by selecting Data Values.

Average  -- This operation causes the average value of all points currently selected by the brush to be displayed in each of the display views. In addition, if the Values operation is also selected, the numeric average is added to the end of the values in the data values window.

Brush Expression

The brush expression is a logical expression between brushes.  Points are covered by the operation if they evaluate to TRUE for this expression.
 
 



Glyph Brush Popup

This tool represents an enlarged glyph.  The currently active brush coverage is displayed as a shaded region on the glyph.  The brush may be resized by dragging with the left mouse button and translated by dragging with the middle mouse button.



Dimension Stacking Brush Popup

This tool is composed of a series of sliders, each slider representing one dimension.  The horizontal sliders on the top of the display represent the horizontal dimensions and the vertical sliders represent the vertical dimensions in the dimensional stacking display.  The dimensions with the highest level in the hierarchy are located at the top and left of the tool. Likewise, dimensions with the lowest level in the hierarchy are at the bottom and right of the tool.  The "thumb" portion of the slider represents the brush coverage of the currently active brush in that dimension.  The brush may be resized by dragging with the left mouse button and may be recentered by dragging with the middle mouse button.
 
 















Parallel Coordinates Brush Popup


Auxiliary Window


Data Values Popup

Cluster Hierarchy Toolbox



Display Techniques Overview

Parallel Coordinates

In Parallel Coordinates, each dimension corresponds to an axis, and the N axes are organized as uniformly spaced vertical lines.  A data element in N-dimensional space manifests itself as a connected set of points, one on each axis.  Points lying on a common line or plane create readily perceived structures in the image.  In generating the display of parallel coordinates in XmdvTool, the view area is divided into N vertical slices of equal width.  At the center of each slice an axis is drawn, along with a label at the top end. Data points are generated as polylines across the N axes.
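
The layout described above can be sketched in a few lines of Python. This is only an illustration, not XmdvTool's own code: the function name polyline_vertices, the width and height parameters, and the choice to place each dimension's minimum at the bottom of its axis are all assumptions.

```python
def polyline_vertices(point, mins, maxs, width, height):
    """Return the (x, y) screen vertices of one data point's polyline."""
    n = len(point)
    slice_width = width / n              # view area split into N equal vertical slices
    vertices = []
    for i, value in enumerate(point):
        x = (i + 0.5) * slice_width      # the axis sits at the center of its slice
        span = maxs[i] - mins[i]
        t = (value - mins[i]) / span if span else 0.5
        # Assumption: the minimum maps to the bottom of the axis and
        # screen y grows downward.
        y = height * (1.0 - t)
        vertices.append((x, y))
    return vertices


# First iris data element, with the per-dimension ranges from iris.okc.
verts = polyline_vertices([5.1, 3.5, 1.4, 0.2],
                          mins=[4.3, 2.0, 1.0, 0.1],
                          maxs=[7.9, 4.4, 6.9, 2.5],
                          width=400, height=100)
```

Connecting the returned vertices in order gives the polyline for that data element.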

Scatterplots

Scatterplots are one of the oldest and most commonly used methods of projecting high-dimensional data to two dimensions.  In this method, an N x N grid of parallel projections of the data is generated.  Each of these is a simple x-y plot for the two dimensions it represents.  The horizontal dimension of a scatterplot is controlled by the column it resides in, and likewise the vertical dimension is controlled by the row.  At the top of each column and at the left of each row is a label that shows which dimension that row or column represents.


Glyphs

The definition of a glyph covers a large number of techniques which map data values to various geometric and color attributes of graphical primitives or symbols.  Histograms are perhaps the most widely recognized form of glyph, where data values control the height of rectangular bars.  One might even consider a scatterplot a specialized form of glyph representation, where data values control the positional attributes of a symbol.  In XmdvTool, we use the star glyph pattern, which creates for each data point N rays emanating at equal angles from a point on the screen.  Currently glyphs are spaced uniformly across the screen, though we might want to use 2 of the dimensions to control position.  The length of each ray is determined by the data value for that dimension.  A polyline is then generated to encompass the rays, forming a blob shape.  The glyph brush tool may be used as a key to the dimensions.
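
As a rough sketch of the star glyph construction, the following Python computes the ray endpoints for one data point. It is not taken from XmdvTool: the name star_glyph, the max_radius parameter, and the normalization of each value to its dimension's range are assumptions made for the example.

```python
import math


def star_glyph(values, mins, maxs, cx, cy, max_radius):
    """Return the N ray endpoints of a star glyph centered at (cx, cy)."""
    n = len(values)
    vertices = []
    for i, v in enumerate(values):
        span = maxs[i] - mins[i]
        # Assumption: ray length is the value normalized to [0, 1]
        # over the dimension's range, scaled by max_radius.
        t = (v - mins[i]) / span if span else 0.0
        angle = 2.0 * math.pi * i / n    # N rays at equal angles
        r = t * max_radius
        # Assumption: screen y grows downward, so the sine term is subtracted.
        vertices.append((cx + r * math.cos(angle), cy - r * math.sin(angle)))
    return vertices


outline = star_glyph([1.0, 0.0, 1.0, 0.0], [0.0] * 4, [1.0] * 4,
                     cx=0.0, cy=0.0, max_radius=10.0)
```

Joining the endpoints in order, and returning to the first one, produces the enclosing polyline described above.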

Dimension Stacking

Several recent techniques for multivariate display have emerged which involve projecting high-dimensional data by embedding dimensions within other dimensions.  One starts by discretizing the ranges of each dimension (assigning what we term a cardinality, or number of buckets, for a dimension).  Each dimension is then assigned an orientation (in our case, horizontal or vertical) and an ordering (dimensions are said to have unique "speeds").  The dimensions with the 2 slowest speeds are used to divide a virtual screen into sections, with the cardinalities used to determine how many sections will be generated horizontally and vertically.  Each section is then used to define the virtual screen for the next 2 dimensions (the slowest of the remaining dimensions), again using the cardinality to determine how to break up the virtual screen.  This is repeated until all dimensions have been embedded and the data point can be mapped to its screen location.  In a way, the process is similar to the manner in which the digits of an odometer move at different speeds.

XmdvTool requires three types of information to project data using dimensional stacking.  The first is the cardinality (number of buckets) for each dimension.  The range of values for each dimension is then decomposed into that many equal-sized subranges.  The second type of information needed is the ordering for the dimensions, from outer-most (slowest) to inner-most (fastest).  Dimensions are assumed to alternate in orientation, and the order of the dimensions in the input file is assumed to be the order for the mapping process.  The last piece of information used is the minimum size for the plotted data item (the system will increase this value if the entire image can fit within the view area).  Each data point then maps into a unique bucket, which in turn maps to a unique location in the resulting image.

A key is provided in a separate window to help users understand the order of embedding, and grid lines of varying intensity provide assistance in interpreting transitions between buckets at different levels in the hierarchy.
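
The odometer analogy can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than XmdvTool's implementation: the function names are hypothetical, the first dimension is assumed to be horizontal, and the result is a grid cell index, not a pixel position.

```python
def bucket(value, lo, hi, cardinality):
    """Index of the equal-sized subrange of [lo, hi] that contains value."""
    if hi == lo:
        return 0
    b = int((value - lo) / (hi - lo) * cardinality)
    return min(max(b, 0), cardinality - 1)   # the maximum falls in the last bucket


def stack_position(point, mins, maxs, cards):
    """Map an N-D point to (column, row) in the stacked grid.

    Dimensions are taken in file order, slowest first, alternating
    orientation. Assumption: even-numbered dimensions (0, 2, ...) are
    horizontal and odd-numbered ones are vertical.
    """
    x = y = 0
    for i, value in enumerate(point):
        b = bucket(value, mins[i], maxs[i], cards[i])
        if i % 2 == 0:
            x = x * cards[i] + b   # each faster dimension subdivides the slower one
        else:
            y = y * cards[i] + b
    return x, y


# First iris data element with the ranges and bin counts from iris.okc.
cell = stack_position([5.1, 3.5, 1.4, 0.2],
                      mins=[4.3, 2.0, 1.0, 0.1],
                      maxs=[7.9, 4.4, 6.9, 2.5],
                      cards=[5, 5, 5, 5])
```

With four dimensions of cardinality 5, the grid has 25 columns and 25 rows, and each accumulation step works like a digit in a mixed-radix number.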


Hierarchical Parallel Coordinates

The main difficulty of directly applying parallel coordinates to large data sets is that the level of clutter present in the visualization reduces the amount of useful information one can perceive. We introduce a variant of parallel coordinates called hierarchical parallel coordinates to cope with the problem of display clutter when dealing with large data sets. We develop a multiresolutional view of the data via hierarchical clustering. At each node of the resultant cluster tree, we maintain summary information of all points and sub-clusters rooted at it. This information is represented using variable-width opacity bands. Each band is graduated, fading from a dense middle to transparent edges, and visually encodes the information of a cluster. The mean stretches across the middle of the band and is drawn with the highest opacity. The top and bottom edges of the band have full transparency. The opacity across the rest of the band is linearly interpolated. The thickness of the band across each axis section represents the extents of the cluster in that dimension.
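
The opacity profile of a band along one axis can be sketched as a piecewise-linear function. The Python below is a minimal illustration, assuming the band is described by the cluster's lower extent, mean, and upper extent on that axis; the function name band_opacity and the peak parameter are hypothetical.

```python
def band_opacity(y, lower, mean, upper, peak=1.0):
    """Opacity of a cluster band at position y along one axis.

    The mean has the highest opacity (peak), the edges at lower and
    upper are fully transparent, and the rest is linearly interpolated.
    """
    if y == mean:
        return peak                 # also covers a degenerate single-point band
    if y <= lower or y >= upper:
        return 0.0
    if y < mean:
        return peak * (y - lower) / (mean - lower)
    return peak * (upper - y) / (upper - mean)


profile = [band_opacity(y, lower=0.0, mean=4.0, upper=6.0) for y in (0, 2, 4, 5, 6)]
```

Because the mean need not lie midway between the extents, the two sides of the band can fade at different rates.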













Brushing Overview

Data Brushing

A major problem with projecting data of a given number of dimensions to a smaller number of dimensions is that you invariably lose information regarding the spatial relationships of the data.  N-D brushing is a way of recovering some of this lost information by highlighting data points which fall into a user-defined subspace.  Thus, one could say "show me the points within a (Euclidean) distance of N from a certain location."  A brush is completely defined by its shape, size, and location.  We assume a hyper-parallelepiped (a fancy name for an N-D box), and the user specifies the size of the box in each dimension.  Then a location in N-D space is specified as a center point for the brush.  Any data point falling within the N-D brush can have an operator applied to it, such as highlighting or masking.
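
A containment test for such a box-shaped brush might look like the following Python sketch. The names in_brush and brushed are hypothetical, and we assume here that the size in each dimension is the full extent of the box, so a point is covered when it lies within half the size of the center.

```python
def in_brush(point, center, size):
    """True if point lies inside the N-D box given by center and size."""
    # Assumption: size[d] is the full width of the box in dimension d.
    return all(abs(p - c) <= s / 2.0 for p, c, s in zip(point, center, size))


def brushed(points, center, size):
    """Return the points covered by the brush, in their original order."""
    return [p for p in points if in_brush(p, center, size)]


points = [(5.1, 3.5), (6.2, 3.4), (4.9, 3.0)]
selected = brushed(points, center=(5.0, 3.25), size=(0.5, 1.0))
```

An operation such as highlight or mask would then be applied to the selected points only.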

Each of the display methods allows the user to directly manipulate the brush.  A summary of manipulation techniques for each display method follows:

Scatterplots  -- The brush coverage on the scatterplot display is represented by a series of rectangular boxes, one for each scatterplot.  The boxes represent the range of points that the brush covers for the two dimensions of a particular scatterplot.  Each of these boxes may be manipulated using the mouse.  The left mouse button can be used to resize the brush -- dragging either a corner or edge of one of the rectangular boxes will alter the brush shape.  The middle mouse button may be used to recenter the brush -- dragging the box with the middle mouse button pressed will move the brush to the cursor position.

Glyphs  -- The brush coverage is not displayed in the glyph view. The brush may be recentered by clicking on a glyph with the left mouse button.  The brush will change location to be centered around the selected point.  This is useful for finding glyphs that are similar in shape to an existing glyph.

Parallel Coordinates  -- The brush coverage on the parallel coordinate display is represented by a shaded region that spans all the dimensions.  The shaded region along an axis represents the range of points that the brush covers along that axis.  Brush manipulation in this display works similarly to the scatterplot display.  Dragging the brush with the left mouse button will resize the brush while dragging with the middle mouse button will recenter the brush.

Dimensional Stacking  -- Brush coverage in this display is shown by shading all the bins that are contained by the brush.  Brush manipulation in this display works similarly to the glyph display. Clicking on a point will recenter the brush to that point.

Types of Data Brush

In XmdvTool, brushes can have either a step edge or a ramp edge.  Step edge brushes allow points to be either completely inside or completely outside them.  When a highlight operation is performed on step edge brushes, points that are completely contained by the brush are highlighted, and points not contained are painted in the normal data color.

Ramped brushes do not have a discrete boundary.  Instead, along each dimension there is an inner and outer brush boundary.  The amount of coverage along one dimension is 1.0 inside the inner boundary and falls linearly to 0.0 at the outer boundary.  When a highlight operation is performed on ramped edge brushes, points that are completely contained by the brush are drawn in the normal highlight color for that brush.  Points that are partially contained by the brush are drawn in a color that is lighter as the coverage decreases. The amount of ramp of the brush is drawn as a thin line in the same color as the brush coverage on the parallel coordinate and scatterplot displays.  On each of these displays, the ramp boundary may be manipulated by holding down the Control key on the keyboard and dragging with the left mouse button.
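
The per-dimension coverage of a ramped brush can be written as a small Python function. This is a sketch under assumptions: inner and outer are taken to be distances from the brush center to the inner and outer boundaries, and combining dimensions by taking the minimum is our choice for the example, since the text above only defines coverage along one dimension.

```python
def ramp_coverage(value, center, inner, outer):
    """Coverage along one dimension: 1.0 inside inner, 0.0 at or beyond outer."""
    d = abs(value - center)
    if d <= inner:
        return 1.0
    if d >= outer:
        return 0.0
    return (outer - d) / (outer - inner)     # linear fall-off across the ramp


def point_coverage(point, centers, inners, outers):
    """Coverage of an N-D point.

    Assumption: the overall coverage is the minimum over all dimensions.
    """
    return min(ramp_coverage(v, c, i, o)
               for v, c, i, o in zip(point, centers, inners, outers))


half = ramp_coverage(7.0, center=5.0, inner=1.0, outer=3.0)
```

Setting inner equal to outer reduces the function to a step edge brush, where coverage is either 1.0 or 0.0.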

Painting

Painting is a method of creating a brush from the presented data. To use painting, the current brush must be both enabled and have its display attribute turned on (see Brush Toolbox). In the parallel coordinate and scatterplot displays, painting is accomplished by holding down the Shift key on the keyboard and dragging the mouse over points of interest.  As long as the Shift key is held down, the mouse cursor will continue to behave as a virtual paintbrush.  When the Shift key is released, a brush will be generated that contains all the points that were painted.
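
One simple way to obtain a brush that contains all painted points is to take their bounding box. The Python sketch below does this; it is an assumption that the generated brush is the tightest such box, and the function name brush_from_painted is hypothetical.

```python
def brush_from_painted(points):
    """Return (center, size) of the smallest N-D box containing all points."""
    if not points:
        raise ValueError("no points were painted")
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    center = [(lo + hi) / 2.0 for lo, hi in zip(lows, highs)]
    size = [hi - lo for lo, hi in zip(lows, highs)]   # full extent per dimension
    return center, size


center, size = brush_from_painted([(1, 4), (3, 8), (2, 6)])
```

The result uses the same center-and-size description of a brush that is given under Data Brushing.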

Structure-Based Brushing

This is a new variant of brushing that we have developed as a general mechanism for navigating in hierarchical space. We augment each node in the hierarchy, that is, each cluster, with a monotonic value relative to its parent. This value can be, for example, the level number, the cluster size/population, or the volume of the cluster's extents. This assigned value serves as the control for the level-of-detail. By choosing a continuous control variable such as the cluster size, the traversal of the tree through different levels of detail can proceed by smooth transitions instead of abrupt screen changes.

(a) Hierarchical tree frame (b) Contour corresponding to current level-of-detail (c) Leaf contour approximates shape of hierarchical tree (d) Structure-based brush (e) Interactive brush handles (f) Colormap legend for level-of-detail contour

The triangular frame depicts the hierarchical tree. The leaf contour depicts the silhouette of the hierarchical tree. It delineates the approximate shape formed by chaining the leaf nodes. The colored bold contour across the middle of the tree delineates the tree cut S(w) that represents the cluster partition corresponding to a level-of-detail, for instance the cluster radius. The colors on the contour correspond to the colors used for drawing the nodes on the main parallel coordinates display. The two movable handles on the base of the triangle, together with the apex of the triangle, form a wedge in the hierarchical space.

The brushing interaction for the user consists of localizing a subspace within the hierarchical space by positioning the two handles at the base of the triangle. The embedded wedge forms a brushed subspace within the hierarchical space. Elements within the brushed subspace may be examined at different levels of detail.
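
To illustrate how a level-of-detail value w can select a tree cut S(w), the Python sketch below picks every node whose value is at most w while its parent's value is greater than w. This selection rule is our assumption for the example, not a statement of how XmdvTool computes the cut; it relies on the value decreasing monotonically from parent to child, as a cluster radius would. The tree and its values are made up.

```python
def tree_cut(nodes, w):
    """Return the sorted indices of the nodes on the cut for threshold w.

    nodes maps index -> (parent_index, value); the root has parent -1.
    Assumption: a node is on the cut when value <= w < parent's value.
    """
    cut = []
    for index, (parent, value) in nodes.items():
        parent_value = nodes[parent][1] if parent != -1 else float("inf")
        if value <= w < parent_value:
            cut.append(index)
    return sorted(cut)


# Hypothetical nine-node tree; leaves have value 0.0.
nodes = {
    0: (-1, 2.0),
    1: (0, 1.2), 2: (0, 0.9),
    3: (1, 0.6), 4: (1, 0.0),
    5: (2, 0.0), 6: (2, 0.0),
    7: (3, 0.0), 8: (3, 0.0),
}
coarse = tree_cut(nodes, 1.5)
finer = tree_cut(nodes, 1.0)
```

Lowering w moves the cut toward the leaves, so more and smaller clusters are shown.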
 

Input File Format

Data File

The data file format is a simple ASCII format and consists of the following four sections:

Section 1 -- The first line of the data file specifies the number of dimensions in the data set and the number of data elements. These values are stored as integers.

Section 2 -- This section contains the labels for each dimension. These are stored as ASCII strings and there must be one name per line. The number of lines in this section must match the number of dimensions.

Section 3 -- This section contains the minimum and maximum values for each dimension followed by the number of bins to use in dimensional stacking. These are written as two floating point numbers and an integer in ASCII format. The values for each dimension are stored as triplets as such:  min1 max1 bin1 min2 max2 bin2 ...

Section 4 -- The final section contains the values of the raw data points. The number of floating point values on each  line must match the number of dimensions specified.

The following is a sample data file taken from "iris.okc":

4 150
sepal_length
sepal_width
petal_length
petal_width
4.3 7.9 5
2.0 4.4 5
1.0 6.9 5
0.1 2.5 5
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
.
.
.
6.2 3.4 5.4 2.3
5.9 3   5.1 1.8
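
A reader for this format can be sketched in Python as follows. This is an illustrative parser written from the four-section description above, not code from XmdvTool; the function name parse_okc is hypothetical, and the example input is shortened to two data elements.

```python
def parse_okc(text):
    """Parse data file contents into (labels, ranges, points)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Section 1: number of dimensions and number of data elements.
    n_dims, n_points = (int(tok) for tok in lines[0].split())
    # Section 2: one label per line.
    labels = lines[1:1 + n_dims]
    # Sections 3 and 4 are read as a stream of whitespace-separated tokens.
    tokens = " ".join(lines[1 + n_dims:]).split()
    ranges = []
    for d in range(n_dims):
        lo, hi, bins = tokens[3 * d:3 * d + 3]
        ranges.append((float(lo), float(hi), int(bins)))
    values = [float(tok) for tok in tokens[3 * n_dims:]]
    if len(values) != n_dims * n_points:
        raise ValueError("data section does not match the declared sizes")
    points = [values[i:i + n_dims] for i in range(0, len(values), n_dims)]
    return labels, ranges, points


# Shortened example: the header declares 2 data elements instead of 150.
example = """4 2
sepal_length
sepal_width
petal_length
petal_width
4.3 7.9 5
2.0 4.4 5
1.0 6.9 5
0.1 2.5 5
5.1 3.5 1.4 0.2
4.9 3 1.4 0.2
"""
labels, ranges, points = parse_okc(example)
```

Reading sections 3 and 4 as a token stream accepts the triplets either one per line, as in the sample, or run together on fewer lines.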
 

Cluster File

Each data file has a corresponding cluster file as input to XmdvTool. It is important to note that our tool assumes that the data file is normalized before clustering. The cluster file may be generated using any clustering algorithm, so long as the algorithm produces a cluster tree. The file is composed of the following two sections:

Section 1 -- The first line in the cluster file specifies the number of clusters formed and the number of dimensions of the data. These values are stored as integers.

Section 2 -- This section contains a running index and information for each cluster. Each line contains a running index starting from 0 for the root node, the index of its parent (-1 for the root node), the number of elements in the cluster, followed by the sum of all elements in each dimension and the radius of the cluster (or any form of cluster measure). These values are stored in ASCII format on a single line: the first three as integers and the rest as floating point numbers.

The following is a sample cluster file taken from "iris.cf":

293 4
0 -1 150 876.5 458.1 563.8 179.8  2.13045
1 0 100 626.2 287.2 490.6 167.6  1.18235
2 1 12 89.7 37.5 75.6 24.6 0.62283
.
.
.
290 289 1 5.4 3.4 1.7 0.2  0
291 289 1 5.4 3.4 1.5 0.4  0
292 288 1 5.5 3.5 1.3 0.2  0
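
A single cluster line can be decoded as in the Python sketch below, written from the description of Section 2 above. The function names and the dictionary layout are hypothetical. Dividing each per-dimension sum by the element count gives the cluster mean.

```python
def parse_cluster_line(line, n_dims):
    """Decode one cluster line into a dictionary of its fields."""
    tok = line.split()
    if len(tok) != 3 + n_dims + 1:
        raise ValueError("unexpected number of fields in cluster line")
    return {
        "index": int(tok[0]),
        "parent": int(tok[1]),          # -1 for the root node
        "count": int(tok[2]),
        "sums": [float(t) for t in tok[3:3 + n_dims]],
        "radius": float(tok[3 + n_dims]),
    }


def cluster_mean(node):
    """Per-dimension mean, computed as sum divided by element count."""
    return [s / node["count"] for s in node["sums"]]


# Root node line from the iris.cf sample above.
root = parse_cluster_line("0 -1 150 876.5 458.1 563.8 179.8  2.13045", n_dims=4)
mean = cluster_mean(root)
```

For the root line of the sample, the first component of the mean is 876.5 / 150, or about 5.84.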


XmdvTool Development Team:

Ending Credits



Last updated on 9/29/99
yingfua@cs.wpi.edu