Interactions
N-Dimensional Brushing in Flat Approaches
Hierarchical Clustering
Navigation and Filtering Tools in Hierarchical Approaches
Navigation and Filtering Tools in Flat and Hierarchical Approaches
N-Dimensional Brushing in Flat Approaches
A useful capability of XmdvTool is N-dimensional brushing [martin:95]. Brushing is a process in which a user can highlight, select, or delete a subset of the elements being graphically displayed by pointing at them with a mouse or other suitable input device. When multiple views of the data are shown simultaneously (e.g., several scatterplots), brushing is often associated with a process known as linking, in which brushing elements in one view affects the same data in all other views. Brushing has been employed as a method for assisting data analysis for many years.
One of the first brushing techniques was applied to high-dimensional scatterplots [BEC:88]. In this system, the user specified a rectangular region in one of the 2-D scatterplot projections, and, depending on the mode of operation, points in other views corresponding to those falling within the brush were highlighted, deleted, or labeled. Brushing has also been used to help users select data points about which they want further information. Smith et al. [SMI:90] brushed images generated by stick-figure icons to obtain higher-dimensional information about the selected data points through sonification.
In XmdvTool, the notion of brushing has been extended to permit brushes with dimensionality greater than two. The goal is to help the user understand spatial relationships in N-space by highlighting all data points that fall within a user-defined, relocatable subspace. N-D brushes have the following characteristics:
Brush Shape: In XmdvTool, the brush is an N-D hyperbox. Other generic shapes, such as hyperellipses, will be added in the future, as well as customized shapes, which can consist of any connected, arbitrary N-D subspace.

Brush Size: For generic shapes, the user simply needs to specify N brush dimensions. XmdvTool does this with a simple, if primitive, mechanism: N slider bars.

Brush Boundary: In XmdvTool, the boundary of a brush is a step edge, so a point is either fully inside the brush or outside it. Another possibility would be a ramp, with many possible ramp shapes. A further enhancement could be achieved by coloring data points according to their degree of brush coverage (i.e., where they fall along the ramp).

Brush Positioning: Brushes have a position that the user must be able to control easily and intuitively. In the general case, the user needs to specify N values to uniquely position the brush. XmdvTool does this via the same sliders used for size specification.

Brush Motion: Although XmdvTool currently supports only manual brush motion, we hope to implement several forms of brush path specification in the future.

Brush Display: N-dimensional space is usually quite sparse, so it is sometimes useful to show the subspace covered by the brush on the data display. Its location can be indicated either by the brush's boundary or by a shaded region showing the area of coverage. In XmdvTool, brushes are displayed as shaded blue-grey regions, and data points that fall within the brush are highlighted in red.
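The brush characteristics above can be made concrete with a short Python sketch. It assumes that the N slider values give a brush center and a full width per dimension (the text does not say whether the sliders set a center or a corner), and the names `in_brush` and `ramp_coverage` are hypothetical. `in_brush` implements the step-edge boundary XmdvTool uses; `ramp_coverage` illustrates the ramp alternative mentioned under Brush Boundary, combining coverage across dimensions with a minimum, which is only one possible choice.

```python
from typing import List, Sequence, Tuple

def in_brush(point: Sequence[float], center: Sequence[float], size: Sequence[float]) -> bool:
    """Step-edge hyperbox brush: brushed only if inside the box in every dimension."""
    return all(abs(p - c) <= s / 2 for p, c, s in zip(point, center, size))

def ramp_coverage(point: Sequence[float], center: Sequence[float],
                  size: Sequence[float], ramp: Sequence[float]) -> float:
    """Hypothetical ramp boundary: 1.0 inside the box, falling linearly to 0.0
    over ramp[i] units beyond the box edge in dimension i.
    Per-dimension coverages are combined with min() (an assumption)."""
    coverage = 1.0
    for p, c, s, r in zip(point, center, size, ramp):
        outside = abs(p - c) - s / 2  # distance past the box edge; <= 0 means inside
        if outside <= 0:
            d = 1.0
        elif r <= 0 or outside >= r:
            d = 0.0
        else:
            d = 1.0 - outside / r
        coverage = min(coverage, d)
    return coverage

# Example: a 3-D brush centered at (0.3, 0.5, 0.5) that spans the whole third axis.
data: List[Tuple[float, float, float]] = [(0.2, 0.5, 0.9), (0.4, 0.6, 0.1), (0.8, 0.1, 0.5)]
center, size = (0.3, 0.5, 0.5), (0.4, 0.4, 1.0)
highlighted = [row for row in data if in_brush(row, center, size)]  # would be drawn in red
```

Here the first two records fall inside the brush; the third lies 0.5 away from the center on the first axis, beyond the half-width of 0.2.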
Brush size
and position are currently specified in a rather simplistic manner.
The user selects the dimension to be adjusted, and then changes
the brush size or position via a slider. There are many
opportunities for allowing the user to directly manipulate the
brush in the display area, although each procedure would need
to be customized based on the projection method in use.
For example, the user could move or resize one dimension of the
brush by dragging the edge or center of the brush along one of
the axes of the Parallel Coordinate display, or set the location
of the brush by selecting one of the glyphs. Direct manipulation
of the brush will be one of the features incorporated into future
releases of XmdvTool.
XmdvTool 4.1 provides a new function for saving brushed data, which writes the brushed subset out as a new .okc file.
Hierarchical Clustering
For interactive visualization of large multivariate data sets, we build a multiresolutional view of the data via hierarchical clustering and use hierarchical approaches to convey aggregate information about the resulting clusters. Users can then navigate the resulting structure with our suite of navigational and filtering tools until the desired focus region and level of detail are reached. Our goal is to support an active process of discovery as opposed to passive display. We believe that it is only through data exploration that meaningful ideas, relations, and subsequent inferences may be extracted from the data.
1. Hierarchical Clustering
Our primary
purpose for building a cluster hierarchy is to structure and present
data at different levels of abstraction. A clustering algorithm
groups objects or data items based on measures of proximity between
pairs of objects [jain:88]. In particular, a hierarchical
clustering algorithm constructs a tree of nested clusters based
on proximity information.
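As an illustration of building such a tree, the Python sketch below uses a naive agglomerative scheme: starting from singleton clusters, it repeatedly merges the two clusters whose union has the smallest diameter (a complete-linkage-style criterion). This is not necessarily the algorithm XmdvTool uses; the `Cluster` class, `build_hierarchy`, and the choice of diameter as the cluster size are assumptions. Diameter is convenient because it never decreases from child to parent.

```python
import math
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, ...]

@dataclass
class Cluster:
    members: List[Point]
    children: List["Cluster"] = field(default_factory=list)

    @property
    def size(self) -> float:
        # Assumed size measure: diameter, the largest pairwise distance (0.0 for one point).
        return max((math.dist(a, b) for a, b in combinations(self.members, 2)), default=0.0)

def build_hierarchy(points: List[Point]) -> Cluster:
    """Deliberately naive agglomerative clustering (recomputes diameters from scratch)."""
    if not points:
        raise ValueError("need at least one point")
    clusters = [Cluster([p]) for p in points]
    while len(clusters) > 1:
        # Pick the pair whose merged cluster would be the most compact.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: Cluster(clusters[ij[0]].members + clusters[ij[1]].members).size,
        )
        merged = Cluster(clusters[i].members + clusters[j].members, [clusters[i], clusters[j]])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]

root = build_hierarchy([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)])
```

For these four points, the two nearby pairs are merged first, so the root has two children of size 1.0 each.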
2. Multiresolutional Cluster Display
We define a horizontal cut S across a tree T as a boundary that divides T into an upper part and a lower part and satisfies the following criterion: for each path R from the root to a leaf, S intersects R at exactly one point.
Clearly, S defines a partition of the data set E. We may then vary the level of detail (LOD) in our data display by changing the parameters that control the location of S. We choose a cluster-size threshold w as the LOD control parameter and define S(w) as the collection of clusters whose size is less than or equal to w but whose parent's size is greater than w. Because w varies continuously, it provides smooth transitions in the data display.
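The definition of S(w) translates into a short traversal. In the Python sketch below (hypothetical `Node` and `tree_cut` names), cluster sizes are assumed to be monotone, so that a child is never larger than its parent; the cut is then found by descending from the root and stopping at the first node whose size is at most w. Leaves are always kept, since they cannot be split further.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    size: float  # assumed monotone: never larger than the parent's size
    children: List["Node"] = field(default_factory=list)

def tree_cut(root: Node, w: float) -> List[Node]:
    """Return S(w): nodes with size <= w whose parent has size > w, in left-to-right order.
    The root has no parent, so it alone forms the cut once w >= root.size."""
    cut: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.size <= w or not node.children:
            cut.append(node)  # on the cut (or an unsplittable leaf)
        else:
            stack.extend(reversed(node.children))  # so children are popped left to right

    return cut

def leaf(name: str) -> Node:
    return Node(name, 0.0)  # a single data item has size 0

root = Node("root", 10.0, [
    Node("A", 4.0, [leaf("a1"), leaf("a2")]),
    Node("B", 6.0, [Node("B1", 2.0, [leaf("b1"), leaf("b2")]), leaf("b3")]),
])
for w in (10.0, 5.0, 3.0, 1.0):
    print(w, [n.name for n in tree_cut(root, w)])
# 10.0 ['root']
# 5.0 ['A', 'B1', 'b3']
# 3.0 ['a1', 'a2', 'B1', 'b3']
# 1.0 ['a1', 'a2', 'b1', 'b2', 'b3']
```

Lowering w past a cluster's size replaces that cluster by its children, which is what makes the level-of-detail transitions gradual.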
3. Proximity-Based Coloring
In hierarchical
approaches, we assign a color to each cluster. Our method maps
colors by cluster proximity, hence the name proximity-based
coloring. This proximity is based on the structure of the
hierarchical tree; that is, sibling nodes are considered closer
than non-sibling nodes. We first impose a linear order on the
data clusters gathered for display at a given LOD value, w.
Then, we assign colors to each cluster by looking up a linear
colormap table. Our approach colors clusters based on the cluster
order derived during the tree traversal. The color ranges assigned
are nested just like clusters are nested, meaning larger clusters
are assigned a broader range of color values and smaller clusters
are assigned narrower ranges. Since small clusters imply that
elements are closer to each other, they are assigned closer color
values on the narrower color range. In addition, a "buffer"
is introduced between subtrees. The buffer acts as an unused color
interval between subtrees so that elements at the proximal ends
of subtrees are not assigned colors that are indistinguishable.
Clearly the buffer should be larger between large subtrees and
smaller otherwise.
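The nested color ranges and buffers can be sketched in Python as follows. The specifics are assumptions rather than XmdvTool's exact rule: each node receives a sub-interval of [0, 1]; a fixed fraction (`buffer_frac`, a hypothetical parameter) of every parent's interval is left unused as gaps between its children, so wider, larger subtrees get wider buffers; and the remaining space is split among the children in proportion to their leaf counts. A cluster's color is then the colormap entry at the midpoint of its interval.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def leaf_count(node: Node) -> int:
    return 1 if not node.children else sum(leaf_count(c) for c in node.children)

def assign_ranges(node: Node, lo: float = 0.0, hi: float = 1.0, buffer_frac: float = 0.1,
                  out: Optional[Dict[str, Tuple[float, float]]] = None
                  ) -> Dict[str, Tuple[float, float]]:
    """Assign each node a nested colormap interval [lo, hi], visiting children in tree order."""
    if out is None:
        out = {}
    out[node.name] = (lo, hi)
    k = len(node.children)
    if k == 0:
        return out
    span = hi - lo
    gap = span * buffer_frac / (k - 1) if k > 1 else 0.0  # unused "buffer" between siblings
    usable = span - gap * (k - 1)
    total = leaf_count(node)
    start = lo
    for child in node.children:
        width = usable * leaf_count(child) / total  # larger subtrees get broader ranges
        assign_ranges(child, start, start + width, buffer_frac, out)
        start += width + gap
    return out

def cluster_color(ranges: Dict[str, Tuple[float, float]], name: str) -> float:
    lo, hi = ranges[name]
    return (lo + hi) / 2  # position to look up in a linear colormap table

tree = Node("root", [Node("A", [Node("a1"), Node("a2")]), Node("B", [Node("b1"), Node("b2")])])
ranges = assign_ranges(tree)
colors = {n: cluster_color(ranges, n) for n in ("a1", "a2", "b1", "b2")}
```

With these settings, siblings a1 and a2 are separated by a gap of about 0.045, while subtrees A and B are separated by about 0.1, so color distance tracks structural distance.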
Proximity-based
coloring highlights the relationships among clusters. However, it is
not always possible to impose a linear order on the data clusters.
For instance, a cluster chain forming a circular loop is not amenable
to any linear order. In this case, an arbitrary break must be
made at some point in the loop. Data elements at the break point,
though similar according to our proximity measure, may be assigned
contrasting colors.
Navigation and Filtering Tools in Hierarchical Approaches
In this section,
we describe the set of manipulation and filtering tools that allow
us to interactively modify the display in order to discover new
or hidden relationships in the data set.
1. Structure-Based Brushing
Brushing,
in the context of multivariate visualization, refers to an interactive
process for localizing a subset of a data set
[martin:95,wong:96,wegman:97]. Many useful operations, such as
highlighting, deleting, masking, or aggregation, may then be performed
on elements that lie within the brushed subspace.
Brushing
is a direct and data-driven metaphor. The operation may be performed
in 2-D screen space, e.g., via methods such as rubber-banding
rectangles or mouse lasso operations. Brushing may also be performed
in N-D data space by interactive creation of N-D hyperboxes by
painting over data points of interest.
We
develop structure-based brushing as a general mechanism
for navigating in hierarchical space (see figure 1). Details of
the structure-based brush can be found in [fua:99b].
The triangular
frame depicts the hierarchical tree. The leaf contour depicts
the silhouette of the hierarchical tree. It delineates the approximate
shape formed by chaining the leaf nodes. The colored bold contour
across the middle of the tree delineates the tree cut S(w)
that represents the cluster partition corresponding to a level-of-detail
w. The colors on the contour correspond to the colors used
for drawing the nodes on the main parallel coordinates display.
The two movable handles on the base of the triangle, together
with the apex of the triangle, form a wedge in the hierarchical
space.
The brushing
interaction for the user consists of localizing a subspace within
the hierarchical space by positioning the two handles at the base
of the triangle. The embedded wedge forms a brushed subspace within
the hierarchical space. Elements within the brushed subspace may
be examined at different levels of detail, magnified and examined
in full view, or masked or emphasized using fade-in/fade-out operations.
The user may then specify the level or levels of interest by selecting
a vertical value or range on the structure display.
The main
advantage of structure-based brushing is derived from the color
correspondence between the data display and the structure display.
Sets of elements may be selected by positioning the wedge handles
so as to bound the range of colors spanned by the elements. Moreover,
similar elements are selected as a group, since by our coloring
criteria, similar elements are drawn in similar colors.
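A simplified model of the wedge selection is sketched below in Python. It assumes that the leaves are spaced evenly along the base of the triangle in traversal order and that a node counts as brushed when all of its leaves lie between the two handles; [fua:99b] describes the actual brush geometry. `brushed_nodes` and the handle positions `a` and `b` in [0, 1] are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)

def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]

def brushed_nodes(root: Node, a: float, b: float) -> List[str]:
    """Names of the maximal subtrees whose leaves all fall between handles a <= b.

    Assumption: the i-th of n leaves sits at (i + 0.5) / n along the triangle's base.
    """
    order = leaves(root)
    n = len(order)
    pos = {id(leaf): (i + 0.5) / n for i, leaf in enumerate(order)}

    def inside(node: Node) -> bool:
        return all(a <= pos[id(leaf)] <= b for leaf in leaves(node))

    result: List[str] = []

    def visit(node: Node) -> None:
        if inside(node):
            result.append(node.name)  # whole subtree lies within the wedge
        else:
            for child in node.children:
                visit(child)

    visit(root)
    return result

tree = Node("root", [Node("A", [Node("a1"), Node("a2")]), Node("B", [Node("b1"), Node("b2")])])
```

Because neighboring leaves carry neighboring colors under proximity-based coloring, narrowing the handles in this model selects a contiguous band of similarly colored clusters.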
Structure-based brushing is a general method of brushing in hierarchical space, so it can be applied to various hierarchical multiresolutional visualization techniques. A comparison with the conventional way of brushing is given in [fua:99b].
2. Drill-down and Roll-up
The two basic
hierarchical operations when displaying data at multiple levels
of aggregation are the "drill-down" and "roll-up" operations.
Drill-down refers to the process of viewing data at a level of
increased detail, while roll-up refers to the process of viewing
data with decreasing detail.
Our system
provides smooth and continuous level-of-detail control in all
drilling operations. The control parameter is based on a measure
of cluster size. The level-of-detail can be varied indirectly
using a slider or directly by adjusting the colored contour across
the cluster tree.
We couple
our drilling operations with brushing. Our system permits selective
drill-down and roll-up of the brushed and non-brushed regions independently.
This flexibility is important as it allows the viewing of a subset
of elements in varying levels of detail in relation to elements
outside the subset.
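Selective drilling can be modeled as a tree cut with two thresholds, one for the brushed region and one for everything else. In this Python sketch (hypothetical names throughout), the brush is given as a set of leaf names, a cluster lying entirely inside or entirely outside the brush is cut at `w_in` or `w_out` respectively, and a cluster straddling the brush boundary is always split so that the two regions can be shown at different levels of detail. These rules are assumptions, not a description of XmdvTool's internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Node:
    name: str
    size: float  # assumed monotone: never larger than the parent's size
    children: List["Node"] = field(default_factory=list)

def leaf_names(node: Node) -> Set[str]:
    if not node.children:
        return {node.name}
    return set().union(*(leaf_names(c) for c in node.children))

def selective_cut(node: Node, brushed: Set[str], w_in: float, w_out: float) -> List[str]:
    """Names of displayed clusters when brushed and unbrushed regions use separate LODs."""
    names = leaf_names(node)
    w: Optional[float]
    if names <= brushed:
        w = w_in   # entirely inside the brush
    elif names.isdisjoint(brushed):
        w = w_out  # entirely outside the brush
    else:
        w = None   # straddles the brush boundary: always split
    if not node.children or (w is not None and node.size <= w):
        return [node.name]
    return [n for c in node.children for n in selective_cut(c, brushed, w_in, w_out)]

root = Node("root", 10.0, [
    Node("A", 4.0, [Node("a1", 0.0), Node("a2", 0.0)]),
    Node("B", 6.0, [Node("B1", 2.0, [Node("b1", 0.0), Node("b2", 0.0)]), Node("b3", 0.0)]),
])
brushed = {"a1", "a2"}
print(selective_cut(root, brushed, w_in=0.0, w_out=10.0))  # ['a1', 'a2', 'B']
print(selective_cut(root, brushed, w_in=10.0, w_out=0.0))  # ['A', 'b1', 'b2', 'b3']
```

The first call drills down into the brushed cluster while rolling up its surroundings; the second does the reverse.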
3. Extent Scaling
Where there
are overlapping bands, it is often difficult to isolate or tell
them apart. Our system overcomes this difficulty by allowing the
thickness of bands to be scaled uniformly via a dynamically controlled
scale factor. With this feature we can, for example, reveal the
relative sizes of the extents while reducing occlusions.
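One plausible reading of the scale factor is sketched in Python below: each cluster's band on an axis runs from its minimum to its maximum, and the band is shrunk or grown about the cluster mean by a single factor `s` applied uniformly to every cluster and dimension. The function name `scaled_band` and the use of the mean as the anchor are assumptions.

```python
from typing import List, Sequence, Tuple

def scaled_band(mean: Sequence[float], lo: Sequence[float], hi: Sequence[float],
                s: float) -> List[Tuple[float, float]]:
    """Per-dimension band for one cluster after extent scaling (hypothetical helper).
    s = 1 keeps the true extents, s < 1 thins the band toward the mean, s = 0 collapses it."""
    if s < 0:
        raise ValueError("scale factor must be non-negative")
    return [(m - s * (m - l), m + s * (h - m)) for m, l, h in zip(mean, lo, hi)]

# A cluster with mean 5 and extent [1, 9] on one axis, and mean 2 and extent [0, 3] on another.
print(scaled_band([5.0, 2.0], [1.0, 0.0], [9.0, 3.0], 0.5))  # [(3.0, 7.0), (1.0, 2.5)]
```

Because the same factor applies to every band, relative band widths are preserved while overlaps between neighboring bands shrink.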
4. Dynamic Masking
Another tool
for managing the complexity of a dense display is a process we
call dynamic masking. This involves controlling the relative
opacity between brushed and unbrushed areas. With dynamic masking,
the viewer can interactively fade out the unbrushed nodes, thereby
obtaining a clearer view of the brushed nodes. Conversely, the
brushed nodes can be faded out, thus obtaining a clearer view
of the unbrushed region. Hence, context is maintained while reducing
clutter.
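A minimal model of dynamic masking, assuming a single hypothetical slider value `mask` in [-1, 1]: positive values fade the unbrushed nodes, negative values fade the brushed ones, and zero shows both at full opacity. The returned alpha would multiply each node's drawing opacity; the sign convention is an assumption made for illustration.

```python
def masked_opacity(is_brushed: bool, mask: float, base_alpha: float = 1.0) -> float:
    """Opacity for one node under dynamic masking.
    mask > 0 fades unbrushed nodes, mask < 0 fades brushed nodes (hypothetical convention)."""
    if not -1.0 <= mask <= 1.0:
        raise ValueError("mask must lie in [-1, 1]")
    if mask >= 0:
        fade = 0.0 if is_brushed else mask
    else:
        fade = -mask if is_brushed else 0.0
    return base_alpha * (1.0 - fade)

for brushed in (True, False):
    print(brushed, masked_opacity(brushed, 0.75), masked_opacity(brushed, -0.75))
# True 1.0 0.25
# False 0.25 1.0
```

Fading rather than hiding keeps the de-emphasized nodes faintly visible, which is how context is preserved.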
Navigation and Filtering Tools in Flat and Hierarchical Approaches
1. Dimension Zooming
The use of
distortion techniques [leung:94,rao:95] has become increasingly
common as a means for visually exploring dense information displays.
Distortion operations allow the selective enlargement of subsets
of the data display while maintaining context with surrounding
data. We introduce a distortion operation that we term dimension
zooming. We scale up each of the dimensions independently
with respect to the extents of the brushed subspace, thus filling
the display area. The subset of elements may then be examined
as an independent data set. This zooming operation may be performed
as many times as desired. For a data set occupying a large range
of values, this operation is invaluable for examining localized
trends.
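Dimension zooming amounts to an independent linear rescaling per dimension. The Python sketch below (hypothetical `dimension_zoom`) keeps only the rows inside the brush extents `lo`/`hi` and maps each dimension's brushed interval onto a [0, 1] display range; because the result is again an ordinary data set, the function can be applied repeatedly.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]

def dimension_zoom(rows: Sequence[Row], lo: Sequence[float], hi: Sequence[float]) -> List[Row]:
    """Keep rows inside the brush and stretch each brushed interval [lo[i], hi[i]] to [0, 1]."""
    if any(h <= l for l, h in zip(lo, hi)):
        raise ValueError("each brush extent must have hi > lo")
    inside = [r for r in rows if all(l <= v <= h for v, l, h in zip(r, lo, hi))]
    return [tuple((v - l) / (h - l) for v, l, h in zip(r, lo, hi)) for r in inside]

rows = [(2.0, 10.0), (3.0, 20.0), (4.0, 30.0), (5.0, 20.0)]
zoomed = dimension_zoom(rows, lo=(2.0, 10.0), hi=(4.0, 30.0))
print(zoomed)  # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
# Zooming again into the upper half of the already-zoomed data:
print(dimension_zoom(zoomed, lo=(0.5, 0.5), hi=(1.0, 1.0)))  # [(0.0, 0.0), (1.0, 1.0)]
```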
2. Dimension Reordering
The newest version of XmdvTool (4.1) lets users reorder the dimensions.
References
[BEC:88]: R. A. Becker and W. S. Cleveland. Brushing scatterplots. Dynamic Graphics for Statistics, 1988.
[fua:99a]: Y. Fua, M. Ward, and E. Rundensteiner. Hierarchical parallel coordinates for exploration of large datasets. Proc. of Visualization '99, pp. 43-50, October 1999.
[fua:99b]: Y. Fua, M. Ward, and E. Rundensteiner. Navigating hierarchies with structure-based brushes. Proc. of Information Visualization '99, October 1999.
[jain:88]: A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.
[leung:94]: Y. Leung and M. Apperley. A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer-Human Interaction, 1(2), pp. 126-160, June 1994.
[martin:95]: A. Martin and M. Ward. High dimensional brushing for interactive exploration of multivariate data. Proc. of Visualization '95, pp. 271-278, 1995.
[rao:95]: R. Rao and S. Card. Exploring large tables with the table lens. Proc. of ACM CHI '95 Conference on Human Factors in Computing Systems, Vol. 2, pp. 403-404, 1995.
[SMI:90]: S. Smith, R. D. Bergeron, and G. Grinstein. Stereophonic and surface sound generation for exploratory data analysis. Proc. of CHI '90: Human Factors in Computing Systems, pp. 125-132, 1990.
[WAR:94]: M. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. Proc. of Visualization '94, pp. 326-333, 1994.
[wegman:97]: E. Wegman and Q. Luo. High dimensional clustering using parallel coordinates and the grand tour. Computing Science and Statistics, Vol. 28, pp. 361-368, 1997.
[wong:96]: P. Wong and R. Bergeron. Multiresolution multidimensional wavelet brushing. Proc. of Visualization '96, pp. 141-148, 1996.