N-Dimensional Brushing in Flat Approaches

Hierarchical Clustering

Navigation and Filtering Tools in Hierarchical approaches

Navigation and Filtering Tools in Flat and Hierarchical approaches

N-Dimensional Brushing in Flat Approaches

A useful capability of XmdvTool is N-dimensional brushing [martin:95]. Brushing is a process in which a user highlights, selects, or deletes a subset of the elements being graphically displayed by pointing at them with a mouse or other suitable input device. When multiple views of the data are shown simultaneously (e.g., scatterplots), brushing is often combined with a process known as linking, in which brushing elements in one view affects the same data in all other views. Brushing has been employed as a method for assisting data analysis for many years. One of the first brushing techniques was applied to high-dimensional scatterplots [BEC:88]. In this system, the user specified a rectangular region in one of the 2-D scatterplot projections, and, depending on the mode of operation, the points in other views corresponding to those falling within the brush were highlighted, deleted, or labeled. Brushing has also been used to help users select data points about which they want further information. Smith et al. [SMI:90] brushed images generated by stick-figure icons to obtain higher-dimensional information about the selected data points through sonification.

In XmdvTool, the notion of brushing has been extended to permit brushes to have dimensionality greater than two.  The goal is to allow the user to gain some understanding of spatial relationships in N-space by highlighting all data points which fall within a user-defined, relocatable subspace.  N-D brushes have the following characteristics:

Brush Shape: In XmdvTool, the shape of the brush is that of an N-D hyperbox. Other generic shapes, such as hyperellipses, will be added in the future, as well as customized shapes, which can consist of any connected arbitrary N-D subspace.

Brush Size: For generic shapes, the user simply needs to specify N brush dimensions. XmdvTool does this with a simple, albeit primitive, mechanism: N slider bars.

Brush Boundary: In XmdvTool, the boundary of a brush is a step edge. Another option would be a ramp, whose shape could take many forms. A ramp would also allow an interesting enhancement: coloring data points according to their degree of brush coverage (that is, where they fall along the ramp).

Brush Positioning: Brushes have a position that the user must be able to control easily and intuitively. In the general case, the user needs to specify N values to uniquely position the brush. XmdvTool does this via the same sliders used to specify the brush size.

Brush Motion: Although XmdvTool currently supports only manual brush motion, we hope to implement several forms of brush path specification in the future.

Brush Display: N-dimensional space is usually quite sparse, so it is often useful to show the subspace covered by the brush on the data display. Its location can be indicated either by the brush's boundary or by a shaded region showing the area of coverage. In XmdvTool, brushes are displayed as shaded blue-grey regions, and data points that fall within the brush are highlighted in red.
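The brush characteristics above can be summarized in a small Python sketch. It models the brush as an axis-aligned hyperbox given by one center (position) and one width (size) per dimension, mirroring the per-dimension sliders; the class and function names are hypothetical, not XmdvTool's code. A ramp width of 0 gives the step edge XmdvTool uses, while a positive ramp illustrates the linear-ramp boundary mentioned above as a possible enhancement, returning a degree of coverage that could drive coloring.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class HyperboxBrush:
    """Axis-aligned N-D brush: one center (position) and one width (size) per dimension."""
    center: List[float]
    size: List[float]

    def coverage(self, point: Sequence[float], ramp: float = 0.0) -> float:
        """Degree of brush coverage of a point, in [0, 1].

        ramp == 0 gives a step edge (1 inside the box, 0 outside).
        ramp > 0 adds a hypothetical linear falloff of that width outside each face.
        """
        if ramp < 0:
            raise ValueError("ramp must be non-negative")
        result = 1.0
        for x, c, s in zip(point, self.center, self.size):
            # How far the point lies outside the box along this dimension (0 if inside).
            outside = max(0.0, abs(x - c) - s / 2.0)
            if ramp == 0.0:
                dim_cov = 1.0 if outside == 0.0 else 0.0
            else:
                dim_cov = max(0.0, 1.0 - outside / ramp)
            # A point is only as covered as its least-covered dimension.
            result = min(result, dim_cov)
        return result

    def contains(self, point: Sequence[float]) -> bool:
        return self.coverage(point) == 1.0


def brushed_indices(data: Sequence[Sequence[float]], brush: HyperboxBrush) -> List[int]:
    """Indices of the data points that fall inside the brush (to be highlighted)."""
    return [i for i, p in enumerate(data) if brush.contains(p)]
```

Taking the minimum over dimensions makes the step-edge case an AND of per-dimension interval tests, which is exactly hyperbox membership.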

Brush size and position are currently specified in a rather simplistic manner.  The user selects the dimension to be adjusted, and then changes the brush size or position via a slider.  There are many opportunities for allowing the user to directly manipulate the brush in the display area, although each procedure would need to be customized based on the projection method in use.  For example, the user could move or resize one dimension of the brush by dragging the edge or center of the brush along one of the axes of the Parallel Coordinate display, or set the location of the brush by selecting one of the glyphs.  Direct manipulation of the brush will be one of the features incorporated into future releases of XmdvTool.

XmdvTool 4.1 adds a function for saving brushed data: the brushed data points are saved as a new okc file.

Hierarchical Clustering

For the purpose of interactive visualization of large multivariate data sets, we develop a multiresolutional view of the data via hierarchical clustering and use hierarchical approaches to convey aggregation information about the resulting clusters. Users can then navigate the resulting structure with our suite of navigation and filtering tools until they reach the desired focus region and level of detail. Our goal is to support an active process of discovery as opposed to passive display. We believe that it is only through data exploration that meaningful ideas, relations, and subsequent inferences can be extracted from the data.

1. Hierarchical Clustering

Our primary purpose for building a cluster hierarchy is to structure and present data at different levels of abstraction. A clustering algorithm groups objects or data items based on measures of proximity between pairs of objects [jain:88].  In particular, a hierarchical clustering algorithm constructs a tree of nested clusters based on proximity information.
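To make the idea of a tree of nested clusters concrete, here is a naive agglomerative sketch in Python using complete linkage. Complete linkage is just one possible proximity criterion, chosen here for illustration; this text does not state which clustering algorithm XmdvTool uses. Each node records its member points and its diameter (the largest pairwise distance among its members), which serves as a cluster size that never decreases from child to parent because a parent contains all of its children's points.

```python
import itertools
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    members: List[int]                 # indices of the data points in this cluster
    size: float                        # diameter: largest pairwise distance among members
    children: List["Cluster"] = field(default_factory=list)


def build_hierarchy(points: List[List[float]]) -> Cluster:
    """Naive complete-linkage agglomerative clustering (very slow; illustration only)."""
    if not points:
        raise ValueError("need at least one point")

    def diameter(idx: List[int]) -> float:
        return max((math.dist(points[i], points[j])
                    for i, j in itertools.combinations(idx, 2)), default=0.0)

    def linkage(a: Cluster, b: Cluster) -> float:
        # Complete linkage: the largest distance between a point of a and a point of b.
        return max(math.dist(points[i], points[j]) for i in a.members for j in b.members)

    # Start with one singleton cluster per data point.
    active = [Cluster([i], 0.0) for i in range(len(points))]
    while len(active) > 1:
        # Merge the closest pair of clusters into a new parent node.
        a, b = min(itertools.combinations(range(len(active)), 2),
                   key=lambda ab: linkage(active[ab[0]], active[ab[1]]))
        members = active[a].members + active[b].members
        parent = Cluster(members, diameter(members), [active[a], active[b]])
        active = [c for k, c in enumerate(active) if k not in (a, b)] + [parent]
    return active[0]
```

For real data sets a library implementation (or a faster incremental scheme) would be needed; the point here is only the bottom-up construction of nested clusters.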

2. Multiresolutional Cluster Display

We define a horizontal cut S across a tree T as a boundary that divides T into a top half and a bottom half and satisfies the following criteria: for each path R from the root to a leaf, S intersects R at exactly one point.

Clearly, S defines a partition of the data set E. We may then vary the level of detail (LOD) in our data display by changing the parameter that controls the location of S. We use a cluster-size threshold w as the LOD control parameter and define S(w) as the collection of clusters whose size is less than or equal to w but whose parent's size is greater than w. Because w varies continuously, it provides smooth transitions in our data display.
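A minimal Python sketch of computing S(w) by a depth-first traversal, assuming each node stores its size (for example, its diameter) and that a cluster is never larger than its parent, so that the cut is well defined. The `Node` class and `horizontal_cut` function are hypothetical names for illustration. The traversal stops at the first node on each root-to-leaf path whose size is at most w; since the root has no parent, it behaves as if its parent had infinite size.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    size: float                      # cluster size, assumed never larger than the parent's
    children: List["Node"] = field(default_factory=list)


def horizontal_cut(root: Node, w: float) -> List[Node]:
    """Return S(w): the clusters with size <= w whose parent has size > w.

    The root is treated as having a parent of infinite size. A leaf is also
    returned if w is smaller than its size, so that every root-to-leaf path
    crosses the cut exactly once.
    """
    cut: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.size <= w or not node.children:
            cut.append(node)                       # the cut crosses this path here
        else:
            stack.extend(reversed(node.children))  # visit children left to right
    return cut
```

Raising w moves the cut toward the root (fewer, larger clusters); lowering it moves the cut toward the leaves.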

3. Proximity-Based Coloring

In hierarchical approaches, we assign a color to each cluster. Our method maps colors by cluster proximity, hence the name proximity-based coloring. This proximity is based on the structure of the hierarchical tree; that is, sibling nodes are considered closer than non-sibling nodes. We first impose a linear order on the data clusters gathered for display at a given LOD value, w. Then we assign a color to each cluster by looking it up in a linear colormap table. Our approach colors clusters based on the cluster order derived during the tree traversal. The color ranges assigned are nested just as the clusters are nested: larger clusters are assigned broader ranges of color values, and smaller clusters are assigned narrower ranges. Since the elements of a small cluster are close to each other, they are assigned similar color values within the narrower range. In addition, a "buffer" is introduced between subtrees. The buffer is an unused color interval between subtrees, so that elements at the adjacent ends of neighboring subtrees are not assigned indistinguishable colors. Clearly, the buffer should be larger between large subtrees and smaller otherwise.
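The following Python sketch illustrates nested color ranges with buffers. It assumes each node records its size and the number of data points beneath it, divides a parent's color range among its children in proportion to their point counts, and reserves a fixed fraction of the parent's range (`buffer_frac`, a hypothetical parameter) as gaps between siblings, so buffers automatically shrink inside smaller subtrees. Each displayed cluster receives the center of its range as a value in [0, 1], which is then looked up in a linear colormap table. This is one plausible realization under those assumptions, not necessarily XmdvTool's exact scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    name: str
    size: float                       # cluster size used by the LOD threshold w
    n_points: int                     # number of data points under this node
    children: List["Cluster"] = field(default_factory=list)


def proximity_values(root: Cluster, w: float, buffer_frac: float = 0.1) -> Dict[str, float]:
    """Assign each cluster in the cut S(w) a value in [0, 1] via nested ranges."""
    values: Dict[str, float] = {}

    def visit(node: Cluster, lo: float, hi: float) -> None:
        if node.size <= w or not node.children:
            values[node.name] = (lo + hi) / 2.0    # displayed cluster: center of its range
            return
        width = hi - lo
        gaps = len(node.children) - 1
        # The buffer is a fixed fraction of this node's range, so it is wider
        # between large subtrees and narrower between small ones.
        gap = buffer_frac * width / gaps if gaps else 0.0
        usable = width - gap * gaps
        start = lo
        for child in node.children:
            span = usable * child.n_points / node.n_points   # nested, size-proportional range
            visit(child, start, start + span)
            start += span + gap

    visit(root, 0.0, 1.0)
    return values   # insertion order is the traversal (linear) order


def linear_colormap(n: int = 256) -> List[Tuple[float, float, float]]:
    """A simple blue-to-red RGB table (hypothetical; any linear colormap would do)."""
    return [(i / (n - 1), 0.0, 1.0 - i / (n - 1)) for i in range(n)]


def lookup(table: List[Tuple[float, float, float]], t: float) -> Tuple[float, float, float]:
    t = min(1.0, max(0.0, t))
    return table[round(t * (len(table) - 1))]
```

On a tree whose root has children A (displayed at w = 5) and B (expanded into leaves b1 and b2), the values come out as 0.225 for A, 0.65125 for b1, and 0.89875 for b2, so the siblings b1 and b2 end up closer in color than A and b1.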

Proximity-based coloring highlights the relationships among clusters. It is, however, not always possible to impose a linear order on the data clusters that respects their proximity. For instance, a chain of clusters forming a circular loop is not amenable to any linear order. In this case, an arbitrary break must be made at some point in the loop, and data elements on either side of the break, though similar according to our proximity measure, may be assigned contrasting colors.

Navigation and Filtering Tools in Hierarchical approaches

In this section, we describe the set of manipulation and filtering tools that allow us to interactively modify the display in order to discover new or hidden relationships in the data set.

1. Structure-Based Brushing

Brushing, in the context of multivariate visualization, refers to an interactive process for localizing a subset of a data set [martin:95, wong:96, wegman:97]. Many useful operations, such as highlighting, deleting, masking, or aggregation, may then be performed on elements that lie within the brushed subspace.

Brushing is a direct and data-driven metaphor. The operation may be performed in 2-D screen space, e.g., via methods such as rubber-banding rectangles or mouse lasso operations. Brushing may also be performed in N-D data space by interactive creation of N-D hyperboxes by painting over data points of interest.

We develop structure-based brushing as a general mechanism for navigating in hierarchical space (see Figure 1). Details of the structure-based brush can be found in [fua:99b].

The triangular frame depicts the hierarchical tree. The leaf contour depicts the silhouette of the hierarchical tree. It delineates the approximate shape formed by chaining the leaf nodes. The colored bold contour across the middle of the tree delineates the tree cut S(w) that represents the cluster partition corresponding to a level-of-detail w. The colors on the contour correspond to the colors used for drawing the nodes on the main parallel coordinates display. The two movable handles on the base of the triangle, together with the apex of the triangle, form a wedge in the hierarchical space.

The brushing interaction for the user consists of localizing a subspace within the hierarchical space by positioning the two handles at the base of the triangle. The embedded wedge forms a brushed subspace within the hierarchical space. Elements within the brushed subspace may be examined at different levels of detail, magnified and examined in full view, or masked or emphasized using fade-in/fade-out operations. The user may then specify the level or levels of interest by selecting a vertical value or range on the structure display.
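A minimal sketch of the wedge test, under the assumption that the leaves are laid out in traversal order along the base of the triangle and that the handle positions are normalized to [0, 1]. Each cluster on the current cut is described by the indices of its first and last leaf, and it counts as brushed only if all of its leaves fall between the handles; treating partially covered clusters differently would be an equally plausible design. All names here are hypothetical.

```python
from typing import List, Sequence, Tuple


def wedge_brush(cut: Sequence[Tuple[str, int, int]], n_leaves: int,
                left: float, right: float) -> List[str]:
    """Names of cut clusters lying entirely inside the wedge between the handles.

    cut: (name, first_leaf, last_leaf) tuples; leaf indices follow the
    left-to-right leaf order along the base of the triangle.
    left, right: handle positions on the base, normalized to [0, 1].
    """
    if left > right:
        left, right = right, left       # handles may be dragged past each other

    def pos(i: int) -> float:
        return (i + 0.5) / n_leaves     # center of leaf slot i on the base

    return [name for name, first, last in cut if left <= pos(first) and pos(last) <= right]
```

Because the coloring follows the same leaf order, a contiguous wedge corresponds to a contiguous range of colors, which is what lets the user bound a set of similarly colored elements with the two handles.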

The main advantage of structure-based brushing is derived from the color correspondence between the data display and the structure display. Sets of elements may be selected by positioning the wedge handles so as to bound the range of colors spanned by the elements. Moreover, similar elements are selected as a group, since by our coloring criteria, similar elements are drawn in similar colors.

Structure-based brushing is a general method of brushing in hierarchical space, so it can be applied to various hierarchical multiresolutional visualization techniques. It is compared and contrasted with conventional brushing in [fua:99b].

2. Drill-down and Roll-up

The two basic hierarchical operations when displaying data at multiple levels of aggregation are the "drill-down" and "roll-up" operations. Drill-down refers to the process of viewing data at an increased level of detail, while roll-up refers to the process of viewing data at a decreased level of detail.

Our system provides smooth and continuous level-of-detail control in all drilling operations. The control parameter is based on a measure of cluster size. The level-of-detail can be varied indirectly using a slider or directly by adjusting the colored contour across the cluster tree.

We couple our drilling operations with brushing. Our system permits selective drill-down and roll-up of the brushed and non-brushed regions independently. This flexibility is important because it allows a subset of elements to be viewed at varying levels of detail in relation to the elements outside the subset.
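Selective drilling can be sketched by giving the brushed and unbrushed parts of the tree their own LOD values. In this hypothetical Python sketch, the brush is a range of leaf indices; a node entirely inside it is cut with `w_in`, a node entirely outside it with `w_out`, and a node straddling the brush boundary with the smaller of the two so that refinement can reach the boundary. Because the traversal still stops at the first qualifying node on every root-to-leaf path, the result remains a valid horizontal cut.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    name: str
    size: float
    children: List["Node"] = field(default_factory=list)


def leaf_count(node: Node) -> int:
    return 1 if not node.children else sum(leaf_count(c) for c in node.children)


def selective_cut(root: Node, brush: Tuple[int, int], w_in: float, w_out: float) -> List[str]:
    """Cut with LOD w_in inside the brushed leaf range and w_out outside it.

    brush: (first, last) inclusive indices of the leaves covered by the brush,
    counting leaves left to right.
    """
    b0, b1 = brush
    cut: List[str] = []

    def visit(node: Node, first: int) -> None:
        last = first + leaf_count(node) - 1
        if b0 <= first and last <= b1:
            w = w_in                    # subtree entirely brushed
        elif last < b0 or first > b1:
            w = w_out                   # subtree entirely unbrushed
        else:
            w = min(w_in, w_out)        # straddles the brush edge: keep refining
        if node.size <= w or not node.children:
            cut.append(node.name)       # first stop on this path, so the cut stays valid
            return
        for child in node.children:
            visit(child, first)
            first += leaf_count(child)

    visit(root, 0)
    return cut
```

Drilling down inside the brush corresponds to lowering `w_in`; rolling up outside it corresponds to raising `w_out`, and each can be changed without touching the other.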

3. Extent Scaling

Where bands overlap, it is often difficult to isolate them or tell them apart. Our system overcomes this difficulty by allowing the thickness of all bands to be scaled uniformly via a dynamically controlled scale factor. With this feature we can, for example, reveal the relative sizes of the extents while reducing occlusion.
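A sketch of uniform extent scaling, assuming each cluster band is drawn on every axis from the cluster's minimum to its maximum with the cluster mean as its center line (this band geometry is an assumption, not stated above). Scaling both halves of every band about its mean by the same factor shrinks overlapping bands while preserving the ratios between band widths.

```python
from typing import List, Sequence, Tuple


def scale_extents(mean: Sequence[float], lo: Sequence[float], hi: Sequence[float],
                  factor: float) -> List[Tuple[float, float]]:
    """Scale a cluster band's per-axis extent [lo, hi] about the cluster mean.

    factor == 1 leaves the band unchanged, factor == 0 collapses it onto the
    mean, and the same factor is applied to every band and every axis.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return [(m - factor * (m - l), m + factor * (h - m)) for m, l, h in zip(mean, lo, hi)]
```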

4. Dynamic Masking

Another tool for managing the complexity of a dense display is a process we call dynamic masking.  This involves controlling the relative opacity between brushed and unbrushed areas. With dynamic masking, the viewer can interactively fade out the unbrushed nodes, thereby obtaining a clearer view of the brushed nodes. Conversely, the brushed nodes can be faded out, thus obtaining a clearer view of the unbrushed region. Hence, context is maintained while reducing clutter.
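Dynamic masking can be reduced to a pair of opacities. The sketch below assumes a single hypothetical slider value `m` in [-1, 1]: positive values fade out the unbrushed nodes, negative values fade out the brushed ones, and 0 draws both fully. The `blend` helper shows ordinary "over" compositing of a node color onto an opaque background with the resulting opacity.

```python
from typing import Sequence, Tuple


def mask_opacities(m: float, base_alpha: float = 1.0) -> Tuple[float, float]:
    """Return (brushed_alpha, unbrushed_alpha) for a masking slider m in [-1, 1]."""
    if not -1.0 <= m <= 1.0:
        raise ValueError("m must lie in [-1, 1]")
    brushed = base_alpha * (1.0 - max(0.0, -m))     # m < 0 fades the brushed nodes
    unbrushed = base_alpha * (1.0 - max(0.0, m))    # m > 0 fades the unbrushed nodes
    return brushed, unbrushed


def blend(color: Sequence[float], background: Sequence[float], alpha: float) -> Tuple[float, ...]:
    """'Over' compositing of an RGB color onto an opaque background."""
    return tuple(alpha * c + (1.0 - alpha) * b for c, b in zip(color, background))
```

Keeping the faded nodes at a small nonzero opacity, rather than hiding them, is what preserves the context mentioned above.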

Navigation and Filtering Tools in Flat and Hierarchical approaches

1. Dimension Zooming

The use of distortion techniques [leung:94,rao:95] has become increasingly common as a means for visually exploring dense information displays.  Distortion operations allow the selective enlargement of subsets of the data display while maintaining context with surrounding data.  We introduce a distortion operation that we term dimension zooming.  We scale up each of the dimensions independently with respect to the extents of the brushed subspace, thus filling the display area. The subset of elements may then be examined as an independent data set. This zooming operation may be performed as many times as desired. For a data set occupying a large range of values, this operation is invaluable for examining localized trends.
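Dimension zooming amounts to an independent affine rescaling of each dimension. This sketch assumes the brushed subspace is given by per-dimension lower and upper bounds (`lo`, `hi`, hypothetical names) and maps each bound pair to [0, 1], so the brushed subset fills the display; values outside the brush land outside [0, 1] and would be clipped by the renderer. Zooming again simply applies the same function with a new brush expressed in the zoomed coordinates.

```python
from typing import List, Sequence


def zoom_dimensions(data: Sequence[Sequence[float]], lo: Sequence[float],
                    hi: Sequence[float]) -> List[List[float]]:
    """Rescale each dimension so that the brushed extent [lo[j], hi[j]] fills [0, 1]."""
    spans = [h - l for l, h in zip(lo, hi)]
    if any(s <= 0 for s in spans):
        raise ValueError("each brushed extent must have positive width")
    # Each dimension is scaled independently, so the aspect between axes is not preserved.
    return [[(x - l) / s for x, l, s in zip(row, lo, spans)] for row in data]
```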

2. Dimension reordering

The newest version of XmdvTool (4.1) allows users to reorder the dimensions freely.


[BEC:88]: R. A. Becker and W. S. Cleveland. Brushing scatterplots. Dynamic Graphics for Statistics, 1988.

[fua:99a]: Y. Fua, M. Ward, and E. Rundensteiner. Hierarchical parallel coordinates for exploration of large datasets. Proc. of Visualization '99, p. 43-50, Oct. 1999.

[fua:99b]:  Y. Fua, M. Ward, and E. Rundensteiner. Navigating hierarchies with structure-based brushes. Proc. of Information Visualization '99, Oct. 1999.

[jain:88]: A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, 1988.

[leung:94]: Y. Leung and M. Apperley. A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer-Human Interaction, Vol. 1(2), p. 126-160, June 1994.

[martin:95]: A. Martin and M. Ward. High dimensional brushing for interactive exploration of multivariate data. Proc. of Visualization '95, p. 271-278, 1995.

[rao:95]: R. Rao and S. Card. Exploring large tables with the table lens. Proc. of ACM CHI '95 Conference on Human Factors in Computing Systems, Vol. 2, p. 403-404, 1995.

[SMI:90]: S. Smith, R. D. Bergeron, and G. Grinstein. Stereophonic and surface sound generation for exploratory data analysis. Proc. of CHI '90: Human Factors in Computing Systems, p. 125-132, 1990.

[WAR:94]: M. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. Proc. of Visualization '94, p. 326-333, 1994.

[wegman:97]: E. Wegman and Q. Luo. High dimensional clustering using parallel coordinates and the grand tour. Computing Science and Statistics, Vol. 28, p. 361-368, 1997.

[wong:96]: P. Wong and R. Bergeron. Multiresolution multidimensional wavelet brushing. Proc. of Visualization '96, p. 141-148, 1996.