Term Explanation



Multidimensional (Multivariate) Visualization

Multidimensional (Multivariate) Data

Multidimensional (multivariate) data can be defined as a set of data items E, where the i-th data item e_i consists of a vector of N variables, (x_i1, x_i2, ..., x_iN).  Each variable (dimension) may be independent of, or interdependent with, one or more of the other variables. Variables may be discrete or continuous in nature, or take on symbolic (nominal) values.  Variables also have a scale associated with them, where scales are defined according to the existence or lack of an ordering relationship, a distance (interval) metric, and an absolute zero (origin).

Flat Displays and Hierarchical Displays

XmdvTool allows users to examine multidimensional data. 

Four basic methods of visualizing N-D multidimensional data on a 2-D screen are provided: scatterplots, glyphs, parallel coordinates, and dimensional stacking. We refer to them as Flat Displays. A user can interactively switch between display techniques. A major tool in XmdvTool for providing insights into multidimensional spatial relations is the flat brush, which allows users to perform operations on the data points which fall within a user-specified N-D subspace of the total space defined by the data. 

To explore large-scale data sets, interactive hierarchical displays are built on top of the four basic displays. We refer to them as Hierarchical Displays: hierarchical scatterplots, glyphs, parallel coordinates, and dimensional stacking. In the hierarchical displays, all the data items in a data set are organized into a hierarchical cluster tree. Interactive navigation tools, such as structure-based brushing, drill-down/roll-up operations, extent scaling, and dynamic masking, allow users to visually explore the hierarchy at the level of detail and context they want.


Flat Display Techniques

Parallel Coordinates

In Parallel Coordinates, each dimension corresponds to an axis, and the N axes are organized as uniformly spaced vertical lines.  A data item  in N-dimensional space manifests itself as a connected set of points, one on each axis.  Points lying on a common line or plane create readily perceived structures in the image.  In generating the display of parallel coordinates in XmdvTool, the view area is divided into N vertical slices of equal width.  At the center of each slice an axis is drawn, along with a label at the top end. Data points are generated as polylines across the N axes. 
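
The layout described above can be sketched in a few lines of Python. This is a minimal illustration, not XmdvTool's actual code: the function name polyline_vertices, the scaling of each value to its dimension's [min, max] range, and the pixel convention (y grows downward) are all assumptions.

```python
def polyline_vertices(item, mins, maxs, width, height):
    """Return the (x, y) screen vertices of one data item's polyline.

    The view area is split into N vertical slices of equal width and an
    axis sits at the center of each slice. Scaling each value to its
    dimension's [min, max] range is an assumption of this sketch.
    """
    n = len(item)
    slice_width = width / n
    vertices = []
    for d, value in enumerate(item):
        x = (d + 0.5) * slice_width          # axis at the slice center
        span = maxs[d] - mins[d]
        t = (value - mins[d]) / span if span else 0.5
        y = height * (1.0 - t)               # larger values drawn higher
        vertices.append((x, y))
    return vertices
```

Connecting the returned vertices in order gives the polyline for that data item.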

Scatterplot Matrices

Scatterplot Matrices are one of the oldest and most commonly used methods to project high-dimensional data to 2 dimensions.  In this method, an N x N grid of parallel projections of the data is generated. Each of these is a simple x-y plot for the two dimensions it represents.  The horizontal dimension of a scatterplot is controlled by the column it resides in, and likewise the vertical dimension is controlled by the row.  At the top of each column and at the left of each row is a label that shows which dimension that row or column represents.
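
As a rough sketch of the row/column rule, the following Python projects each data item into every cell of the grid. The function names and the dictionary keyed by (row, col) are hypothetical choices made for illustration only.

```python
def scatterplot_cell(item, row, col):
    """Project one data item into the plot at (row, col) of the grid.

    The column selects the horizontal dimension and the row selects the
    vertical dimension.
    """
    return (item[col], item[row])


def scatterplot_matrix(items):
    """Map each (row, col) cell to the list of projected 2-D points."""
    n = len(items[0])
    return {
        (row, col): [scatterplot_cell(item, row, col) for item in items]
        for row in range(n)
        for col in range(n)
    }
```

For N dimensions this yields N x N cells; the cells on the diagonal plot a dimension against itself.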

Glyphs

The definition of a glyph covers a large number of techniques which map variables to various geometric and color attributes of graphical primitives or symbols.  Histograms are perhaps the most widely recognized form of glyph, where data values control the height of rectangular bars.  One might even consider a scatterplot a specialized form of glyph representation, where data values control the positional attributes of a symbol.  In XmdvTool, we use the star glyph pattern, which creates for each data item N rays emanating at equal angles from a point on the screen.  Currently glyphs are spaced uniformly across the screen, though we might want to use 2 of the dimensions to control position.  The length of each ray is determined by the data value for that dimension.  A polyline is then generated to encompass the rays, forming a blob shape.  The flat glyphs brush toolbox may be used as a key to the dimensions.
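
The star glyph construction can be sketched as follows. This is an illustrative sketch rather than XmdvTool's implementation: it assumes the values have already been normalized to [0, 1], and the names star_glyph and max_radius are hypothetical.

```python
import math


def star_glyph(values, center, max_radius):
    """Return the outline vertices of a star glyph for one data item.

    `values` are assumed to be pre-normalized to [0, 1]. Ray d points at
    angle 2*pi*d/N from the center and its length is the value scaled by
    `max_radius`; joining the ray ends in order gives the outline.
    """
    n = len(values)
    cx, cy = center
    outline = []
    for d, v in enumerate(values):
        angle = 2.0 * math.pi * d / n
        length = v * max_radius
        outline.append((cx + length * math.cos(angle),
                        cy + length * math.sin(angle)))
    return outline
```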

Dimensional Stacking

Several recent techniques for multivariate display involve projecting high-dimensional data by embedding dimensions within other dimensions.  One starts by discretizing the range of each dimension (assigning what we term a cardinality, or number of buckets, to the dimension).  Each dimension is then assigned an orientation (in our case, horizontal or vertical) and an ordering (dimensions are said to have unique "speeds").  The dimensions with the 2 slowest speeds are used to divide a virtual screen into sections, with the cardinalities determining how many sections are generated horizontally and vertically.  Each section is then used to define the virtual screen for the next 2 dimensions (the slowest of the remaining dimensions), again using the cardinalities to determine how to break up the virtual screen.  This is repeated until all dimensions have been embedded and the data point can be mapped to its screen location.  In a way, the process is similar to the manner in which the digits of an odometer move at different speeds.

XmdvTool requires three types of information to project data using dimensional stacking.  The first is the cardinality (number of buckets) for each dimension.  The range of values for each dimension is then decomposed into that many equal-sized subranges.  The second is the ordering of the dimensions, from outermost (slowest) to innermost (fastest).  Dimensions are assumed to alternate in orientation, and the order of the dimensions in the input file is assumed to be the order for the mapping process.  The last is the minimum size for the plotted data item (the system will increase this value if the entire image can fit within the view area).  Each data point then maps into a unique bucket, which in turn maps to a unique location in the resulting image.

A key is provided in a separate window to help users understand the order of embedding, and grid lines of varying intensity provide assistance in interpreting transitions between buckets at different levels in the hierarchy.
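
The odometer-like embedding can be sketched as a mixed-radix computation. The Python below is a minimal illustration under stated assumptions: the function names are hypothetical, the first dimension is taken to be horizontal, and a value equal to the top of its range is placed in the last bucket.

```python
def bucket_index(value, lo, hi, cardinality):
    """Index of the equal-sized subrange of [lo, hi] containing value."""
    if hi == lo:
        return 0
    b = int((value - lo) / (hi - lo) * cardinality)
    return min(max(b, 0), cardinality - 1)   # value == hi goes in last bucket


def stack_position(item, mins, maxs, cards):
    """Map a data item to its (column, row) cell in the stacked image.

    Dimensions are taken in input order, outermost (slowest) first, and
    alternate in orientation. Starting with horizontal is an assumption.
    """
    col = row = 0
    for d, value in enumerate(item):
        b = bucket_index(value, mins[d], maxs[d], cards[d])
        if d % 2 == 0:
            col = col * cards[d] + b     # horizontal dimension
        else:
            row = row * cards[d] + b     # vertical dimension
    return col, row
```

With cardinalities 2, 3, 4 and 5, for example, the image is 2 * 4 = 8 cells wide and 3 * 5 = 15 cells tall, and each step of an outer dimension moves the point by a whole block of inner cells.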

Flat Pixel-Oriented Visualization

Two aspects distinguish pixel-oriented visualization techniques from the other techniques used in XmdvTool. First, the number of data items that can be visualized without overlap is much higher than in other approaches. In general, pixel-oriented techniques use each pixel of the display to represent one data value. This means that the number of data values that can be visualized at one time is limited only by the number of pixels on the display. Second, the generated visualizations can be query-dependent. Query-dependency means that the display shows not only the data items that fulfill the query, but also a number of data items that approximately fulfill it.


Hierarchical Display Modes

Overview

Hierarchical display modes can be used to visualize large-scale data sets using a multi-resolution approach. Here is a simple description of this approach:

First, construct a hierarchical cluster tree for the data set. Similar data items are grouped into clusters, and similar clusters are grouped into larger clusters. (XmdvTool can do this for you automatically, so you need not worry about it. You can also provide your own hierarchical tree in the XmdvTool *.cg format. To learn this format, please refer to the XmdvTool website at http://davis.wpi.edu/~xmdv.)

Second, visualize this hierarchy in the hierarchical display modes. To visualize a data cluster, a mean is used to indicate the average value of all the data items in the cluster, and a band is used to indicate the range of the cluster. You can visualize the data set at different levels of detail and highlight interesting clusters with the structure-based brush.  Please refer to the Structure-Based Brush for Hier Displays section in the Brushes Menu help file to find out how to control the bands, change the level of detail, and interact with the hierarchical displays.
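
The per-cluster summary can be sketched as below. This is an illustrative sketch only: the function name cluster_summary is hypothetical, and the band is taken to be the per-dimension minimum and maximum of the cluster's data items.

```python
def cluster_summary(items):
    """Return (mean, lower, upper) vectors for a cluster of data items.

    The mean is the per-dimension average; lower and upper are the
    per-dimension minimum and maximum, i.e. the edges of the band.
    """
    n = len(items[0])
    columns = [[item[d] for item in items] for d in range(n)]
    mean = [sum(col) / len(col) for col in columns]
    lower = [min(col) for col in columns]
    upper = [max(col) for col in columns]
    return mean, lower, upper
```

Every data item in the cluster lies between lower and upper in each dimension, which is why the band encloses all of the cluster's members when drawn.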

If you want to learn more about the hierarchical displays, please refer to the XmdvTool website at http://davis.wpi.edu/~xmdv. The documents section there lists several published papers with detailed information on the hierarchical displays and the structure-based brush.

Hierarchical Parallel Coordinates

The hierarchical parallel coordinates are derived from the flat parallel coordinates. In the hierarchical parallel coordinates, clusters rather than individual data items are displayed. The mean of a cluster is mapped to a polyline traversing all the axes, with a band around it depicting the extents of the cluster. The lower edge of the band intersects each axis at the minimum value of the cluster in that dimension. The upper edge of the band intersects each axis at the maximum value of the cluster in that dimension. If every data item in the cluster were drawn on screen, all of them would lie inside the band.

Figure: a hierarchical parallel coordinates display.

Hierarchical Scatterplot Matrices

The hierarchical scatterplot matrices are derived from the flat scatterplot matrices. In the hierarchical scatterplot matrix, the clusters are shown in the N x N plots, where N is the original number of dimensions. The mean of a cluster is projected onto the N x N plots in the same way as an ordinary data item in the flat scatterplot matrix. The extent of a cluster is also projected onto the N x N plots, forming rectangles around the projected mean. The projections of the same cluster on different plots are colored in the same way, which helps users link clusters from one plot to another.

 

Hierarchical Glyphs

The hierarchical glyphs are derived from the flat glyphs. In the hierarchical glyphs, each star glyph represents a cluster. The mean values of a cluster are mapped to the lengths of rays emanating from a central point in the star glyph.  The ends of these rays are linked to form the mean polygon. The band around the mean polygon has two edges: one outside the mean polygon and the other inside it. The inside edge intersects each axis at the minimum value of the cluster in that dimension, while the outside edge would intersect each axis at the maximum value of the cluster in that dimension if we extended the axes. If we were to draw a star glyph from this center point for any data item included in the cluster, that star glyph would lie inside the band of the cluster.

 

Hierarchical Dimensional Stacking

The hierarchical dimensional stacking is derived from the flat dimensional stacking. In the hierarchical dimensional stacking, the clusters replace the data items. The mean of a cluster will fall into a single small block similar to where an ordinary data item in the flat form dimensional stacking would be placed. The band of this cluster depicts the extent of the cluster. This time it is possible that some parts of the band are disjoint from others in display space due to the nature of the dimensional stacking technique. 

Hierarchical Pixel-Oriented Visualization

The hierarchical visualization in XmdvTool addresses the visualization of large data sets by presenting a multi-resolution view of the data. Creating the hierarchy preserves all the information present in the data and retains access to the raw data elements. Every display in XmdvTool has a hierarchical counterpart, which displays the aggregate information maintained at each node in the cluster tree. The set of associated hierarchy navigation tools allows the user to discover patterns in the data set.

In order for hierarchical pixel oriented displays to show the same semantics as the flat pixel oriented displays, it is important to contrast structure-based brushing with traditional data-based brushing. In a traditional user-driven brushing operation, to specify a region of interest in a multivariate data display, the user sets upper and lower bounds for each dimension. In data-driven brushing, the user paints over groupings of interesting data. Neither of these approaches is suitable for isolating data elements which are structurally related. Rather, their focus is on the values of the data. Clearly, structure-based brushing provides new, and potentially invaluable, functionality beyond data-based brushing.


Flat Display Brushing (flat brushing)

Flat Brush Options

Brush operations are the actions that can be performed on data selected by the brush.  XmdvTool supports four types of brush operations:  highlight, mask, values, and average.  Each of these is explained in detail below.  To specify a brush operation, use the Operation Toolbox.

Highlight  -- This operation causes all the points covered by the brush to be displayed in a highlighting color. The highlighting color can be changed from  the Color Requester dialog.

Mask  -- This operation causes all points covered by the brush to be hidden from the display.  This is useful when the display is cluttered and it is necessary to remove some data to view other interesting data.

Values  -- This operation causes the numeric value of all points covered by the brush to be displayed in a separate popup window.  This window can be opened from the main interface by selecting Data Values.

Average  -- This operation causes the average value of all points currently selected by the brush to be displayed in each of the display views. In addition, if the Values operation is also selected, the numeric average is added to the end of the values in the data values window.
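
The four operations can be sketched as a single function over the brushed points. This is only an illustration of the behavior described above: the function name apply_brush, the covered predicate, and the layout of the returned dictionary are assumptions, not part of XmdvTool.

```python
def apply_brush(points, covered, operations):
    """Apply brush operations to the points selected by `covered`.

    `covered` is a predicate on a point; `operations` is a set drawn from
    {"highlight", "mask", "values", "average"}. The returned dictionary
    layout is an assumption of this sketch.
    """
    selected = [p for p in points if covered(p)]
    result = {}
    if "highlight" in operations:
        result["highlighted"] = selected
    if "mask" in operations:
        result["visible"] = [p for p in points if not covered(p)]
    if "values" in operations:
        result["values"] = [list(p) for p in selected]
    if "average" in operations and selected:
        n = len(selected[0])
        avg = [sum(p[d] for p in selected) / len(selected) for d in range(n)]
        result["average"] = avg
        if "values" in operations:
            result["values"].append(avg)   # average goes at the end
    return result
```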

Brush Expression:
The brush expression is a logical expression between brushes.  Points are covered by the operation if they evaluate to TRUE for this expression.
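
One way to picture a brush expression is as a boolean function over per-brush membership tests. The sketch below is hypothetical: this help text does not describe the actual expression syntax, so the expression is modeled here as a plain Python callable, and the brush names "A" and "B" are invented for the example.

```python
def covered_by_expression(point, brushes, expression):
    """Evaluate a logical brush expression for one point.

    `brushes` maps a brush name to a predicate on a point, and
    `expression` is a function of the per-brush truth values.
    """
    inside = {name: test(point) for name, test in brushes.items()}
    return expression(inside)


# Two example brushes and the expression "A and not B".
brushes = {
    "A": lambda p: p[0] < 5,
    "B": lambda p: p[1] > 3,
}


def a_and_not_b(inside):
    return inside["A"] and not inside["B"]
```

Here the point (1, 2) is covered, while (1, 4) and (9, 0) are not.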

Flat Brushing in N-D Spaces

A major problem with projecting data of a given number of dimensions to a smaller number of dimensions is that you invariably lose information regarding the spatial relationships of the data.  N-D brushing (N is the number of dimensions in the dataset) is a way of recovering some of this lost information by highlighting data points which fall into a user-defined subspace.  Thus, one could say "show me the points within a (Euclidean) distance d of a certain location."  A brush is completely defined by its shape, size, and location.  We assume a hyper-parallelepiped (a fancy name for an N-D box), and the user specifies the size of the box in each dimension.  Then a location in N-D space is specified as a center point for the brush.  Any data point falling within the N-D brush can have an operator applied to it, such as highlighting or masking.
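
The containment test for such a box-shaped brush is short enough to sketch directly. The function name in_brush is hypothetical, and treating each size as the full width of the box (so half of it extends to either side of the center) is an assumption of this sketch.

```python
def in_brush(point, center, sizes):
    """True if `point` lies inside the N-D box brush.

    The brush is centered at `center` and `sizes[d]` is its full width
    in dimension d, so a point is covered when it is within half that
    width of the center in every dimension.
    """
    return all(abs(p - c) <= s / 2.0
               for p, c, s in zip(point, center, sizes))
```

A point must pass the test in all N dimensions at once, which is what lets the brush recover relationships that a single 2-D projection hides.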

Each of the display methods allows the user to directly manipulate the brush.  A summary of manipulation techniques for each display method follows:

Scatterplots  -- The brush coverage on the scatterplot display is represented by a series of rectangular boxes, one for each scatterplot.  The boxes represent the range of points that the brush covers for the two dimensions of a particular scatterplot.  Each of these boxes may be manipulated using the mouse.  The left mouse button can be used to resize the brush -- dragging either a corner or edge of one of the rectangular boxes will alter the brush shape.  The middle mouse button may be used to recenter the brush -- dragging the box with the middle mouse button pressed will move the brush to the cursor position.

Glyphs  -- The brush coverage is not displayed in the glyph view. The brush may be recentered by clicking on a glyph with the left mouse button.  The brush will change location to be centered around the selected point.  This is useful for finding glyphs that are similar in shape to an existing glyph.

Parallel Coordinates  -- The brush coverage on the parallel coordinate display is represented by a shaded region that spans all the dimensions.  The shaded region along an axis represents the range of points that the brush covers along that axis.  Brush manipulation in this display works similarly to the scatterplot display.  Dragging the brush with the left mouse button will resize the brush while dragging with the middle mouse button will recenter the brush.

Dimensional Stacking  -- Brush coverage in this display is shown by shading all the bins that are contained by the brush.  Brush manipulation in this display works similarly to the glyph display. Clicking on a point will recenter the brush to that point.

Types of Flat Brush

In XmdvTool, flat brushes can have either a step edge or a ramp edge.  With a step edge brush, a point is either completely inside or completely outside the brush.  When a highlight operation is performed with a step edge brush, points that are contained by the brush are highlighted, and points not contained are painted in the normal data color.

Ramped brushes do not have a discrete boundary.  Instead, along each dimension there is an inner and outer brush boundary.  The amount of coverage along one dimension is 1.0 inside the inner boundary and falls linearly to 0.0 at the outer boundary.  When a highlight operation is performed on ramped edge brushes, points that are completely contained by the brush are drawn in the normal highlight color for that brush.  Points that are partially contained by the brush are drawn in a color that is lighter as the coverage decreases. The amount of ramp of the brush is drawn as a thin line in the same color as the brush coverage on the parallel coordinate and scatterplot displays.  On each of these displays, the ramp boundary may be manipulated by holding down the Control key on the keyboard and dragging with the left mouse button.
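
The ramp can be written as a small piecewise-linear function. This is a sketch under assumptions: the boundaries are expressed as half-widths around the brush center, the function names are hypothetical, and combining the per-dimension coverages by taking their minimum is one plausible choice that the text above does not specify.

```python
def ramp_coverage_1d(value, center, inner_half, outer_half):
    """Coverage of a ramped brush along one dimension, in [0, 1]."""
    dist = abs(value - center)
    if dist <= inner_half:
        return 1.0                       # inside the inner boundary
    if dist >= outer_half:
        return 0.0                       # at or beyond the outer boundary
    return (outer_half - dist) / (outer_half - inner_half)


def ramp_coverage(point, center, inner_halves, outer_halves):
    """Overall coverage; taking the minimum over dimensions is an assumption."""
    return min(ramp_coverage_1d(p, c, i, o)
               for p, c, i, o in zip(point, center, inner_halves, outer_halves))
```

When the inner and outer boundaries coincide, the function reduces to a step edge: coverage is 1.0 inside and 0.0 outside.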

Painting

Painting is a method of creating a brush from the presented data. To use painting, the current brush must be enabled and have its display attribute turned on (see Brush Toolbox). In the parallel coordinate and scatterplot displays, painting is accomplished by holding down the Shift key on the keyboard and dragging the mouse over points of interest.  As long as the Shift key is held down, the mouse cursor will continue to behave as a virtual paintbrush.  When the Shift key is released, a brush will be generated that contains exactly all the points that were painted.
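
How the brush is derived from the painted points is not spelled out here. One simple interpretation, sketched below, is the tightest N-D box around the painted points; this is an assumption, and such a box may also cover unpainted points that happen to fall inside it. The function name brush_from_painted is hypothetical.

```python
def brush_from_painted(painted):
    """Return (center, sizes) of the tightest box around painted points."""
    n = len(painted[0])
    lows = [min(p[d] for p in painted) for d in range(n)]
    highs = [max(p[d] for p in painted) for d in range(n)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(lows, highs)]
    sizes = [hi - lo for lo, hi in zip(lows, highs)]
    return center, sizes
```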


Structure-Based Brushing

This is a new variant of brushing that we have developed as a general mechanism for navigating in hierarchical space. We augment each node in the hierarchy, that is, each cluster, with a value that is monotonic relative to its parent. This value can be, for example, the level number, the cluster size/population, or the volume of the cluster's extents. This assigned value serves as the control for the level-of-detail. By choosing a continuous control variable such as the cluster size, the traversal of the tree through different levels of detail becomes a smooth transition instead of a series of abrupt screen changes.
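
The role of the monotonic value can be illustrated with a small tree-cut routine. This is a sketch, not XmdvTool's algorithm: the Node class and tree_cut function are hypothetical, and the value is assumed to shrink from parent to child (as cluster size does), so that lowering the threshold reveals finer clusters.

```python
class Node:
    """A cluster in the hierarchy; `value` must not exceed the parent's."""

    def __init__(self, name, value, children=()):
        self.name = name
        self.value = value
        self.children = list(children)


def tree_cut(node, threshold):
    """Return the clusters shown at a given level-of-detail threshold.

    Descend from the root and stop at the first node on each path whose
    value is at or below the threshold, or at a leaf.
    """
    if node.value <= threshold or not node.children:
        return [node]
    cut = []
    for child in node.children:
        cut.extend(tree_cut(child, threshold))
    return cut
```

Because the value is monotonic along every root-to-leaf path, each path is cut exactly once, so the returned clusters partition the data set.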

Figure labels: (a) Hierarchical tree frame. (b) Contour corresponding to the current level-of-detail (in the brushed region it is referred to as the brushed cluster radius; in the non-brushed region, as the non-brushed cluster radius). (c) Leaf contour, which approximates the shape of the hierarchical tree. (d) Structure-based brush. (e) Interactive brush handles (left handle and right handle). (f) Colormap legend for the level-of-detail contour.

The triangular frame depicts the hierarchical tree. The leaf contour depicts the silhouette of the hierarchical tree. It delineates the approximate shape formed by chaining the leaf nodes. The colored bold contour across the middle of the tree delineates the tree cut that represents the cluster partition corresponding to a level-of-detail, for instance the cluster radius. The colors on the contour correspond to the colors used for drawing the nodes on the main parallel coordinates display. The two movable handles on the base of the triangle, together with the apex of the triangle, form a wedge in the hierarchical space.

The brushing interaction for the user consists of localizing a subspace within the hierarchical space by positioning the two handles at the base of the triangle. The embedded wedge forms a brushed subspace within the hierarchical space. Elements within the brushed subspace may be examined at different levels of detail.
