Dialog: Dimension Reduction

For more detailed information, please read our on-line documents: 

InterRing: An Interactive Tool for Visually Navigating and Manipulating Hierarchical Structures 

Visual Hierarchical Dimension Reduction for Exploration of High Dimensional Datasets

Introduction to the Visual Hierarchical Dimension Reduction (VHDR) approach

Visual Hierarchical Dimension Reduction (VHDR) aims to help users handle high dimensional data by generating meaningful lower dimensional spaces with user interactions. VHDR is composed of the following steps: 

Step 1: Dimension Hierarchy Generation

First, all the original dimensions of a multidimensional data set are automatically organized into a hierarchical dimension cluster tree (the tree shown in the InterRing display) according to the similarities among the dimensions. Each original dimension is mapped to a leaf node in this tree. Similar dimensions are grouped into clusters, and similar clusters in turn form higher-level clusters. This step is already complete by the time the Dimension Reduction dialog appears.
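As a rough illustration of this bottom-up grouping, the following Python sketch merges dimensions over a series of increasing dissimilarity thresholds and records the threshold at which each cluster is formed as its radius. The dissimilarity measure (mean absolute difference between normalized columns), the threshold schedule, the pairwise merging, and all function names are assumptions made for illustration; they are not taken from the tool's actual implementation.

```python
from itertools import combinations

def dissimilarity(col_a, col_b):
    """Mean absolute difference between two normalized columns (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(col_a, col_b)) / len(col_a)

def leaf_names(node):
    """Names of the original dimensions found under a node."""
    if not node["children"]:
        return [node["name"]]
    return [n for child in node["children"] for n in leaf_names(child)]

def make_cluster(children, radius, data):
    """Create a parent node whose representative column averages all its leaves."""
    members = [n for child in children for n in leaf_names(child)]
    columns = [data[n] for n in members]
    rep = [sum(vals) / len(vals) for vals in zip(*columns)]
    return {"name": "+".join(members), "children": list(children),
            "radius": radius, "rep": rep}

def build_hierarchy(data, thresholds):
    """data maps a dimension name to a list of values in [0, 1]."""
    nodes = [{"name": n, "children": [], "radius": 0.0, "rep": list(col)}
             for n, col in data.items()]
    for t in sorted(thresholds):
        # Keep merging the closest pair while it is within the current threshold.
        while len(nodes) > 1:
            a, b = min(combinations(nodes, 2),
                       key=lambda p: dissimilarity(p[0]["rep"], p[1]["rep"]))
            if dissimilarity(a["rep"], b["rep"]) > t:
                break
            nodes = [n for n in nodes if n is not a and n is not b]
            nodes.append(make_cluster([a, b], t, data))
    if len(nodes) == 1:
        return nodes[0]
    # Anything still unmerged is gathered under a root with the largest radius.
    return make_cluster(nodes, 1.0, data)
```

With three dimensions where "a" and "b" are nearly identical, a small first threshold groups "a" and "b", and a later, larger threshold joins that cluster with "c".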

Step 2: Dimension Hierarchy Navigation and Modification

Next, users can navigate through the hierarchical dimension cluster tree to gain a better understanding of it. Users can also interactively modify the hierarchy structure. The hierarchical dimension cluster tree is visualized in the Dimension Reduction dialog. It is displayed using a radial space-filling technique named InterRing, which provides a suite of navigation and modification tools. These tools are described in the following sections of this help file.

Step 3: Dimension Cluster Selection

Next, users interactively select dimension clusters of interest from the hierarchy to construct a lower dimensional subspace. InterRing provides several selection mechanisms to facilitate dimension cluster selection; they are described in later sections. Selected clusters are highlighted using the system highlighting color.

Step 4: Representative Dimension Generation

In this step, a representative dimension (RD) is automatically created for each selected dimension cluster. The selected dimension clusters construct the lower dimensional space through these RDs. RDs are chosen to best reflect the aggregate characteristics of their associated clusters. In this release, for a non-leaf node, the RD is the average of all the original dimensions in the cluster. For a leaf node, the RD is the dimension itself. To make the generated lower dimensional space more meaningful, users can select a single leaf node of a cluster instead of the cluster itself.
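The averaging rule above can be sketched in a few lines of Python. The data layout (a dictionary mapping each dimension name to its normalized column) and the function names are assumptions chosen for illustration only.

```python
def representative_dimension(data, cluster):
    """Average the cluster's original dimensions, one data item at a time.

    A cluster with a single member (a leaf node) yields that dimension itself.
    """
    columns = [data[name] for name in cluster]
    return [sum(vals) / len(vals) for vals in zip(*columns)]

def build_ld_space(data, selected_clusters):
    """Return one representative column per selected cluster."""
    return {label: representative_dimension(data, members)
            for label, members in selected_clusters.items()}
```

For example, selecting a cluster made of "x" and "y" together with the leaf "z" produces a two-dimensional space whose first column is the item-wise average of "x" and "y" and whose second column is "z" unchanged.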

Step 5: Data Projection and Visualization

Finally, after the user clicks the "Apply" button in the dialog, the data set is projected from the original high dimensional space to a lower dimensional space (LD space) composed of the RDs of the selected clusters. We call this projection in the LD space the mapped data set. The mapped data set is treated as an ordinary data set in the LD space and is visualized using the current multidimensional visualization technique in the main window. You can change the current visualization technique freely, just as you can for the original data set.

To convey further dimension cluster characteristics in the LD space, such as the dissimilarity between dimensions within a cluster, we attach this information to the mapped data set and provide the option to display it. We call this dissimilarity visualization. Dissimilarity can be visualized from two viewpoints: that of the individual data items, or that of the whole data set. We call the former the "local degree of dissimilarity (LDOD)" and the latter the "global degree of dissimilarity (GDOD)". They are defined as follows:

LDOD - the degree of dissimilarity for a single data item in a dimension cluster. It is described by a mean, a maximum, and a minimum value. The mean is the image of the data item on the representative dimension. The minimum and maximum are the smallest and largest of the data item's values on the original dimensions belonging to the dimension cluster. Note that all dimensions are normalized, so values lie between 0 and 1.
GDOD - the degree of dissimilarity for the entire data set in a dimension cluster. It is a scalar value and can be calculated from the similarity measures between each pair of dimensions in the cluster. We use a simplified approach: the radius of a dimension cluster is used directly as its GDOD. A dimension cluster's radius is initially set to the similarity threshold of the iteration in which the cluster is formed by the VHDR automatic dimension clustering approach.
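To make the LDOD definition concrete, here is a minimal Python sketch that computes the three values for one data item. The data layout (dimension name mapped to a normalized column) and the function name are assumptions for illustration.

```python
def ldod(data, cluster, item):
    """Return the minimum, mean, and maximum of one data item within a cluster.

    data maps each dimension name to a list of normalized values;
    item is the index of the data item.
    """
    values = [data[name][item] for name in cluster]
    return {"min": min(values),
            "mean": sum(values) / len(values),
            "max": max(values)}
```

A data item whose values agree on every dimension of the cluster gets identical minimum, mean, and maximum; the further apart the minimum and maximum are, the larger its local dissimilarity.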

The Options->Dissimilarity Display entry in the Menu section describes the dissimilarity displays available in this release and how to select them.

Buttons

The dialog contains a group of check buttons, each representing a mode of the InterRing display. When a button is checked, the InterRing display enters the corresponding mode and allows the interactions that belong to it. Click a button to check it, and click it again to uncheck it.

Circular Distort: 

In this mode, users can increase the sweep angles of the nodes they are interested in. The distortion can be performed in two modes: the non-pin mode and the pin mode (to switch between them, use Menu->Options->Circular Distortion).

In the non-pin mode, use the left mouse button to drag and drop the edges of the nodes of interest directly in the InterRing display to increase or decrease their sweep angles.

In the pin mode, first click a node of interest with the right mouse button to pin it, then drag and drop its edges with the left mouse button to increase or decrease its sweep angle. When you pin a node, the previously pinned node is automatically unpinned.
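One way to picture circular distortion is as a redistribution of sweep angles among sibling nodes: when one node grows, the others shrink so that the total stays the same. The Python sketch below follows that idea. The proportional-shrinking rule and the function name are assumptions for illustration; the tool's actual distortion behavior may differ.

```python
def distort_sweep(angles, index, new_angle):
    """Give the node at `index` the sweep angle `new_angle` (in degrees).

    The remaining siblings are rescaled proportionally so that the total
    sweep angle is preserved (an assumed rule, not the tool's own).
    """
    total = sum(angles)
    if not 0 <= new_angle <= total:
        raise ValueError("new_angle must lie between 0 and the total sweep")
    others = total - angles[index]
    if others == 0:
        # No sibling can absorb the change, so leave the layout untouched.
        return list(angles)
    scale = (total - new_angle) / others
    return [new_angle if i == index else a * scale
            for i, a in enumerate(angles)]
```

For instance, widening the first of three siblings with sweep angles 90, 90, and 180 degrees to 180 degrees shrinks the other two to 60 and 120 degrees.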

 

Radial Distort: 

In this mode, users can increase the radius of the levels they are interested in.

First, click a level of interest with the right mouse button to pin it. Then drag and drop its edges with the left mouse button to increase or decrease its radius. When you pin a level, the previously pinned level is automatically unpinned. You can also click, or drag and drop, with the left mouse button without pinning anything to distort the nodes closest to the cursor.

Rotate: 

In this mode, users can rotate the InterRing display by clicking on it. Click with the left mouse button to rotate anti-clockwise. Click with the right mouse button to rotate clockwise.

Roll up/Drill down: 

In this mode, users can hide or show sub-branches of the tree in the InterRing display by clicking on their root nodes. Click a node to hide the sub-branch rooted at it. Click it again to expand the sub-branch.

Modify: 

In this mode, users can change the tree structure. Using your left mouse button, drag the root of a sub-branch and release it on the node that you want to be its new parent.
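In tree terms, this drag-and-drop operation moves a sub-branch to a new parent. The Python sketch below shows one way such a move could be implemented. The Node class, the function names, and the two safety checks (the root cannot be moved, and a node cannot be dropped inside its own sub-branch) are assumptions for illustration, not a description of the tool's code.

```python
class Node:
    """A minimal tree node (hypothetical; for illustration only)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def is_within(node, branch_root):
    """True if `node` is `branch_root` itself or lies somewhere below it."""
    while node is not None:
        if node is branch_root:
            return True
        node = node.parent
    return False

def reparent(node, new_parent):
    """Detach the sub-branch rooted at `node` and attach it under `new_parent`."""
    if node.parent is None:
        raise ValueError("the root of the tree cannot be moved")
    if is_within(new_parent, node):
        raise ValueError("a sub-branch cannot be dropped inside itself")
    node.parent.children.remove(node)
    new_parent.children.append(node)
    node.parent = new_parent
```

The second check matters because attaching a node beneath one of its own descendants would disconnect the sub-branch from the rest of the tree.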

Select: 

In this mode, users can select nodes in the tree. These nodes are the dimension clusters that will make up the new lower dimensional space after you click the Apply button.

To select a node, click it with the left mouse button. To unselect it, click it again. Alternatively, click a node with the right mouse button, and a "Select" dialog will pop up. Adjust the scaling bar in it and click OK to select multiple nodes in the sub-branch rooted at that node.

Selected clusters are highlighted using the system "highlight-1" color (system colors are the colors in the "Color Requester" dialog, where you can change them).

Apply Button: 

If you have selected some nodes in the InterRing display, clicking this button shows your new lower dimensional space in the display of the main window. It is composed of the dimension clusters you selected in the InterRing display.

Menu:

Options->Dissimilarity Display: 

Opens the Dissimilarity Display dialog, where you can choose how dimension cluster characteristics are conveyed when the data set is visualized in a lower dimensional space.

Wide Axes

You can select the "Wide Axes" button if you are in the (flat or hierarchical) Parallel Coordinates or (flat or hierarchical) Scatterplot Matrices mode.

With this button selected in the Parallel Coordinates mode, the axis width of a representative dimension is proportional to the GDOD of the dimension cluster it represents. A wider axis represents a dimension cluster with a larger GDOD. In the flat Scatterplot Matrices mode, GDOD is mapped to the width of the plot frames.
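The mapping from GDOD to axis width can be pictured with a small Python sketch. The pixel range and the function name are hypothetical, and the minimum width (which keeps an axis with a GDOD of 0 visible) is an added assumption, so the mapping shown here is linear rather than strictly proportional.

```python
def axis_width(gdod, min_width=1.0, max_width=9.0):
    """Map a GDOD in [0, 1] to an axis width in pixels (assumed range)."""
    if not 0.0 <= gdod <= 1.0:
        raise ValueError("gdod is expected to lie between 0 and 1")
    return min_width + gdod * (max_width - min_width)
```

Under these assumptions, a leaf dimension (GDOD 0) is drawn with the thinnest axis, and a cluster with a GDOD of 0.5 gets an axis halfway between the two extremes.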

 

Three Axes

You can select the "Three Axes" button to visualize LDOD if you are in the flat Parallel Coordinates, flat Scatterplot Matrices, or flat Star Glyph mode.

In the flat Parallel Coordinates mode, two extra axes are displayed on either side of a representative dimension to show the minimum and maximum of the corresponding dimension cluster for every data point. Good correlation within a cluster shows up as nearly horizontal lines through the three axes, while steeply sloped lines indicate areas of poor correlation.

In the flat Scatterplot Matrices mode, if you select the "Three Axes" button, the diagonal plots are used to represent LDOD. The minimum and maximum of the dimension cluster are mapped to the x and y coordinates of the diagonal plot of its representative dimension. Thus, in a diagonal plot, a point whose maximum and minimum are equal lies on the diagonal. By contrast, a point with a large LDOD has a large difference between its maximum and minimum, and therefore between its x and y coordinates, so it lies a significant distance from the diagonal. A diagonal plot whose points are spread far from the diagonal therefore indicates low correlation within that dimension cluster.
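The geometry of the diagonal plot can be checked with a short Python sketch. It places a data item at (minimum, maximum) and computes its perpendicular distance from the line y = x, which equals (maximum - minimum) divided by the square root of 2. The function names are assumptions for illustration.

```python
import math

def diagonal_plot_point(minimum, maximum):
    """Place a data item in a diagonal plot: x is the minimum, y the maximum."""
    return (minimum, maximum)

def distance_from_diagonal(point):
    """Perpendicular distance of a point from the line y = x."""
    x, y = point
    return abs(y - x) / math.sqrt(2)
```

A data item with equal minimum and maximum has distance 0, while the most dissimilar item possible on normalized data, at (0, 1), lies about 0.707 away from the diagonal.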

In the flat Star Glyph mode, the minimum and maximum of the dimension clusters are visualized using the system "Grid1" and "Grid2" colors. The length of the line segment from the star center to the beginning of the "Grid1" color is proportional to the minimum value of the cluster. The length of the line segment from the star center to the end of the "Grid1" color is proportional to the mean value of the cluster. The length of the line segment from the star center to the end of the "Grid2" color is proportional to the maximum value of the cluster.

Mean Band

You can select the "Mean Band" button if you are in the flat Parallel Coordinates mode. A band is added to each data point; on each representative dimension, the band extends from the minimum to the maximum. You can reduce overlaps between bands using the "Band Extent Scaling" scroll bar in the Dimension Reduction dialog.

Options->Axis Color: 

You can select Uniform Color or Node Color for the lower dimensional space display. If you choose Uniform Color, all the axes in the lower dimensional space display are drawn in the system "Grid1" color. If you choose Node Color, each axis in the lower dimensional space display has the same color as its corresponding node in the dimension hierarchy shown in the InterRing display.

Options->Circular Distortion: 

To choose between the Non-Pin and Pin modes for circular distortion.

Options->Visual Feedbacks: 

To enable or disable circular distortion feedback and roll up/drill down feedback, and to show or hide the names of selected nodes.

Options->Selection Mode: 

To choose between selecting according to Entries and selecting according to Radius.

Options->SBBrushing Mode: 

To switch the structure-based brushing mode between "cover all leaves" and "leaves only and filter out similar leaves".

In the "cover all leaves" mode, for each leaf node in the sub-branch to which structure-based brushing has been applied, either the leaf itself or one of its ancestors will be selected. In the other mode, only leaf nodes will be selected. If several leaves are similar to each other according to the dissimilarity threshold, only one of them will be selected. Unimportant leaves will also be filtered out.
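The "cover all leaves" behavior can be illustrated with a small recursive Python sketch that selects by radius: starting at the root of the brushed sub-branch, it picks the highest nodes whose radius does not exceed a threshold, so every leaf ends up covered by itself or by exactly one ancestor. The node layout (dictionaries with "name", "radius", and "children" keys), the threshold parameter, and the function name are assumptions for illustration.

```python
def brush_cover_all_leaves(node, max_radius):
    """Select nodes so that each leaf is covered by itself or one ancestor.

    A node is selected as soon as its radius is within max_radius;
    leaves are always selectable, whatever their radius.
    """
    if node["radius"] <= max_radius or not node["children"]:
        return [node["name"]]
    selected = []
    for child in node["children"]:
        selected.extend(brush_cover_all_leaves(child, max_radius))
    return selected
```

Raising the threshold selects fewer, larger clusters, and lowering it moves the selection down toward the individual leaves.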

Options->Reordering: 

To reorder the dimensions in the hierarchy according to their importance...

Options->Dimension Spacing: 

If dimension spacing is on, after the "Apply" button is pressed, the dimensions in the Parallel Coordinates and Star Glyphs displays are placed according to the similarities between adjacent dimensions. The more similar two adjacent dimensions are, the closer they are placed to each other.
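For a Parallel Coordinates display, the spacing rule can be sketched in Python as follows: each gap between adjacent axes is made proportional to the dissimilarity of the two dimensions it separates, and the gaps are scaled to fill the available width. The function name, the pixel width, and the fallback to even spacing when all dissimilarities are zero are assumptions for illustration.

```python
def axis_positions(dissimilarities, width):
    """Return x positions for len(dissimilarities) + 1 axes.

    dissimilarities[i] is the dissimilarity between axis i and axis i + 1.
    """
    total = sum(dissimilarities)
    if total == 0:
        # All adjacent dimensions are identical: fall back to even spacing.
        gaps = [width / len(dissimilarities)] * len(dissimilarities)
    else:
        gaps = [width * d / total for d in dissimilarities]
    positions = [0.0]
    for gap in gaps:
        positions.append(positions[-1] + gap)
    return positions
```

With dissimilarities of 0.25 and 0.75 across a 400-pixel display, the first two axes sit 100 pixels apart and the last two 300 pixels apart.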

Reset->Color Reset: 

To reassign colors to all nodes of the dimension hierarchy according to the current tree structure. This is useful after you modify the hierarchy.

Reset->Circular Distortion Reset: 

To reassign sweep angles to all nodes of the dimension hierarchy according to the current tree structure.

Reset->Radial Distortion Reset: 

To reassign radii to all levels of the dimension hierarchy according to the current tree structure.

Reset->Selection Reset: 

To unselect all the currently selected nodes.  

Reset->Apply Reset: 

To return to the original high dimensional space.  

Message Bar

The message bar displays helpful "how to" and "what is it" hints.
