
  
Proximity-Based Coloring

Monochromatic line drawings present an inherent difficulty in parallel coordinates. Figure 3 shows a simple case where two three-dimensional data points admit more than one interpretation. This ambiguity arises whenever it is visually difficult to trace the path of a data point as it crosses the coordinate axes. It commonly occurs where data lines meet at an axis.


Figure 3: Ambiguous case in monochromatic parallel coordinates. (A) Two data points plotted in parallel coordinates. (B) Differing interpretations of the two points shown on the X-Z plane. 

One way to discriminate between such cases is through the use of color. It is easy to distinguish two data lines that have differing colors even when they intersect.

Ideally we wish to adopt a coloring scheme that assigns colors via a similarity measure. Data lines that are similar with respect to the measure should be in similar shades, whereas dissimilar data lines should be shown in contrasting shades.

Our method maps colors by cluster proximity, hence the name proximity-based coloring. We first impose a linear order on the data clusters gathered for display at a given LOD value, w. These clusters are simply the partition elements S(w) described in the previous section. The linear order on S(w) is derived directly from the traversal algorithm described in the previous section. That is, the clusters are ordered in the sequence in which the nodes are gathered during the in-order tree traversal.

We assign colors to each element by looking up a linear colormap table. Colors are assigned to clusters based on the following recursive formula:

$\displaystyle C_0 = 0.5$  
$\displaystyle C_i = C_{parent(i)} + \frac{ \pi(i) }{ K^{l_i+1} }$ (1)

where $C_i \in [0,1]$ is the normalized color value of node $T_i$, and $C_0$ is the color of the root. We currently use $C_i$ as the hue component of an HSV colormap. $K$ is the branching factor of the cluster tree, $l_i$ is the tree depth at node $T_i$, and $\pi(i)$ is the sign function defined as:

$\displaystyle \pi(i) = \begin{cases} +1 & \text{if } i \text{ is odd} \\ -1 & \text{if } i \text{ is even} \end{cases}$ (2)
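The recursion in equations (1) and (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used for the figures: the function names, the `parent` and `depth` dictionaries, and the example tree (a complete binary tree whose root is node 0 and whose node i has children 2i+1 and 2i+2) are assumptions made here, as are the full saturation and value used when converting a hue to RGB.

```python
import colorsys


def pi_sign(i):
    """Sign function of equation (2): +1 for odd i, -1 for even i."""
    return 1 if i % 2 == 1 else -1


def proximity_colors(parent, depth, K, root_color=0.5):
    """Evaluate equation (1) for every node listed in `parent`.

    parent[i] is the index of node i's parent and depth[i] is l_i,
    with the root (node 0) at depth 0. Both mappings are hypothetical
    inputs chosen for this sketch.
    """
    colors = {0: root_color}  # C_0 = 0.5

    def color(i):
        if i not in colors:
            colors[i] = color(parent[i]) + pi_sign(i) / K ** (depth[i] + 1)
        return colors[i]

    for i in parent:
        color(i)
    return colors


def hue_to_rgb(c):
    """Use C_i as the hue of an HSV color (saturation and value of 1 assumed)."""
    return colorsys.hsv_to_rgb(c % 1.0, 1.0, 1.0)


# Hypothetical complete binary tree (K = 2): children of i are 2i+1 and 2i+2.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
depth = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}
colors = proximity_colors(parent, depth, K=2)
leaf_hues = [colors[i] for i in (3, 4, 5, 6)]  # [0.875, 0.625, 0.375, 0.125]
```

In this example the four leaves receive evenly spaced hues, so leaves 4 and 5, which lie in different subtrees, are exactly as far apart in color as the siblings 3 and 4.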

Equation (1) colors clusters based on the cluster order derived during tree traversal. The equation, however, does not differentiate between elements that are adjacent in the linear order but belong to different subtrees. It is important to distinguish between such elements because they are deemed "far" according to our proximity measure.

We revise (1) by introducing a "buffer" between subtrees. The buffer acts as an unused color interval between subtrees so that elements at the proximal ends of subtrees are not assigned the same colors. Clearly the buffer should be larger between large subtrees and smaller otherwise.

Let $b$, where $b < 1$, be the desired buffer interval. Let the revised definition be:

$\displaystyle C_i = C_{parent(i)} + \pi(i) \left( b^{l_i} + \frac{1}{ K^{l_i+1} } \right)$ (3)

Equation (3) achieves our desired purpose. We typically choose $b$ to be small, with values around $10^{-1}$.
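The effect of the buffer term can be checked numerically with a small sketch of equation (3). As before, this is an illustration under stated assumptions rather than the authors' code: the names `buffered_colors`, `parent`, and `depth`, and the complete binary example tree (root 0, children of node i at 2i+1 and 2i+2), are hypothetical. Setting b = 0 removes the buffer term for every non-root node, which gives the values of equation (1) for comparison.

```python
def pi_sign(i):
    """Sign function of equation (2): +1 for odd i, -1 for even i."""
    return 1 if i % 2 == 1 else -1


def buffered_colors(parent, depth, K, b, root_color=0.5):
    """Evaluate equation (3) for every node listed in `parent`.

    parent[i] is node i's parent and depth[i] is l_i (root at depth 0).
    These mappings are hypothetical inputs chosen for this sketch.
    """
    colors = {0: root_color}  # C_0 = 0.5

    def color(i):
        if i not in colors:
            step = b ** depth[i] + 1.0 / K ** (depth[i] + 1)
            colors[i] = color(parent[i]) + pi_sign(i) * step
        return colors[i]

    for i in parent:
        color(i)
    return colors


# Hypothetical complete binary tree (K = 2): children of i are 2i+1 and 2i+2.
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 2, 6: 2}
depth = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}

plain = buffered_colors(parent, depth, K=2, b=0.0)     # same values as equation (1)
buffered = buffered_colors(parent, depth, K=2, b=0.1)  # equation (3)

# Color distance between siblings 3 and 4 (same subtree) ...
sibling_gap = abs(buffered[3] - buffered[4])
# ... and between leaves 4 and 5 (adjacent in order, different subtrees).
cross_gap = abs(buffered[4] - buffered[5])
```

In this example the two gaps are equal (0.25) without the buffer, while with b = 0.1 the cross-subtree gap grows to about 0.43 against a sibling gap of about 0.27. Note that in the same setting a depth-3 child of node 3 would receive a value slightly above 1, so some wrapping or rescaling of the hue, which the text does not specify, may be needed for deeper trees.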
 
 

Figure 4: This shows the Iris dataset without proximity coloring.

 

Figure 5: This shows the Iris dataset with proximity coloring.

Proximity-based coloring highlights the relationships among clusters. Consider Figure 4, which shows the Iris dataset [9] on parallel coordinates without proximity coloring, and Figure 5, which shows the same dataset with proximity coloring. Comparing the two figures makes it clear that coloring aids greatly in discerning meaningful patterns. In this case, three distinct clusters are apparent, and they appear as concentrations of blue, green, and pink cluster trends.

It is not always possible, however, to impose a linear order on the data clusters. For instance, a cluster chain forming a circular loop is not amenable to any linear order. In this case, an arbitrary break must be made at some point in the loop. Data elements on either side of the break point, though similar according to our proximity measure, may be assigned contrasting colors.

