One way to discriminate between such cases is through the use of color. It is easy to distinguish two data lines that have differing colors even when they intersect.
Ideally we wish to adopt a coloring scheme that assigns colors via a similarity measure. Data lines that are similar with respect to the measure should be in similar shades, whereas dissimilar data lines should be shown in contrasting shades.
Our method maps colors by cluster proximity, hence the name proximity-based coloring. We first impose a linear order on the data clusters gathered for display at a given LOD value, w. These clusters are simply the partition elements S(w) described in the previous section. The linear order on S(w) is derived directly from the traversal algorithm given there: the clusters are ordered in the sequence in which the nodes are gathered during the in-order tree traversal.
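As a minimal sketch of this ordering, the following Python fragment gathers a cut of a cluster tree in left-to-right depth-first order and takes the visiting sequence as the linear order. The `Node` class, its `extent` field, and the rule that a node is gathered once its extent is at most w are illustrative assumptions; they stand in for, and are not, the definition of S(w) given in the previous section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    extent: float                      # hypothetical per-cluster size measure
    children: List["Node"] = field(default_factory=list)


def gather_in_order(node: Node, w: float) -> List[Node]:
    """Return the clusters selected at level-of-detail w, in traversal order."""
    # Assumed selection rule: stop descending at leaves or small-enough clusters.
    if not node.children or node.extent <= w:
        return [node]
    ordered: List[Node] = []
    for child in node.children:        # left-to-right visit fixes the linear order
        ordered.extend(gather_in_order(child, w))
    return ordered


tree = Node("root", 1.0, [
    Node("A", 0.6, [Node("A1", 0.2), Node("A2", 0.3)]),
    Node("B", 0.4),
])
order = [n.name for n in gather_in_order(tree, 0.5)]
print(order)
```

Under these assumptions, the position of a cluster in the returned list is its rank in the linear order, and raising w collapses a subtree into a single cluster without disturbing the relative order of the rest.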
We assign colors to each element by looking up a linear colormap table. Colors are assigned to clusters based on the following recursive formula:
C0 = 0.5
Ci = [...]    (1)
where Ci is the normalized color value of node Ti, and C0 is the color of the root. We currently use Ci as the hue component of an HSV colormap. K is the branching factor of the cluster tree, li is the tree depth at node Ti, and the sign function is defined as:
[...]    (2)
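To illustrate how a normalized color value can serve as the hue component of an HSV color, the sketch below converts a value in [0, 1] to an RGB triple with Python's standard `colorsys` module. The function name `hue_to_rgb` and the choice of full saturation and full value are assumptions made for illustration; the text does not state which saturation and value are used.

```python
import colorsys
from typing import Tuple


def hue_to_rgb(c: float, saturation: float = 1.0,
               value: float = 1.0) -> Tuple[float, float, float]:
    """Map a normalized color value c in [0, 1] to an RGB triple via HSV."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must be normalized to [0, 1]")
    # colorsys expects hue, saturation, and value all in the range [0, 1].
    return colorsys.hsv_to_rgb(c, saturation, value)


root_rgb = hue_to_rgb(0.5)   # the root color value, C0 = 0.5
print(root_rgb)
```

With these assumed settings the root value 0.5 lands on cyan, the midpoint of the hue range, so color values assigned on either side of the root move toward opposite ends of the hue scale.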
Equation (1) colors clusters based on the cluster order derived during tree traversal. The equation, however, does not differentiate between adjacent elements (with respect to the linear order) that belong to different subtrees. It is important to distinguish such elements because they are deemed "far" according to our proximity measure.
We revise (1) by introducing a "buffer" between subtrees. The buffer acts as an unused color interval between subtrees, so that elements at the proximal ends of adjacent subtrees are not assigned the same colors. Clearly, the buffer should be larger between large subtrees and smaller otherwise.
Let b, where b<1, be the desired buffer interval. Let the revised definition be:
Ci = [...]    (3)
Equation (3) achieves our desired purpose. We typically choose b to be small, with values around 10^{-1}.
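The effect of the buffer can be illustrated with a small sketch. The code below does not implement Equation (3); it uses a simpler stand-in, recursive interval subdivision, in which each subtree receives an equal slot of its parent's hue interval and a fraction b of every slot is left unused. The function `assign_hues`, the nested-list tree encoding, and the equal-slot rule are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Union

Tree = Union[str, List["Tree"]]   # a leaf name, or a list of subtrees


def assign_hues(tree: Tree, lo: float = 0.0, hi: float = 1.0, b: float = 0.1,
                out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Give each leaf a hue in [lo, hi], leaving unused gaps between subtrees."""
    if out is None:
        out = {}
    if isinstance(tree, str):
        out[tree] = (lo + hi) / 2.0        # leaf takes the middle of its interval
        return out
    slot = (hi - lo) / len(tree)           # equal share of the parent interval
    margin = b * slot / 2.0                # unused hue range at each end of a slot
    for k, child in enumerate(tree):
        child_lo = lo + k * slot + margin
        child_hi = lo + (k + 1) * slot - margin
        assign_hues(child, child_lo, child_hi, b, out)
    return out


hues = assign_hues([["a", "b"], ["c", "d"]], b=0.1)
for name, hue in hues.items():
    print(name, round(hue, 4))
```

In this sketch the unused gap scales with the slot width, so it is wider between large subtrees near the root than between small ones deeper in the tree. With b = 0.1, leaves b and c, which are adjacent in the linear order but lie in different subtrees, differ in hue by about 0.275, whereas the siblings a and b differ by about 0.225; with b = 0 all adjacent leaves would be equally spaced.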
Proximity-based coloring highlights the relationships among clusters. Consider Figure 4, which shows the Iris dataset [9] on parallel coordinates without proximity coloring, and Figure 5, which shows the same dataset with proximity coloring. Comparing the two figures makes it clear that coloring aids immensely in discerning meaningful patterns. In this case, three distinct clusters are apparent, and they appear as concentrations of blue, green, and pink cluster trends.
It is however not always possible to impose a linear order on the data clusters. For instance, a cluster chain forming a circular loop is not amenable to any linear order. In this case, an arbitrary break must be made at some point in the loop. Data elements at the break point, though similar according to our proximity measure, may be assigned contrasting colors.