
Data-Driven Glyph Placement

In data-driven placement, the data are used to compute or specify the location parameters for the glyph. The two categories of this strategy class are raw and derived.

In raw data strategies, one, two, or three of the data dimensions are used as positional components.

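Conveys detailed relationships between the dimensions selected.
For N data dimensions there are N(N-1) possible ordered two-dimensional mappings, or N(N-1)/2 if swapping the two axes is not counted as a new view (hence the six views of the four Iris dimensions in Figure 3).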
An ineffective mapping can result in substantial cluttering and poor screen utilization.
Some mappings may be more meaningful to the person interpreting the display than others.
The display is biased toward the dimensions involved in the mapping, and thus conveys only pairwise (or three-way, for 3-D) relations between the selected dimensions.
Most useful when two or more of the data dimensions are spatial in nature.
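
As a minimal sketch of the raw strategy, assuming each record is a tuple of numeric values, the Python below enumerates every unordered pair of dimensions and uses the two selected values directly as glyph coordinates. The function names `raw_placements` and `to_screen`, the 400-pixel display size, and the three sample rows are illustrative assumptions rather than part of the original system.

```python
from itertools import combinations

def raw_placements(records, names):
    """Yield (x_name, y_name, positions) for every unordered pair of dimensions."""
    for i, j in combinations(range(len(names)), 2):
        # The two selected data values are used directly as the glyph position.
        positions = [(rec[i], rec[j]) for rec in records]
        yield names[i], names[j], positions

def to_screen(positions, width=400, height=400):
    """Linearly rescale data-space positions to pixel coordinates."""
    xs = [p[0] for p in positions]
    ys = [p[1] for p in positions]

    def scale(v, lo, hi, size):
        # A constant dimension is drawn in the middle of the axis.
        return 0.5 * size if hi == lo else (v - lo) / (hi - lo) * size

    return [(scale(x, min(xs), max(xs), width),
             scale(y, min(ys), max(ys), height)) for x, y in positions]

# Three illustrative rows of four measurements each (not the full Iris data set).
names = ["sepal length", "sepal width", "petal length", "petal width"]
records = [
    (5.1, 3.5, 1.4, 0.2),
    (7.0, 3.2, 4.7, 1.4),
    (6.3, 3.3, 6.0, 2.5),
]

views = list(raw_placements(records, names))
screen = to_screen(views[0][2])  # pixel positions for the first pairwise view
```

Each element of `views` corresponds to one panel of a pairwise display. A poorly chosen pair shows up here as many glyphs sharing nearly the same `screen` coordinates.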

**Figure 3:** All pairwise raw data-driven views (star glyphs) of the Iris data set: (a) sepal length vs. sepal width, (b) sepal length vs. petal length, (c) sepal length vs. petal width, (d) sepal width vs. petal length, (e) sepal width vs. petal width, and (f) petal length vs. petal width. Note that while each view separates one iris family (sailboat shape) from the other two (kite shape), varying degrees of separation can be seen within the large cluster. Also, some views reveal a number of outliers.

A derived data placement technique uses an analytic process to generate positions using the data values as input.

Reflects some combination of all the dimensions in an attempt to convey N-dimensional relational information in a smaller number of display dimensions.
Common dimensionality reduction techniques include Principal Component Analysis (PCA) [25], Multidimensional Scaling (MDS) [31,5], and Self-Organizing Maps (SOMs) [30].
PCA attempts to find linear combinations of the dimensions which explain the largest variation in the multivariate data set.
SOMs and MDS are iterative refinement/optimization processes which attempt to adjust weights or positions until a certain criterion is met.
Resulting display coordinates have no semantic meaning.
PCA assumes that the majority of the variation in a data set will be well embodied in the first few principal components.
MDS and SOMs are not guaranteed to be optimal, and the results are generally not unique.
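
To make the derived strategy concrete, here is a hedged sketch of PCA-based placement using only the Python standard library. It assumes numeric records with at least two rows, and finds the leading principal axes by power iteration with deflation. That is one of several ways to compute them and is not necessarily how the figures in this document were produced. All function names and the four sample rows are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-dimension mean from every record."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[r[k] - means[k] for k in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered records."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(m[a][b] * v[b] for b in range(len(v))) for a in range(len(m))]

def dominant_eigenpair(m, iterations=500):
    """Power iteration: approximate the largest eigenvalue and its unit eigenvector."""
    d = len(m)
    # Asymmetric start vector (an assumption of this sketch): it lowers the
    # chance of starting exactly orthogonal to the dominant eigenvector.
    v = [float(k + 1) for k in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    value = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return value, v

def pca_positions(rows, components=2):
    """Project each record onto the first `components` principal axes."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    axes = []
    for _ in range(components):
        value, vec = dominant_eigenpair(cov)
        axes.append(vec)
        # Deflation: remove the component just found so the next pass
        # converges to the next-largest direction of variation.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(d)]
               for a in range(d)]
    return [tuple(sum(r[k] * axis[k] for k in range(d)) for axis in axes)
            for r in centered]

# Hypothetical 3-D records whose variation lies mostly along the first dimension.
rows = [(-2.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 1.0, 0.0)]
positions = pca_positions(rows)  # one (pc1, pc2) glyph position per record
```

The sign of each axis is arbitrary, so a display built from `positions` may appear mirrored between runs of a different implementation. This is one sense in which the resulting coordinates carry no semantic meaning.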

Post-processing can involve distorting positions to reduce clutter and overlap.

Random jitter has been employed in statistical graphics when data-driven positioning is used.
Alternatively, shift positions to minimize or avoid overlaps.
A concern is the level of distortion that is introduced.
Can selectively vary the level of detail shown in the visualization [34].
Need to provide users with interactive control of the transformation to help them maintain context.
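
The two position-distorting ideas above can be sketched as follows, assuming 2-D glyph positions and circular glyph footprints. `jitter` adds bounded uniform noise. `separate` repeatedly pushes apart any pair closer than a minimum distance. `max_displacement` reports how far the worst-moved glyph travelled, as a simple measure of the distortion introduced. These names, the pairwise push rule, and the parameter values are assumptions made for illustration, not a published algorithm.

```python
import math
import random

def jitter(positions, amount, seed=0):
    """Add uniform random noise in [-amount, amount] to each coordinate."""
    rng = random.Random(seed)  # fixed seed keeps the display reproducible
    return [(x + rng.uniform(-amount, amount),
             y + rng.uniform(-amount, amount)) for x, y in positions]

def separate(positions, min_dist, max_passes=100):
    """Push apart pairs closer than min_dist (simple O(n^2) relaxation)."""
    pts = [list(p) for p in positions]
    eps = 1e-9 * min_dist  # tolerance against floating-point round-off
    for _ in range(max_passes):
        moved = False
        for i in range(len(pts)):
            for j in range(i + 1, len(pts)):
                dx = pts[j][0] - pts[i][0]
                dy = pts[j][1] - pts[i][1]
                dist = math.hypot(dx, dy)
                if dist >= min_dist - eps:
                    continue
                # Coincident glyphs have no direction; pick the x axis arbitrarily.
                ux, uy = (1.0, 0.0) if dist == 0.0 else (dx / dist, dy / dist)
                push = (min_dist - dist) / 2.0  # each glyph moves half the gap
                pts[i][0] -= push * ux
                pts[i][1] -= push * uy
                pts[j][0] += push * ux
                pts[j][1] += push * uy
                moved = True
        if not moved:
            break
    return [tuple(p) for p in pts]

def max_displacement(before, after):
    """Largest distance any glyph moved: a crude measure of distortion."""
    return max(math.hypot(ax - bx, ay - by)
               for (bx, by), (ax, ay) in zip(before, after))

original = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]  # two glyphs overlap exactly
spread = separate(original, min_dist=2.0)
distortion = max_displacement(original, spread)
noisy = jitter(original, amount=0.5, seed=1)
```

Exposing `amount` and `min_dist` as interactive controls would let a user trade clutter against positional accuracy while watching `distortion`.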

**Figure 4:** Star glyphs of Iris data set with position based on the first two principal components. Reasonable separation can be seen in the large cluster between larger and smaller 'kite' shapes.


Matthew Ward
1999-02-08