In data-driven placement, the data are used to compute or specify the
location parameters for the glyph. The two categories of this strategy
class are raw and derived.
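In raw data strategies, one, two, or three of the data dimensions are used as
positional components.
- Conveys detailed relationships between the dimensions selected.
- For a two-dimensional display there are N(N-1) possible ordered mappings, or N(N-1)/2 if axis order is ignored (6 for the four Iris dimensions).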
- An ineffective mapping can result in substantial
cluttering and poor screen utilization.
- Some mappings may be more meaningful to the person interpreting the
display than others.
- Bias is given to the dimensions involved in
the mapping, so the display conveys only pairwise (or three-way, for 3-D)
relations between the selected dimensions.
- Most useful when two or more
of the data dimensions are spatial in nature.
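The mapping count above can be checked with a short sketch. The helper names `raw_mappings` and `place_raw` are hypothetical and introduced only for illustration, and the single record uses illustrative values rather than a full data set. The sketch enumerates the candidate dimension pairs and then uses one pair directly as glyph coordinates.

```python
from itertools import combinations, permutations


def raw_mappings(dimensions, k=2, ordered=False):
    """List the ways to pick k data dimensions as positional axes.

    ordered=True distinguishes (x, y) from (y, x); ordered=False does not.
    """
    chooser = permutations if ordered else combinations
    return list(chooser(dimensions, k))


def place_raw(records, x_dim, y_dim):
    """Use two raw data values of each record directly as its glyph position."""
    return [(rec[x_dim], rec[y_dim]) for rec in records]


iris_dims = ["sepal length", "sepal width", "petal length", "petal width"]

views = raw_mappings(iris_dims)                        # unordered pairs
ordered_views = raw_mappings(iris_dims, ordered=True)  # ordered pairs

# One illustrative record (values assumed for the example only).
records = [{"sepal length": 5.1, "sepal width": 3.5,
            "petal length": 1.4, "petal width": 0.2}]
positions = place_raw(records, *views[0])
```

With four dimensions this yields six unordered views, matching the six panels of Figure 3, and twelve ordered ones.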
Figure 3:
All pairwise raw data-driven views (star glyphs) of the Iris data set:
(a) sepal length vs. sepal width,
(b) sepal length vs. petal length,
(c) sepal length vs. petal width,
(d) sepal width vs. petal length,
(e) sepal width vs. petal width, and
(f) petal length vs. petal width. Note that while each view separates one
iris family (sailboat shape) from the other two (kite shape), varying degrees
of separation can be seen within the large cluster. Also, some views reveal
a number of outliers.
A derived data placement technique uses an analytic process to generate
positions using the data values as input.
- Reflects some combination of all the dimensions in an attempt to
convey N-dimensional relational
information in a smaller number of display dimensions.
- Common dimensionality reduction techniques include
Principal Component Analysis (PCA) [25],
Multidimensional Scaling (MDS) [31,5], and
Self-Organizing Maps (SOMs) [30].
- PCA attempts to find linear
combinations of the dimensions which explain the largest variation in the
multivariate data set.
- SOMs and MDS are
iterative refinement/optimization processes which attempt to adjust weights
or positions until a certain criterion is met.
- Resulting display coordinates have no semantic meaning.
- PCA assumes that the majority of the variation in a data set will be
well embodied in the first few principal components.
- MDS and SOMs are not guaranteed to be optimal, and
the results are generally not unique.
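As a minimal sketch of derived placement, the following computes PCA positions with a singular value decomposition in NumPy. This is one common way to obtain principal components, not necessarily the procedure used for the figures here. The function name `pca_positions` is hypothetical, and the input is synthetic random data generated only to exercise the code.

```python
import numpy as np


def pca_positions(data, n_components=2):
    """Project records onto their first principal components.

    Returns the display positions and the fraction of total variance
    explained by each retained component. Assumes the data are not all
    identical (otherwise the total variance is zero).
    """
    X = np.asarray(data, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing
    # singular value, i.e. by decreasing explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    positions = centered @ vt[:n_components].T
    explained = s**2 / np.sum(s**2)
    return positions, explained[:n_components]


# Synthetic 4-dimensional data with unequal spread per dimension.
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 4)) * np.array([3.0, 1.0, 0.5, 0.1])

positions, explained = pca_positions(data)
```

Each row of `positions` is a candidate glyph location. The `explained` fractions give a rough check on the assumption that the first few components capture most of the variation: if they sum to a small value, the two-dimensional layout is discarding much of the structure.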
Post-processing can involve distorting positions to reduce clutter and overlap.
- Random jitter has been employed in statistical graphics when
data-driven positioning is being used.
- Alternatively, shift positions to
minimize or avoid overlaps.
- A concern is the level of distortion that is introduced.
- Can selectively vary the level of detail shown in the visualization [34].
- Need to provide users with interactive control of the transformation to
facilitate maintenance of context.
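The jitter idea and the distortion concern can be illustrated with a small sketch. The names `jitter` and `max_distortion` and the `max_offset` parameter are assumptions made for this example. Uniform random offsets are just one simple choice of jitter, and the bound on the offset stands in for the user-controlled transformation level mentioned above.

```python
import math
import random


def jitter(positions, max_offset, seed=None):
    """Shift each 2-D position by a uniform random offset per axis.

    Each coordinate moves by at most max_offset, so coincident glyphs
    are spread apart while staying near their data-driven location.
    """
    rng = random.Random(seed)
    return [(x + rng.uniform(-max_offset, max_offset),
             y + rng.uniform(-max_offset, max_offset))
            for x, y in positions]


def max_distortion(original, adjusted):
    """Largest Euclidean displacement introduced by the adjustment."""
    return max(math.hypot(ax - ox, ay - oy)
               for (ox, oy), (ax, ay) in zip(original, adjusted))


# Three glyphs that would be drawn exactly on top of each other.
overlapping = [(1.0, 1.0), (1.0, 1.0), (1.0, 1.0)]
moved = jitter(overlapping, max_offset=0.05, seed=1)
worst = max_distortion(overlapping, moved)
```

Reporting `worst` alongside the display gives a simple measure of how far any glyph has been moved from its computed position, which a user could trade off against clutter by adjusting `max_offset`.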
Figure 4:
Star glyphs of Iris data set with
position based on the first two principal components. Reasonable separation
can be seen in the large cluster between larger and smaller 'kite' shapes.
Matthew Ward
1999-02-08