JViz User's Manual

Getting Up and Running

SYSTEM REQUIREMENTS:

Java 1.1.2 or higher required. Java 1.1.5 or higher recommended.

Java Foundation Classes (aka Swing) version 1.0 or higher. This is a package from Sun Microsystems and can be downloaded from their web site at: http://www.javasoft.com/jfc

INSTALLATION:

Once the JViz files are unarchived, the included scripts require that two environment variables be set.

JAVA_HOME - This environment variable should point to the top directory of the installed JDK or JRE.

SWING_HOME - This environment variable should point to the top directory of the Swing package installation.

Overview

JViz allows users to visually explore multivariate data in a variety of ways. The major concepts that users need to grasp to use JViz effectively are the available data projection techniques and the N-dimensional brush. These concepts are outlined below, followed by a description of the JViz interface.

JViz display methods

JViz supports three ways of visualizing data: scatter plots, glyphs, and parallel coordinates. Each display can be selected by pressing the corresponding button on the JViz main display.

Scatter plots

Scatter plots are one of the oldest and most commonly used methods to project high-dimensional data to 2 dimensions. In this method, a grid of parallel projections of the data is generated. Each of these is a simple plot of the two dimensions it represents. The horizontal dimension of a scatter plot is determined by the column it resides in, and the vertical dimension by the row. At the top of each column and at the left of each row is a numbered label that shows which dimension that column or row represents.

Glyphs

The definition of a glyph covers a large number of techniques which map data values to various geometric and color attributes of graphical primitives or symbols.
Histograms are perhaps the most widely recognized form of glyph, where data values control the height of rectangular bars. One might even consider a scatter plot a specialized form of glyph representation, where data values control the positional attributes of a symbol.

In JViz, we use the star glyph pattern, which creates for each data point N rays emanating at equal angles from a point on the screen. The length of each ray is determined by the data value for that dimension. A polyline is then generated to encompass the rays, forming a blob shape. Currently glyphs are spaced uniformly across the screen, though 2 of the dimensions could instead be used to control position. The glyph brush tool may be used as a key to the dimensions.

Parallel Coordinates

In Parallel Coordinates, each dimension corresponds to an axis, and the N axes are organized as uniformly spaced vertical lines. A data element in N-dimensional space manifests itself as a connected set of points, one on each axis. Points lying on a common line or plane create readily perceived structures in the image.

In generating the display of parallel coordinates in JViz, the view area is divided into N vertical slices of equal width. At the center of each slice an axis is drawn, along with a label at the top end. Data points are drawn as polylines across the N axes.

Brushing

A major problem with projecting data to a smaller number of dimensions is that information about the spatial relationships of the data is invariably lost. N-D brushing is a way of recovering some of this lost information by highlighting data points that fall into a user-defined subspace. Thus, one could say "show me the points within a (Euclidean) distance of N from a certain location." A brush is completely defined by its shape, size, and location. We assume a hyper-parallelepiped (a fancy name for an N-D box), and the user specifies the size of the box in each dimension.
Then a location in N-D space is specified as the center point of the brush. Any data point falling within the N-D brush can have an operator applied to it, such as highlighting or masking.

Brush Manipulation

Each of the display methods allows the user to directly manipulate the brush. A summary of manipulation techniques for each display method follows:

Scatter plots -- The brush coverage on the scatter plot display is represented by a series of rectangular boxes, one for each scatter plot. Each box represents the range of points that the brush covers in the two dimensions of that scatter plot. Each of these boxes may be manipulated using the mouse. Clicking the mouse button reshapes the brush to the coordinates under the cursor. To recenter the brush, hold down both the Control key and the mouse button and drag; the whole brush moves with the cursor.

Glyphs -- The brush coverage is not displayed in the glyph view. The brush may be recentered by clicking on a glyph with the left mouse button. The brush then moves so that it is centered on the selected point. This is useful for finding glyphs that are similar in shape to an existing glyph.

Parallel Coordinates -- The brush coverage on the parallel coordinate display is represented by a shaded region that spans all the dimensions. The shaded region along an axis represents the range of points that the brush covers along that axis. Brush manipulation in this display works similarly to the scatter plot display. To change the value of the brush in a single dimension, move the cursor to the axis and click the mouse button. If the value is above the center of the brush, the maximum brush value changes. Otherwise, the minimum brush value changes. To reshape the brush to fit selected data, hold down both the Shift key and the mouse button while moving the cursor over the desired data. Moving the entire brush works the same way in parallel coordinates as it does in scatter plots.
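
The brush model and the parallel-coordinates click rule described above can be sketched in a few lines of Python. This is only an illustration, not JViz's own code: the class name Brush, its fields, and its methods are hypothetical, and the choice to recompute the center after an edge moves is an assumption, since the manual does not say how JViz stores the brush internally.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Brush:
    """Hypothetical N-D box brush: a center point plus a full width per dimension."""
    center: List[float]
    size: List[float]

    def bounds(self, dim: int) -> Tuple[float, float]:
        """Return the (minimum, maximum) covered by the brush along one dimension."""
        half = self.size[dim] / 2.0
        return self.center[dim] - half, self.center[dim] + half

    def contains(self, point: Sequence[float]) -> bool:
        """A point is brushed only if it lies inside the box in every dimension."""
        for dim in range(len(self.center)):
            low, high = self.bounds(dim)
            if not low <= point[dim] <= high:
                return False
        return True

    def click_axis(self, dim: int, value: float) -> None:
        """Click rule for one axis: above the center moves the maximum,
        otherwise the minimum moves."""
        low, high = self.bounds(dim)
        if value > self.center[dim]:
            high = value
        else:
            low = value
        # Assumption: the center is recomputed from the new extent.
        self.center[dim] = (low + high) / 2.0
        self.size[dim] = high - low
```

For example, a brush built as Brush(center=[0.5, 0.5], size=[0.25, 0.25]) does not contain the point [0.5, 0.75]. After click_axis(1, 0.875) its extent along the second dimension grows to 0.375..0.875, and the same point is then covered.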

Brush Manipulation Tools

The Glyph Brush Tool is located in the 'Display' menu under the heading "Glyph Brush Tool".

Glyph Brush Tool -- This tool displays an enlarged glyph. The currently active brush coverage is displayed as a shaded region on the glyph. The brush may be resized by clicking on the appropriate ray of the glyph. These manipulations affect only a local copy of the brush. For the changes to take effect on the true brush, you must click the Apply button.

Painting

Painting is a method of creating a brush from the presented data. This method is currently available only in parallel coordinates, but future versions of JViz are planned to include painting with scatter plots. Painting is accomplished by holding down the Shift key on the keyboard and dragging the mouse over points of interest. As long as the Shift key is held down, the mouse cursor continues to behave as a virtual paintbrush. When the Shift key is released, a brush is generated that contains all the points that were painted.

Visualization Hints

The following hints can be used to find patterns in data in each of the view methods.

Scatter plots -- Look for linear features, which indicate a pairwise relationship within the data. Clusters are a bit deceptive, as there may be quite a spread amongst the other dimensions.

Glyphs -- Look for similar shapes and simple transitions between shapes, e.g. a convexity that grows or shrinks. It is also fairly easy to spot anomalies, although if there are too many shapes the anomalous ones may not stand out. One trick for examining dense data with 2 spatial dimensions is to use square data sets (e.g. width = height). Because the grid of glyphs uses the square root of the data set size as the number of rows and columns, you in effect get 2 dimensions for free. Regions of similarity and boundaries between distinct regions then manifest themselves as textures.

Parallel Coordinates -- Relationships between adjacent dimensions manifest themselves as line segments with similar orientation. Adjacent dimensions with negative correlation create an 'X' pattern between them.

Input File Format

The data sets read by JViz are stored in a simple ASCII format. All dimensions are represented by floating point values; discrete integer values will be converted automatically, but nominal values must be mapped to floating point. Simply assigning an integer to each name in the dimension usually works fine. The data file is composed of four main sections, described in detail here:

Section 1 -- The first line in the data set file specifies how many dimensions there are in the data and how many data points. These values are stored as ASCII strings and must be separated by whitespace.

Section 2 -- The next section of the input file contains the labels for each of the dimensions. These are stored as ASCII strings and there must be one name per line. The number of lines in this section of the file must match the number of dimensions specified on the first line.

Section 3 -- The third section of the input file contains the minimum and maximum values for each dimension and the number of bins to use in dimensional stacking. These are written as floating point numbers in ASCII format on a single line, as one min/max/bin triplet per dimension:

    min1 max1 bin1 min2 max2 bin2 ...

The number of min/max/bin triplets should match the number of dimensions specified on line 1.

Section 4 -- The final section of the input file contains the actual data points. Each sample is stored as a set of ASCII format floating point numbers on a single line. The number of floating point values on a line must match the number of dimensions specified on line 1, and the number of samples should match the number of data points also specified on line 1.

See JViz/data for some examples of JViz data files.
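
As a way to check a data file against the four sections above before loading it, here is a minimal reader sketched in Python. It is not part of JViz: the names JVizData and parse_jviz are hypothetical, and the sample data is made up. It also assumes that blank lines can be skipped and that a count mismatch should be reported as an error. How JViz itself treats malformed files is not covered by this manual.

```python
from typing import List, NamedTuple, Tuple


class JVizData(NamedTuple):
    """Hypothetical container for the four sections of a JViz data file."""
    labels: List[str]
    ranges: List[Tuple[float, ...]]  # one (min, max, bins) triplet per dimension
    points: List[List[float]]


def parse_jviz(text: str) -> JVizData:
    """Parse the text of a data file laid out as described in Sections 1 to 4."""
    # Assumption: blank lines carry no meaning and are skipped.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty file")

    # Section 1: number of dimensions, then number of data points.
    n_dims, n_points = (int(token) for token in lines[0].split())
    if len(lines) < 2 + n_dims:
        raise ValueError("file ends before the min/max/bin line")

    # Section 2: one label per line, one line per dimension.
    labels = lines[1:1 + n_dims]

    # Section 3: min/max/bin triplets for every dimension on a single line.
    flat = [float(token) for token in lines[1 + n_dims].split()]
    if len(flat) != 3 * n_dims:
        raise ValueError("expected one min/max/bin triplet per dimension")
    ranges = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]

    # Section 4: one sample per line, one value per dimension.
    points = [[float(token) for token in line.split()]
              for line in lines[2 + n_dims:]]
    if len(points) != n_points:
        raise ValueError("sample count does not match line 1")
    if any(len(point) != n_dims for point in points):
        raise ValueError("a sample has the wrong number of values")

    return JVizData(labels, ranges, points)


# Made-up example file: 2 dimensions, 3 data points.
SAMPLE = """2 3
width
height
0 10 5 0 20 4
1.0 2.0
3.5 4.5
10 20
"""

if __name__ == "__main__":
    data = parse_jviz(SAMPLE)
    print(data.labels, len(data.points))
```

The bin counts are kept as floating point values here because Section 3 describes the whole line as floating point numbers; convert them with int() if whole numbers are needed.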