Week 3
I. Levels of Visualization
- Visualization of a single neural network, be it a static weighted
graph, an animation, a report of its fitness scores on various tests, or
what have you, I will call network visualization.
- Visualizing the "family tree" for a neural network evolved via the
genetic algorithm I will call heredity visualization.
- Finally, I will refer to the visualization of the evolution of an
entire population of neural networks as evolutionary visualization.
II. Heredity Visualization
- If we draw a standard 2-dimensional tree diagram, and we wish to
display n generations of the family tree, we are bounded
vertically by n, and horizontally by p^(n-1),
where p is the number of parents per child (a small sketch of this
bound appears at the end of this section). Available space
will therefore impose limitations upon the maximum values of p
and n. Note that the number of networks in the total population
pool is not relevant here.
- Three possible views (could have buttons to switch between them):
- Selected network at the top, descendants branching out below.
- Selected network at the bottom, ancestors branching out above.
- Selected network at the center, ancestors and descendants above
and below.
- By double-clicking on a network, you can re-center the tree on that
network.
- Tree display should be zoomable.
- With the combination of re-centering and zooming, the user should be
able to compensate quite a bit for any scalability problems that might
arise.
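- As a rough illustration of the p^(n-1) bound above, here is a small
sketch (the function name and return layout are my own, purely
illustrative) of how wide and how large an n-generation ancestor view
becomes:

    def ancestor_tree_size(p: int, n: int) -> tuple[int, int]:
        """Return (widest_row, total_nodes) for an ancestor view showing
        n generations with p parents per child."""
        widest_row = p ** (n - 1)                     # oldest displayed generation
        total_nodes = sum(p ** k for k in range(n))   # 1 + p + ... + p^(n-1)
        return widest_row, total_nodes

    # With 2 parents per child, showing 6 generations already requires a
    # row of 32 networks (63 nodes in total):
    print(ancestor_tree_size(2, 6))  # (32, 63)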
III. Evolutionary Visualization
- The most basic feature would be a graph of each generation's maximum,
minimum, and average fitness vs. time (a small plotting sketch appears
at the end of this section).
- Should also plot generation diversity vs. time (see
Computing Diversity below).
- You should be able to look at each generation's fitness distribution.
- It would also be helpful to have a graph of volatility (see the
Research section below), as well as the volatility
distribution for each generation.
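- A minimal sketch of the basic fitness-vs-time graph, assuming each
generation is simply a list of fitness scores (the data layout and
function name are hypothetical, not part of any existing code):

    import matplotlib.pyplot as plt

    def plot_fitness_history(generations):
        """generations[t] is the list of fitness scores of generation t.
        Plots maximum, minimum, and average fitness vs. time."""
        xs = range(len(generations))
        plt.plot(xs, [max(g) for g in generations], label="max fitness")
        plt.plot(xs, [min(g) for g in generations], label="min fitness")
        plt.plot(xs, [sum(g) / len(g) for g in generations],
                 label="average fitness")
        plt.xlabel("generation")
        plt.ylabel("fitness")
        plt.legend()
        plt.show()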
IV. Computing Diversity
- All neural networks in a population will have the same number of
input and output nodes, since they are all being trained to solve the
same problem. The three respects in which a pair of neural networks
can differ, therefore, are the number of hidden nodes, the edge weights,
and the connectivity. The following procedures show how to combine these
three factors in a natural and intuitive way to produce a single figure
for the diversity of a population (a code sketch of both procedures
appears at the end of this section).
- To calculate Difference(N1, N2)
(the difference between two neural networks):
- If N1 and N2 have the same number of
hidden nodes, we can give them the same connectivity by introducing
"dummy" (0-weight) edges to make each network fully connected. Since
the two networks now differ only in their edge weights, we can square
the numerical difference between every pair of corresponding edges,
sum the squares, and take the square root of the sum. If there are w
edge weights, this is the Euclidean distance between the two networks
in w-dimensional space.
- If the number of hidden nodes is different, we can simply
introduce an appropriate number of "dummy" nodes into the smaller of
the two networks, where a dummy node is connected to the other nodes
by 0-weight edges. Since 0-weight edges don't do anything, we are
in no way changing the behavior of the network by introducing these
extra nodes. We can then use the procedure for networks with the
same number of hidden nodes.
- To calculate Diversity(N1, N2, ..., Np)
(the diversity in a population of neural networks):
- Find which of the p networks has the largest number of
hidden nodes. Let this number be denoted by h.
- Using the methods employed by
Difference(N1, N2), transform all networks
in the population into fully connected networks with h hidden
nodes.
- The p neural networks can now be thought of as a collection
of p points in w-dimensional space, where w is
the number of edge weights in each (now fully connected) network. To
find the "center" of these points, we simply sum the vectors for each
point (i.e. the vectors from the origin to that point), and divide by
p. We can then compute the Euclidean distance between this
center point and each of the p points, sum these distances, and return
the result.
- This is similar to the way moment of inertia is computed in
physics.
- The diversity figure, when computed this way, will tend to be larger
for larger populations. It may also be useful to compute a normalized
diversity figure, i.e. the diversity figure given above divided by the
population size, so that comparisons may be made between population
pools of different size.
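- Here is a sketch of both procedures, under one simplifying assumption
of my own: each network is stored as a square weight matrix over all of
its nodes, ordered (inputs, outputs, hidden), with entry [i, j] holding
the weight of the edge from node i to node j and 0 meaning "no edge",
so that adding dummy hidden nodes is just zero-padding. All names here
are illustrative.

    import numpy as np

    def pad_network(weights: np.ndarray, extra_hidden: int) -> np.ndarray:
        """Add dummy hidden nodes, connected only by 0-weight edges.
        Hidden nodes come last in the ordering, so this is zero-padding."""
        return np.pad(weights, ((0, extra_hidden), (0, extra_hidden)))

    def difference(n1: np.ndarray, n2: np.ndarray) -> float:
        """Euclidean distance between two networks in edge-weight space."""
        size = max(n1.shape[0], n2.shape[0])
        n1 = pad_network(n1, size - n1.shape[0])
        n2 = pad_network(n2, size - n2.shape[0])
        return float(np.sqrt(np.sum((n1 - n2) ** 2)))

    def diversity(population: list[np.ndarray],
                  normalized: bool = False) -> float:
        """Sum of distances from each network to the population's center."""
        size = max(net.shape[0] for net in population)    # set by h, the
        padded = [pad_network(net, size - net.shape[0])   # largest hidden count
                  for net in population]
        points = np.stack([net.ravel() for net in padded])  # p points, w dims
        center = points.mean(axis=0)              # average of the p vectors
        total = float(np.linalg.norm(points - center, axis=1).sum())
        return total / len(population) if normalized else total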
V. Center Networks
- Using the techniques described in the previous section, it is easy
to introduce a function
Center(N1, N2, ..., Np) which
returns the "center" or average network in a population, i.e. the network
whose weights correspond to the point in w-dimensional space
(w is the number of edge weights) which represents the average of the
p points corresponding to the networks in the population pool. Note
that this hypothetical network need not actually exist in the pool. This
Center() function may be useful for a number of things (a small sketch
appears at the end of this section).
- By calculating the Difference() between the center of one
generation and the next, you can determine how much of a leap was made
between generations.
- By examining the center networks of a series of generations, you
may be able to get a general sense of the direction in which the
evolution is headed.
- When looking at the family tree for a network, you could examine the
center network for each generation in the tree, allowing you to more
readily see how a specific network came about.
- If the networks in each generation could be grouped into clusters of
similar networks, and the center computed for each cluster, it may be
possible to automatically produce the kind of ape-to-man diagrams that
you see in biology textbooks.
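- A minimal sketch of Center(), reusing the weight-matrix representation
and padding idea from the sketch in the previous section (names remain
illustrative assumptions of mine):

    import numpy as np

    def center(population: list[np.ndarray]) -> np.ndarray:
        """Average network of a population: pad every weight matrix to the
        largest size in the pool (dummy 0-weight nodes), then average the
        matrices component-wise. The result need not exist in the pool."""
        size = max(net.shape[0] for net in population)
        padded = [np.pad(net, ((0, size - net.shape[0]),) * 2)
                  for net in population]
        return np.mean(np.stack(padded), axis=0)

    # The "leap" between consecutive generations could then be measured as
    # difference(center(generation_t), center(generation_t_plus_1)), using
    # the difference() sketch from the previous section.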
VI. Research
- Two types of breeding or recombination methods are used in
Evolution Strategies. In sexual recombination, a new individual is
created from a set of n parents, where n depends on the type
of sexual recombination being used. The other type is panmictic
recombination, where one parent is held fixed, and then, for each
component of the new individual (i.e. each edge weight), a second parent
is selected at random from the population pool (see the sketch at the
end of this section). Panmictic recombination
thus involves (potentially) a much branchier family tree, and would
therefore be harder to visualize.
- When using Evolution Strategies, a number of mutation variables are
associated with each neural network (probably the standard deviations for
the mutations of each edge weight). By summing up these standard
deviations, you could obtain a volatility figure for a given
network, which would indicate its potential for change. The average
volatility for a generation could be computed, and, when graphed next to
the diversity, would tell you a lot about the way the evolution was
progressing. For instance, if volatility was low and diversity was
decreasing, you might conclude that the population was "zeroing in" on
a solution.
- I do not think it is necessary to modify my Difference()
function to incorporate differences in mutation parameters, since the
mutation parameters affect changes in the network across generations, and
the Difference() function is intended to operate on a pair of
neural networks as they exist at a particular moment in time. It may be
appropriate, however, to modify my network visualization to include some
indication of the volatility of each edge weight.
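- A sketch of panmictic recombination and of the volatility figure,
assuming each individual carries a flat list of edge weights plus one
mutation standard deviation per weight, and assuming discrete
recombination (each component is copied whole from one of the two
parents); all names are my own, illustrative choices.

    import random

    class Individual:
        def __init__(self, weights, sigmas):
            self.weights = list(weights)  # edge weights
            self.sigmas = list(sigmas)    # mutation std. dev. per weight

    def panmictic_recombine(fixed_parent, pool):
        """For each component, draw a fresh second parent from the pool
        and take the value from either it or the fixed parent at random."""
        weights, sigmas = [], []
        for i in range(len(fixed_parent.weights)):
            other = random.choice(pool)
            donor = random.choice((fixed_parent, other))
            weights.append(donor.weights[i])
            sigmas.append(donor.sigmas[i])
        return Individual(weights, sigmas)

    def volatility(individual):
        """Sum of mutation standard deviations: potential for change."""
        return sum(individual.sigmas)

    def average_volatility(generation):
        return sum(volatility(ind) for ind in generation) / len(generation)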